Remove old versions of system calls and system calls that don't have
a libc api interface anymore (dup, dup2, creat).
VFS still contains support for old system call numbers for the new stat
system calls (i.e., 65, 66, 67) to keep supporting old binaries built for
MINIX 3.2.1 (prior to the release).
Change-Id: I721779b58a50c7eeae20669de24658d55d69b25b
Make the frclock functions similar to the tsc utility functions. This
way, we can call frclock functions from the framebuffer driver which
will use frclock on ARM and tsc on X86.
Also, frclock_64_to_micros computed seconds, not microseconds
Change-Id: I6718ae0fb7db050794f6f032205923e1a32dc1ac
This patch introduces a framebuffer to Minix. It's written for the ARM
port of Minix, but has an architectural split that separates the
hardware dependent part from the non-hardware dependent part. Futhermore,
this driver was developed using a screen that has a native resolution of
1024x600 pixels and having lack of support for obtaining EDID from the
screen. Consequently, it uses a hardcoded resolution of 1024x600.
The driver uses an interface based on the Linux ioctl API, but supports
only a very limited subset.
. the total amount of memory in the system didn't include the memory
used by the boot-time modules and some dynamic allocation by the
kernel at boot time (to map in VM). especially apparent on our
ARM board with 'only' 512MB of memory and a huge ramdisk.
. also: *add* the VM loaded module to the freelist after it has
been allocated for & mapped in instead of cutting it *out* of the
freelist! so we get a few more MB free..
Change-Id: If37ac32b21c9d38610830e21421264da4f20bc4f
. allow any number of pde's used for pagedir mapping
. allows >1024 NR_PROCS on x86, >64 on ARM
. allows NR_PROCS to be the same in both cases
. also cleanup: allocating spare PDE's is not necessary
throw that function out
Change-Id: Ibb8f8cf6e7db6a4d6384b6911d1a3f3f5e5d8256
* Generalize GPIO handling.
* Add libs to configure gpio's clocks and pads
* Add Interrupt handling.
* Introduce mmio.h and log.h
Change-Id: I928e4c807d15031de2eede4b3ecff62df795f8ac
The Cycle CouNTer on ARM cannot be used reliably as it wraps around
rather quickly and can be altered by user space (on Minix). Furthermore,
it's buggy when wrapping and is not implemented at all on the Linaro
Beagleboard emulator.
This patch programs GPTIMER10 as a free running clock at 1.625 MHz (it
doesn't generate interrupts). It's memory mapped into every process,
which enables libsys to provide micro_delay().
Change-Id: Iba004c6c62976762fe154ea390d69e518eec1531
Due to the ABI we are using we have to use the earm architecture
moniker for the build system to behave correctly. This involves
then some headers to move around.
There is also a few related Makefile updates as well as minor
source code corrections.
This makes sure the types are coherent, and right now, time_t is
defined as an long, through _BSD_TIME_T_. It previously was
hardcoded as an int, so the structure's size does not change.
Change-Id: If29e94ab53f605d1480fadb540f5b67be4ddaf5b
* Updating common/lib
* Updating lib/csu
* Updating lib/libc
* Updating libexec/ld.elf_so
* Corrected test on __minix in featuretest to actually follow the
meaning of the comment.
* Cleaned up _REENTRANT-related defintions.
* Disabled -D_REENTRANT for libfetch
* Removing some unneeded __NBSD_LIBC defines and tests
Change-Id: Ic1394baef74d11b9f86b312f5ff4bbc3cbf72ce2
. restore state depends on how saving of state was done;
also remember trap style in sig context
. actually set and restore TRACEBIT with new trap styles;
have to remove it once process enters kernel though, done
in debug trap exception handler
. introduce MF_STEP that makes arch-specific code
turn on trace bit instead of setting TRACEBIT directly,
a bit more arch-friendly and avoids keeping precious
state in per-process PSW arch-dependently
- CHOOSETRAP define makes impossible to use some common words
like send, receive and notify in any other context, for
instance as members or structures
- any reasonable compiler inlines the static inline functions so
no extra function call overhead is introduced by this change
- this gets us back to the situation before the SYSCALL/SYSENTER
change. It is not perfect, but it used to work and still does.
upgrade to NetBSD CVS release from 2012/10/17 12:00:00 UTC
Makefiles updates to imporve portability
Made sure to be consistent in the usage of braces/parenthesis at
least on a per file basis. For variables, it is recommended to
continue to use braces.
The tested targets are the followgin ones:
* tools
* distribution
* sets
* release
The remaining NetBSD targets have not been disabled nor tested
*at all*. Try them at your own risk, they may reboot the earth.
For all compliant Makefiles, objects and generated files are put in
MAKEOBJDIR, which means you can now keep objects between two branch
switching. Same for DESTDIR, please refer to build.sh options.
Regarding new or modifications of Makefiles a few things:
* Read share/mk/bsd.README
* If you add a subdirectory, add a Makefile in it, and have it called
by the parent through the SUBDIR variable.
* Do not add arbitrary inclusion which crosses to another branch of
the hierarchy; If you can't do without it, put a comment on why.
If possible, do not use inclusion at all.
* Use as much as possible the infrastructure, it is here to make
life easier, do not fight it.
Sets and package are now used to track files.
We have one set called "minix", composed of one package called "minix-sys"
Bumping libc files for unsupported architectures, to simplify merging.
A bunch of small fixes:
* in libutil update
* the macro in endian.h
* some undefined types due to clear separation from host.
* Fix a warning for cdbr.c
Some modification which were required for the new build system:
* inclusion path for const.h in sconst, still hacky
* Removed default malloc.c which conflicts on some occasions.
. Check if we have the right number of boot modules
. Check if the ELF parsing of VM actually succeeded
Both these are root causes of less-than-obvious other
errors/asserts a little further down the line; uncovered
while experimenting with booting by iPXE, specifically
(a) iPXE having a 8-multiboot-modules limit and
(b) trying to boot a gzipped VM.
Add primary cache management feature to libminixfs as mfs and ext2
currently do separately, remove cache code from mfs and ext2, and make
them use the libminixfs interface. This makes all fields of the buf
struct private to libminixfs and FS clients aren't supposed to access
them at all. Only the opaque 'void *data' field (the FS block contents,
used to be called bp) is to be accessed by the FS client.
The main purpose is to implement the interface to the 2ndary vm cache
just once, get rid of some code duplication, and add a little
abstraction to reduce the code inertia of the whole caching business.
Some minor sanity checking and prohibition done by mfs in this code
as removed from the generic primary cache code as a result:
- checking all inodes are not in use when allocating/resizing
the cache
- checking readonly filesystems aren't written to
- checking the superblock isn't written to on mounted filesystems
The minixfslib code relies on fs_blockstats() in the client filesystem to
return some FS usage information.
Introduce explicit abstractions for different mapping types,
handling the instantiation, forking, pagefaults and freeing of
anonymous memory, direct physical mappings, shared memory and
physically contiguous anonymous memory as separate types, making
region.c more generic.
Also some other genericification like merging the 3 munmap cases
into one.
COW and SMAP safemap code is still implicit in region.c.
. add cpufeature detection of both
. use it for both ipc and kernelcall traps, using a register
for call number
. SYSENTER/SYSCALL does not save any context, therefore userland
has to save it
. to accomodate multiple kernel entry/exit types, the entry
type is recorded in the process struct. hitherto all types
were interrupt (soft int, exception, hard int); now SYSENTER/SYSCALL
is new, with the difference that context is not fully restored
from proc struct when running the process again. this can't be
done as some information is missing.
. complication: cases in which the kernel has to fully change
process context (i.e. sigreturn). in that case the exit type
is changed from SYSENTER/SYSEXIT to soft-int (i.e. iret) and
context is fully restored from the proc struct. this does mean
the PC and SP must change, as the sysenter/sysexit userland code
will otherwise try to restore its own context. this is true in the
sigreturn case.
. override all usage by setting libc_ipc=1
complete munmap implementation; single-page references made
a general munmap() implementation possible to write cleanly.
. memory: let the MIOCRAMSIZE ioctl set the imgrd device
size (but only to 0)
. let the ramdisk command set sizes to 0
. use this command to set /dev/imgrd to 0 after mounting /usr
in /etc/rc, so the boot time ramdisk is freed (about 4MB
currently)
. map all objects named usermapped_*.o with globally visible
pages; usermapped_glo_*.o with the VM 'global' bit on, i.e.
permanently in tlb (very scarce resource!)
. added kinfo, machine, kmessages and loadinfo for a start
. modified log, tty to make use of the shared messages struct
. some strncpy/strcpy to strlcpy conversions
. new <minix/param.h> to avoid including other minix headers
that have colliding definitions with library and commands code,
causing parse warnings
. removed some dead code / assignments
adjust the smp booting procedure for segmentless operation. changes are
mostly due to gdt/idt being dependent on paging, because of the high
location, and paging being on much sooner because of that too.
also smaller fixes: redefine DESC_SIZE, fix kernel makefile variable name
(crosscompiling), some null pointer checks that trap now because of a
sparser pagetable, acpi sanity checking
This commit removes all traces of Minix segments (the text/data/stack
memory map abstraction in the kernel) and significance of Intel segments
(hardware segments like CS, DS that add offsets to all addressing before
page table translation). This ultimately simplifies the memory layout
and addressing and makes the same layout possible on non-Intel
architectures.
There are only two types of addresses in the world now: virtual
and physical; even the kernel and processes have the same virtual
address space. Kernel and user processes can be distinguished at a
glance as processes won't use 0xF0000000 and above.
No static pre-allocated memory sizes exist any more.
Changes to booting:
. The pre_init.c leaves the kernel and modules exactly as
they were left by the bootloader in physical memory
. The kernel starts running using physical addressing,
loaded at a fixed location given in its linker script by the
bootloader. All code and data in this phase are linked to
this fixed low location.
. It makes a bootstrap pagetable to map itself to a
fixed high location (also in linker script) and jumps to
the high address. All code and data then use this high addressing.
. All code/data symbols linked at the low addresses is prefixed by
an objcopy step with __k_unpaged_*, so that that code cannot
reference highly-linked symbols (which aren't valid yet) or vice
versa (symbols that aren't valid any more).
. The two addressing modes are separated in the linker script by
collecting the unpaged_*.o objects and linking them with low
addresses, and linking the rest high. Some objects are linked
twice, once low and once high.
. The bootstrap phase passes a lot of information (e.g. free memory
list, physical location of the modules, etc.) using the kinfo
struct.
. After this bootstrap the low-linked part is freed.
. The kernel maps in VM into the bootstrap page table so that VM can
begin executing. Its first job is to make page tables for all other
boot processes. So VM runs before RS, and RS gets a fully dynamic,
VM-managed address space. VM gets its privilege info from RS as usual
but that happens after RS starts running.
. Both the kernel loading VM and VM organizing boot processes happen
using the libexec logic. This removes the last reason for VM to
still know much about exec() and vm/exec.c is gone.
Further Implementation:
. All segments are based at 0 and have a 4 GB limit.
. The kernel is mapped in at the top of the virtual address
space so as not to constrain the user processes.
. Processes do not use segments from the LDT at all; there are
no segments in the LDT any more, so no LLDT is needed.
. The Minix segments T/D/S are gone and so none of the
user-space or in-kernel copy functions use them. The copy
functions use a process endpoint of NONE to realize it's
a physical address, virtual otherwise.
. The umap call only makes sense to translate a virtual address
to a physical address now.
. Segments-related calls like newmap and alloc_segments are gone.
. All segments-related translation in VM is gone (vir2map etc).
. Initialization in VM is simpler as no moving around is necessary.
. VM and all other boot processes can be linked wherever they wish
and will be mapped in at the right location by the kernel and VM
respectively.
Other changes:
. The multiboot code is less special: it does not use mb_print
for its diagnostics any more but uses printf() as normal, saving
the output into the diagnostics buffer, only printing to the
screen using the direct print functions if a panic() occurs.
. The multiboot code uses the flexible 'free memory map list'
style to receive the list of free memory if available.
. The kernel determines the memory layout of the processes to
a degree: it tells VM where the kernel starts and ends and
where the kernel wants the top of the process to be. VM then
uses this entire range, i.e. the stack is right at the top,
and mmap()ped bits of memory are placed below that downwards,
and the break grows upwards.
Other Consequences:
. Every process gets its own page table as address spaces
can't be separated any more by segments.
. As all segments are 0-based, there is no distinction between
virtual and linear addresses, nor between userspace and
kernel addresses.
. Less work is done when context switching, leading to a net
performance increase. (8% faster on my machine for 'make servers'.)
. The layout and configuration of the GDT makes sysenter and syscall
possible.
. sys_vircopy always uses D for both src and dst
. sys_physcopy uses PHYS_SEG if and only if corresponding
endpoint is NONE, so we can derive the mode (PHYS_SEG or D)
from the endpoint arg in the kernel, dropping the seg args
. fields in msg still filled in for backwards compatability,
using same NONE-logic in the library
. all invocations were S or D, so can safely be dropped
to prepare for the segmentless world
. still assign D to the SCP_SEG field in the message
to make previous kernels usable
. new mode for sys_memset: include process so memset can be
done in physical or virtual address space.
. add a mode to mmap() that lets a process allocate uninitialized
memory.
. this allows an exec()er (RS, VFS, etc.) to request uninitialized
memory from VM and selectively clear the ranges that don't come
from a file, leaving no uninitialized memory left for the process
to see.
. use callbacks for clearing the process, clearing memory in the
process, and copying into the process; so that the libexec code
can be used from rs, vfs, and in the future, kernel (to load vm)
and vm (to load boot-time processes)
. make exec() callers (i.e. vfs and rs) determine the
memory layout by explicitly reserving regions using
mmap() calls on behalf of the exec()ing process,
i.e. handling all of the exec logic, thereby eliminating
all special exec() knowledge from VM.
. the new procedure is: clear the exec()ing process
first, then call third-party mmap()s to reserve memory, then
copy the executable file section contents in, all using callbacks
tailored to the caller's way of starting an executable
. i.e. no more explicit EXEC_NEWMEM-style calls in PM or VM
as with rigid 2-section arguments
. this naturally allows generalizing exec() by simply loading
all ELF sections
. drop/merge of lots of duplicate exec() code into libexec
. not copying the code sections to vfs and into the executable
again is a measurable performance improvement (about 3.3% faster
for 'make' in src/servers/)
these two functions will be used to support all exec() functionality
going into a single library shared by RS and VFS and exec() knowledge
leaving VM.
. third-party mmap: allow certain processes (VFS, RS) to
do mmap() on behalf of another process
. PROCCTL: used to free and clear a process' address space
. readbios call is now a physical copy with range check in
the kernel call instead of BIOS_SEG+umap_bios
. requires all access to physical memory in bios range to go
through sys_readbios
. drivers/dpeth: wasn't using it
. adjusted printer
According to POSIX the st_size field of struct stat is undefined for
fifos and anonymous pipes. Thus we can do anything we want. We save a
copy by not being accurate on pipe sizes.
. vfs: pass execname in aux vectors
. ld.elf_so: use this to expand $ORIGIN
. this requires the executable to reserve more
space at exec() calling time
. generalize libexec slightly to get some more necessary information
from ELF files, e.g. the interpreter
. execute dynamically linked executables when exec()ed by VFS
. switch to netbsd variant of elf32.h exclusively, solves some
conflicting headers
. file- and functionality-compatible with previous situation
(FreeBSD csu) (with a crt1.o -> crt0.o symlink in /usr/lib)
. harmonizes source with netbsd
. harmonizes linker invocation (e.g. clang) with netbsd
. helpful to get some arm code in there for the arm port project
This Shared Folders File System library (libsffs) now contains all the
file system logic originally in HGFS. The actual HGFS server code is
now a stub that passes on all the work to libsffs. The libhgfs library
is changed accordingly.
. common/include/arch/i386 is not actually an imported
sys/arch/i386/include but leftover Minix files;
remove and move to include/
. move include/ufs to sys/ufs, where it came from, now that
we have a sys/ hierarchy
. move mdocml/ to external/bsd/, now we have that
. single sys/arch/i386/stand/ import for boot stuff
- libnetsock - internal implementation of a socket on the lwip
server side. it encapsulates the asynchronous protocol
- lwip server - uses libnetsock to work with the asynchronous
protocol
- if an operation (R, W, IOCTL) is non blocking, a flag is set
and sent to the device.
- nothing changes for sync devices
- asyn devices should reply asap if an operation is non-blocking.
We must trust the devices, but we had to trust them anyway to
reply to CANCEL correctly
- we safe sending CANCEL commands to asyn devices. This greatly
simplifies the protocol. Asynchronous devices can always reply
when a reply is ready and do not need to deal with other
situations
- currently, none of our drivers use the flags since they drive
virtual devices which do not block
There is important information about booting non-ack images in
docs/UPDATING. ack/aout-format images can't be built any more, and
booting clang/ELF-format ones is a little different. Updating to the
new boot monitor is recommended.
Changes in this commit:
. drop boot monitor -> allowing dropping ack support
. facility to copy ELF boot files to /boot so that old boot monitor
can still boot fairly easily, see UPDATING
. no more ack-format libraries -> single-case libraries
. some cleanup of OBJECT_FMT, COMPILER_TYPE, etc cases
. drop several ack toolchain commands, but not all support
commands (e.g. aal is gone but acksize is not yet).
. a few libc files moved to netbsd libc dir
. new /bin/date as minix date used code in libc/
. test compile fix
. harmonize includes
. /usr/lib is no longer special: without ack, /usr/lib plays no
kind of special bootstrapping role any more and bootstrapping
is done exclusively through packages, so releases depend even
less on the state of the machine making them now.
. rename nbsd_lib* to lib*
. reduce mtree
. rc script and service know to look in /usr/pkg/.. for
extra binaries and conf files
. service split into parsing config and doing RS request
so that a new utility (printconfig) can just print the
config in machine-parseable format for netconf integration
. converted all base system eth drivers/netconf
Import libpuffs and our port of libpuffs. The port was done as part of
GSoC 2011 FUSE project, done by Evgeniy Ivanov. The librefuse import
did not require any porting efforts. Libpuffs has been modified to
understand our VFS-FS protocol and translate between that and PUFFS. As
an example that it works, fuse-ntfs-3g from pkgsrc can be compiled and
used to mount ntfs partitions:
mount -t ntfs-3g <device> <mountpoint>
FUSE only works with the asynchronous version of VFS. See <docs/UPDATING> on
how to run AVFS.
This patch further includes some changes to mount(1) and mount(2) so it's
possible to use file systems provided by pkgsrc (note: manual modifications
to /etc/system.conf are still needed. There has been made an exception for
fuse-ntfs-3g, so it already as an entry).
This patch fixes most of current reasons to generate compiler warnings.
The changes consist of:
- adding missing casts
- hiding or unhiding function declarations
- including headers where missing
- add __UNCONST when assigning a const char * to a char *
- adding missing return statements
- changing some types from unsigned to signed, as the code seems to want
signed ints
- converting old-style function definitions to current style (i.e.,
void func(param1, param2) short param1, param2; {...} to
void func (short param1, short param2) {...})
- making the compiler silent about signed vs unsigned comparisons. We
have too many of those in the new libc to fix.
A number of bugs in the test set were fixed. These bugs were never
triggered with our old libc. Consequently, these tests are now forced to
link with the new libc or they will generate errors (in particular tests 43
and 55).
Most changes in NetBSD libc are limited to moving aroudn "#ifndef __minix"
or stuff related to Minix-specific things (code in sys-minix or gen/minix).
. move cache size heuristic from mfs there
so mfs and ext2 can share it
. add vfs credentials retrieving function, with
backwards compatability from previous struct
format, to be used by both ext2 and mfs
. fix for ext2 - STATICINIT was fed no.
of bytes instead of no. of elements, overallocating
memory by a megabyte or two for the superblock
. move mfs-specific struct, constants to mfs/, so
mfs-specific, on-disk format structs and consts are
fully isolated from generic structs and functions
. removes de and readfs utils
. it's a good extra interface to have but doesn't
meet standardised functionality
. applications (in pkgsrc) find it and expect
full functionality the minix mmap doesn't offter
. on the whole probably better to hide these functions
(mmap and friends) until they are grown up; the base system
can use the new minix_* names
. MAP_SHARED was used to implement sysv shared memory
. used to signal shareable memory region to VM
. assumptions about this situation break when processes
use MAP_SHARED for its normal, standardised meaning
* VFS and installed MFSes must be in sync before and after this change *
Use struct stat from NetBSD. It requires adding new STAT, FSTAT and LSTAT
syscalls. Libc modification is both backward and forward compatible.
Also new struct stat uses modern field sizes to avoid ABI
incompatibility, when we update uid_t, gid_t and company.
Exceptions are ino_t and off_t in old libc (though paddings added).
1. ack, a.out, minix headers (moved to /usr/include.ack),
minix libc
2. gcc/clang, elf, netbsd headers (moved to /usr/include),
netbsd libc (moved to /usr/lib)
So this obsoletes the /usr/netbsd hierarchy.
No special invocation for netbsd libc necessary - it's always used
for gcc/clang.
. remove a few asserts in the kernel and 64bi library
that are not compatible with the timing code
. change the TIME_BLOCKS code a little to work in-kernel
This patch moves more includes (most of them, to tell the truth) to
common/include directory. This completes the list of includes needed
to compile current trunk with the new libc (but to do that you need
more patches in queue).
This patch also contains some modification (for compilation with new
headers) to the common includes under __NBSD_LIBC, the define used
in mk script to specialize compilation with new includes.
This patch moves further includes (the network part and lib.h) in common/.
It is the last part to get the netbsd libc to compile under minix. Further moves will be needed as we get the netbsd libc to compile minix itself.
Also, this patch add #ifndef's to termios.h, as it create problems with netbsd's namespace.h.
Headers that will be shared between old includes and NetBSD-like includes
are moved into common/include tree. They are still copied in /usr/include
in 'make includes', so compilation and programs aren't be affected.
M include/Makefile
A include/minix/input.h
M include/minix/com.h
M drivers/tty/keyboard.c
M drivers/tty/tty.c
M drivers/tty/tty.h
M include/minix/syslib.h
M lib/libsys/Makefile
A lib/libsys/input.c
- kernel maintains a cpu_info array which contains various
information about each cpu as filled when each cpu boots
- the information contains idetification, features etc.
- every pci device which implements _PRT acpi method is considered to
be a pci-to-pci bridge
- acpi driver constructs a hierarchy of pci-to-pci bridges
- when pci driver identifies a pci-to-pci bridge it tells acpi driver
what is the primary and the secondary bus for this device
- when pci requests IRQ routing information from acpi, it passes the
bus number too to be able to identify the device accurately
With this change, suggested by Gautam Tirumala, ports for pkgin and
pkg_install are cleaner and so easier to upstream. Presumably other
ports will be smoother too.
There doesn't seem to be a reason SSIZE_MAX was so small to begin with.
Before, the 'main thread' of a process was never taken into account anywhere in
the library, causing mutexes not to work properly (and consequently, neither
did the condition variables). For example, if the 'main thread' (that is, the
thread which is started at the beginning of a process; not a spawned thread by
the library) would lock a mutex, it wasn't actually locked.
- sometimes the system needs to know precisely on what type of cpu is
running. The cpu type id detected during arch specific
initialization and kept in the machine structure for later use.
- as a side-effect the information is exported to userland
- profile --nmi | --rtc sets the profiling mode
- --rtc is default, uses BIOS RTC, cannot profile kernel the presetted
frequency values apply
- --nmi is only available in APIC mode as it uses the NMI watchdog, -f
allows any frequency in Hz
- both modes use compatible data structures
- when kernel profiles a process for the first time it saves an entry
describing the process [endpoint|name]
- every profile sample is only [endpoint|pc]
- profile utility creates a table of endpoint <-> name relations and
translates endpoints of samples into names and writing out the
results to comply with the processing tools
- "task" endpoints like KERNEL are negative thus we must cast it to
unsigned when hashing
- contributed by Bjorn Swift
- adds process accounting, for example counting the number of messages
sent, how often the process was preemted and how much time it spent
in the run queue. These statistics, along with the current cpu load,
are sent back to the user-space scheduler in the Out Of Quantum
message.
- the user-space scheduler may choose to make use of these statistics
when making scheduling decisions. For isntance the cpu load becomes
especially useful when scheduling on multiple cores.
- EBADCPU is returned is scheduler tries to run a process on a CPU
that either does not exist or isn't booted
- this change was originally meant to deal with stupid cpuid
instruction which provides totally useless information about
hyper-threading and MPS which does not deal with ht at all. ACPI
provides correct information. If ht is turned off it looks like some
CPUs failed to boot. Nevertheless this patch may be handy for
testing/benchmarking in the future.
- RTS_VMINHIBIT flag is used to stop process while VM is fiddling with
its pagetables
- more generic way of sending synchronous scheduling events among cpus
- do the x-cpu smp sched calls only if the target process is runnable.
If it is not, it cannot be running and it cannot become runnable
this CPU holds the BKL
- sys_schedule can change only selected values, -1 means that the
current value should be kept unchanged. For instance we mostly want
to change the scheduling quantum and priority but we want to keep
the process at the current cpu
- RS can hand off its processes to scheduler
- service can read the destination cpu from system.conf
- RS can pass the information farther
- machine information contains the number of cpus and the bsp id
- a dummy SMP scheduler which keeps all system processes on BSP and
all other process on APs. The scheduler remembers how many processes
are assigned to each CPU and always picks the one with the least
processes for a new process.
- kernel detects CPUs by searching ACPI tables for local apic nodes
- each CPU has its own TSS that points to its own stack. All cpus boot
on the same boot stack (in sequence) but switch to its private stack
as soon as they can.
- final booting code in main() placed in bsp_finish_booting() which is
executed only after the BSP switches to its final stack
- apic functions to send startup interrupts
- assembler functions to handle CPU features not needed for single cpu
mode like memory barries, HT detection etc.
- new files kernel/smp.[ch], kernel/arch/i386/arch_smp.c and
kernel/arch/i386/include/arch_smp.h
- 16-bit trampoline code for the APs. It is executed by each AP after
receiving startup IPIs it brings up the CPUs to 32bit mode and let
them spin in an infinite loop so they don't do any damage.
- implementation of kernel spinlock
- CONFIG_SMP and CONFIG_MAX_CPUS set by the build system
- most global variables carry information which is specific to the
local CPU and each CPU must have its own copy
- cpu local variable must be declared in cpulocal.h between
DECLARE_CPULOCAL_START and DECLARE_CPULOCAL_END markers using
DECLARE_CPULOCAL macro
- to access the cpu local data the provided macros must be used
get_cpu_var(cpu, name)
get_cpu_var_ptr(cpu, name)
get_cpulocal_var(name)
get_cpulocal_var_ptr(name)
- using this macros makes future changes in the implementation
possible
- switching to ELF will make the declaration of cpu local data much
simpler, e.g.
CPULOCAL int blah;
anywhere in the kernel source code
- 99% of the code is Intel's ACPICA. The license is compliant with BSD
and GNU and virtually all systems that use ACPI use this code, For
instance it is part of the Linux kernel.
- The only minix specific files are
acpi.c
osminixxf.c
platform/acminix.h
and
include/minix/acpi.h
- At the moment the driver does not register interrupt hooks which I
believe is mainly for handling PnP, events like "battery level is
low" and power management. Should not be difficult to add it if need
be.
- The interface to the outside world is virtually non-existent except
a trivial message based service for PCI driver to query which device
is connected to what IRQ line. This will evolve as more components
start using this driver. VM, Scheduler and IOMMU are the possible
users right now.
- because of dependency on a native 64bit (long long, part of c99) it
is compiled only with a gnu-like compilers which in case of Minix
includes gcc llvm-gcc and clang
- kernel exports DSDP (the root pointer where ACPI parsing starts) and
apic_enabled in the machine structure.
- ACPI driver uses DSDP to locate ACPI in memory. acpi_enabled tell
PCI driver to query ACPI for IRQ routing information.
This makes it easier to
- have non-base system drivers (get clobbered by global system.conf)
- have drivers as packages (can't touch global system.conf)
- make configs part of the drivers/servers instead of in global file
(makes system parts more self-contained)
- Remove unused includes.
- Add include guards to headers.
- Use unsigned variables in case they're never going to hold a negative
value. This causes GCC's complaints to disappear and should make flexelint
a lot happier, too.
- Make functions private when they're used only within a module.
- Remove unused variables.
- Add casts where appropriate.
- Currently the cpu time quantum is timer-ticks based. Thus the
remaining quantum is decreased only if the processes is interrupted
by a timer tick. As processes block a lot this typically does not
happen for normal user processes. Also the quantum depends on the
frequency of the timer.
- This change makes the quantum miliseconds based. Internally the
miliseconds are translated into cpu cycles. Everytime userspace
execution is interrupted by kernel the cycles just consumed by the
current process are deducted from the remaining quantum.
- It makes the quantum system timer frequency independent.
- The boot processes quantum is loosely derived from the tick-based
quantas and 60Hz timer and subject to future change
- the 64bit arithmetics is a little ugly, will be changes once we have
compiler support for 64bit integers (soon)
In this second phase, scheduling is moved from PM to its own
scheduler (see r6557 for phase one). In the next phase we hope to a)
include useful information in the "out of quantum" message and b)
create some simple scheduling policy that makes use of that
information.
When the system starts up, PM will iterate over its process table and
ask SCHED to take over scheduling unprivileged processes. This is
done by sending a SCHEDULING_START message to SCHED. This message
includes the processes endpoint, the parent's endpoint and its nice
level. The scheduler adds this process to its schedproc table, issues
a schedctl, and returns its own endpoint to PM - as the endpoint of
the effective scheduler. When a process terminates, a SCHEDULING_STOP
message is sent to the scheduler.
The reason for this effective endpoint is for future compatibility.
Some day, we may have a scheduler that, instead of scheduling the
process itself, forwards the SCHEDULING_START message on to another
scheduler.
PM has information on who schedules whom. As such, scheduling
messages from user-land are sent through PM. An example is when
processes change their priority, using nice(). In that case, a
getsetpriority message is sent to PM, which then sends a
SCHEDULING_SET_NICE to the process's effective scheduler.
When a process is forked through PM, it inherits its parent's
scheduler, but is spawned with an empty quantum. As before, a request
to fork a process flows through VM before returning to PM, which then
wakes up the child process. This flow has been modified slightly so
that PM notifies the scheduler of the new process, before waking up
the child process. If the scheduler fails to take over scheduling,
the child process is torn down and the fork fails with an erroneous
value.
Process priority is entirely decided upon using nice levels. PM
stores a copy of each process's nice level and when a child is
forked, its parent's nice level is sent in the SCHEDULING_START
message. How this level is mapped to a priority queue is up to the
scheduler. It should be noted that the nice level is used to
determine the max_priority and the parent could have been in a lower
priority when it was spawned. To prevent a CPU intensive process from
hawking the CPU by continuously forking children that get scheduled
in the max_priority, the scheduler should determine in which queue
the parent is currently scheduled, and schedule the child in that
same queue.
Other fixes: The USER_Q in kernel/proc.h was incorrectly defined as
NR_SCHED_QUEUES/2. That results in a "off by one" error when
converting priority->nice->priority for nice=0. This also had the
side effect that if someone were to set the MAX_USER_Q to something
else than 0, then USER_Q would be off.
model to an instance-based model. Each ethernet driver instance is now
responsible for exactly one network interface card. The port field in
/etc/inet.conf now acts as an instance field instead.
This patch also updates the data link protocol. This update:
- eliminates the concept of ports entirely;
- eliminates DL_GETNAME entirely;
- standardizes on using m_source for IPC and DL_ENDPT for safecopies;
- removes error codes from TASK/STAT replies, as they were unused;
- removes a number of other old or unused fields;
- names and renames a few other fields.
All ethernet drivers have been changed to:
- conform to the new protocol, and exactly that;
- take on an instance number based on a given "instance" argument;
- skip that number of PCI devices in probe iterations;
- use config tables and environment variables based on that number;
- no longer be limited to a predefined maximum of cards in any way;
- get rid of any leftover non-safecopy support and other ancient junk;
- have a correct banner protocol figure, or none at all.
Other changes:
* Inet.conf is now taken to be line-based, and supports #-comments.
No existing installations are expected to be affected by this.
* A new, select-based asynchio library replaces the old one.
Kindly contributed by Kees J. Bot.
* Inet now supports use of select() on IP devices.
Combined, the last two changes together speed up dhcpd
considerably in the presence of multiple interfaces.
* A small bug has been fixed in nonamed.
A new call to vm lets processes yield a part of their memory to vm,
together with an id, getting newly allocated memory in return. vm is
allowed to forget about it if it runs out of memory. processes can ask
for it back using the same id. (These two operations are normally
combined in a single call.)
It can be used as a as-big-as-memory-will-allow block cache for
filesystems, which is how mfs now uses it.
RS CHANGES:
- Crash recovery is now implemented like live update. Two instances are kept
side by side and the dead version is live updated into the new one. The endpoint
doesn't change and the failure is not exposed (by default) to other system
services.
- The new instance can be created reactively (when a crash is detected) or
proactively. In the latter case, RS can be instructed to keep a replica of
the system service to perform a hot swap when the service fails. The flag
SF_USE_REPL is set in that case.
- The new flag SF_USE_REPL is supported for services in the boot image and
dynamically started services through the RS interface (i.e. -p option in the
service utility).
- Fixed a free unallocated memory bug for core system services.
this patch changes the way pagefaults are delivered to VM. It adopts
the same model as the out-of-quantum messages sent by kernel to a
scheduler.
- everytime a userspace pagefault occurs, kernel creates a message
which is sent to VM on behalf of the faulting process
- the process is blocked on delivery to VM in the standard IPC code
instead of waiting in a spacial in-kernel queue (stack) and is not
runnable until VM tell kernel that the pagefault is resolved and is
free to clear the RTS_PAGEFAULT flag.
- VM does not need call kernel and poll the pagefault information
which saves many (1/2?) calls and kernel calls that return "no more
data"
- VM notification by kernel does not need to use signals
- each entry in proc table is by 12 bytes smaller (~3k save)
VFS CHANGES:
- dmap table no longer statically initialized in VFS
- Dropped FSSIGNON svrctl call no longer used by INET
INET CHANGES:
- INET announces its presence to VFS just like any other driver
RS CHANGES:
- The boot image dev table contains all the data to initialize VFS' dmap table
- RS interface supports asynchronous up and update operations now
- RS interface extended to support driver style and flags
SYSLIB CHANGES:
- DS calls to publish / retrieve labels consider endpoints instead of u32_t.
VFS CHANGES:
- mapdriver() only adds an entry in the dmap table in VFS.
- dev_up() is only executed upon reception of a driver up event.
INET CHANGES:
- INET no longer searches for existing drivers instances at startup.
- A newtwork driver is (re)initialized upon reception of a driver up event.
- Networking startup is now race-free by design. No need to waste 5 seconds
at startup any more.
DRIVER CHANGES:
- Every driver publishes driver up events when starting for the first time or
in case of restart when recovery actions must be taken in the upper layers.
- Driver up events are published by drivers through DS.
- For regular drivers, VFS is normally the only subscriber, but not necessarily.
For instance, when the filter driver is in use, it must subscribe to driver
up events to initiate recovery.
- For network drivers, inet is the only subscriber for now.
- Every VFS driver is statically linked with libdriver, every network driver
is statically linked with libnetdriver.
DRIVER LIBRARIES CHANGES:
- Libdriver is extended to provide generic receive() and ds_publish() interfaces
for VFS drivers.
- driver_receive() is a wrapper for sef_receive() also used in driver_task()
to discard spurious messages that were meant to be delivered to a previous
version of the driver.
- driver_receive_mq() is the same as driver_receive() but integrates support
for queued messages.
- driver_announce() publishes a driver up event for VFS drivers and marks
the driver as initialized and expecting a DEV_OPEN message.
- Libnetdriver is introduced to provide similar receive() and ds_publish()
interfaces for network drivers (netdriver_announce() and netdriver_receive()).
- Network drivers all support live update with no state transfer now.
KERNEL CHANGES:
- Added kernel call statectl for state management. Used by driver_announce() to
unblock eventual callers sendrecing to the driver.
reverse order to easily support variadic arguments. Thus, instead of
using the proper stdarg.h macros (that nowadays are
compiler-dependent), it may be tempting to directly take the address of
the last argument and considering it as the start of an array. This is
a shortcut that avoid looping to get all the arguments as the CPU
already pushed them on the stack before the call to the function.
Unfortunately, such an assumption is strictly compiler-dependent and
compilers are free to move the last argument on the stack, as a local
variable, and return the address of the location where the argument was
stored, if asked for. This will break things as the rest of the array's
argument are stored elsewhere (typically, a couple of words above the
location where the argument was stored).
This patch fixes the issue by allowing ACK to take the shortcut and
enabling gcc/llvm-gcc to follow the right way.
- IPC_FLG_MSG_FROM_KERNEL status flag is returned to userspace if the
receive was satisfied by s message which was sent by the kernel on
behalf of a process. This perfectly reliale information.
- MF_SENDING_FROM_KERNEL flag added to processes to be able to set
IPC_FLG_MSG_FROM_KERNEL when finishing receive if the receiver
wasn't ready to receive immediately.
- PM is changed to use this information to confirm that the scheduling
messages are indeed from the kernel and not faked by a process.
PM uses sef_receive_status()
- get_work() is removed from PM to make the changes simpler
- cotributed by Bjorn Swift
- In this first phase, scheduling is moved from the kernel to the PM
server. The next steps are to a) moving scheduling to its own server
and b) include useful information in the "out of quantum" message,
so that the scheduler can make use of this information.
- The kernel process table now keeps record of who is responsible for
scheduling each process (p_scheduler). When this pointer is NULL,
the process will be scheduled by the kernel. If such a process runs
out of quantum, the kernel will simply renew its quantum an requeue
it.
- When PM loads, it will take over scheduling of all running
processes, except system processes, using sys_schedctl().
Essentially, this only results in taking over init. As children
inherit a scheduler from their parent, user space programs forked by
init will inherit PM (for now) as their scheduler.
- Once a process has been assigned a scheduler, and runs out of
quantum, its RTS_NO_QUANTUM flag will be set and the process
dequeued. The kernel will send a message to the scheduler, on the
process' behalf, informing the scheduler that it has run out of
quantum. The scheduler can take what ever action it pleases, based
on its policy, and then reschedule the process using the
sys_schedule() system call.
- Balance queues does not work as before. While the old in-kernel
function used to renew the quantum of processes in the highest
priority run queue, the user-space implementation only acts on
processes that have been bumped down to a lower priority queue.
This approach reacts slower to changes than the old one, but saves
us sending a sys_schedule message for each process every time we
balance the queues. Currently, when processes are moved up a
priority queue, their quantum is also renewed, but this can be
fiddled with.
- do_nice has been removed from kernel. PM answers to get- and
setpriority calls, updates it's own nice variable as well as the
max_run_queue. This will be refactored once scheduling is moved to a
separate server. We will probably have PM update it's local nice
value and then send a message to whoever is scheduling the process.
- changes to fix an issue in do_fork() where processes could run out
of quantum but bypassing the code path that handles it correctly.
The future plan is to remove the policy from do_fork() and implement
it in userspace too.
struct return. For example, GCC and LLVM comply with this (tested on IA32).
ACK doesn't seem to follow this convention and expects the caller to clean up
the stack. Compiling hand-written ACK-compliant assembly code (returning a
struct) with GCC or LLVM used to break things (4-bytes misaligned stack).
The patch fixes this problem.
IPC changes:
- receive() is changed to take an additional parameter, which is a pointer to
a status code.
- The status code is filled in by the kernel to provide additional information
to the caller. For now, the kernel only fills in the IPC call used by the
sender.
Syslib changes:
- sef_receive() has been split into sef_receive() (with the original semantics)
and sef_receive_status() which exposes the status code to userland.
- Ideally, every sys process should gradually switch to sef_receive_status()
and use is_ipc_notify() as a dependable way to check for notify.
- SEF has been modified to use is_ipc_notify() and demonstrate how to use the
new status code.
- before enabling paging VM asks kernel to resize its segments. This
may cause kernel to segfault if APIC is used and an interrupt
happens between this and paging enabled. As these are 2 separate
vmctl calls it is not atomic. This patch fixes this problem. VM does
not ask kernel to resize the segments in a separate call anymore.
The new segments limit is part of the "enable paging" call. It
generalizes this call in such a way that more information can be
passed as need be or the information may be completely different if
another architecture requires this.
UPDATING INFO:
20100317:
/usr/src/etc/system.conf updated to ignore default kernel calls: copy
it (or merge it) to /etc/system.conf.
The hello driver (/dev/hello) added to the distribution:
# cd /usr/src/commands/scripts && make clean install
# cd /dev && MAKEDEV hello
KERNEL CHANGES:
- Generic signal handling support. The kernel no longer assumes PM as a signal
manager for every process. The signal manager of a given process can now be
specified in its privilege slot. When a signal has to be delivered, the kernel
performs the lookup and forwards the signal to the appropriate signal manager.
PM is the default signal manager for user processes, RS is the default signal
manager for system processes. To enable ptrace()ing for system processes, it
is sufficient to change the default signal manager to PM. This will temporarily
disable crash recovery, though.
- sys_exit() is now split into sys_exit() (i.e. exit() for system processes,
which generates a self-termination signal), and sys_clear() (i.e. used by PM
to ask the kernel to clear a process slot when a process exits).
- Added a new kernel call (i.e. sys_update()) to swap two process slots and
implement live update.
PM CHANGES:
- Posix signal handling is no longer allowed for system processes. System
signals are split into two fixed categories: termination and non-termination
signals. When a non-termination signaled is processed, PM transforms the signal
into an IPC message and delivers the message to the system process. When a
termination signal is processed, PM terminates the process.
- PM no longer assumes itself as the signal manager for system processes. It now
makes sure that every system signal goes through the kernel before being
actually processes. The kernel will then dispatch the signal to the appropriate
signal manager which may or may not be PM.
SYSLIB CHANGES:
- Simplified SEF init and LU callbacks.
- Added additional predefined SEF callbacks to debug crash recovery and
live update.
- Fixed a temporary ack in the SEF init protocol. SEF init reply is now
completely synchronous.
- Added SEF signal event type to provide a uniform interface for system
processes to deal with signals. A sef_cb_signal_handler() callback is
available for system processes to handle every received signal. A
sef_cb_signal_manager() callback is used by signal managers to process
system signals on behalf of the kernel.
- Fixed a few bugs with memory mapping and DS.
VM CHANGES:
- Page faults and memory requests coming from the kernel are now implemented
using signals.
- Added a new VM call to swap two process slots and implement live update.
- The call is used by RS at update time and in turn invokes the kernel call
sys_update().
RS CHANGES:
- RS has been reworked with a better functional decomposition.
- Better kernel call masks. com.h now defines the set of very basic kernel calls
every system service is allowed to use. This makes system.conf simpler and
easier to maintain. In addition, this guarantees a higher level of isolation
for system libraries that use one or more kernel calls internally (e.g. printf).
- RS is the default signal manager for system processes. By default, RS
intercepts every signal delivered to every system process. This makes crash
recovery possible before bringing PM and friends in the loop.
- RS now supports fast rollback when something goes wrong while initializing
the new version during a live update.
- Live update is now implemented by keeping the two versions side-by-side and
swapping the process slots when the old version is ready to update.
- Crash recovery is now implemented by keeping the two versions side-by-side
and cleaning up the old version only when the recovery process is complete.
DS CHANGES:
- Fixed a bug when the process doing ds_publish() or ds_delete() is not known
by DS.
- Fixed the completely broken support for strings. String publishing is now
implemented in the system library and simply wraps publishing of memory ranges.
Ideally, we should adopt a similar approach for other data types as well.
- Test suite fixed.
DRIVER CHANGES:
- The hello driver has been added to the Minix distribution to demonstrate basic
live update and crash recovery functionalities.
- Other drivers have been adapted to conform the new SEF interface.
swapcontext, and makecontext).
- Fix VM to not erroneously think the stack segment and data segment have
collided when a user-space thread invokes brk().
- Add test51 to test ucontext functionality.
- Add man pages for ucontext system calls.
implementations for these functions, we lean on GNU builtin functions
for using them, so these declarations are also conditional on using
a GNU compiler.
Move archtypes.h to include/ dir, since several servers require it. Move
fpu.h and stackframe.h to arch-specific header directory. Make source
files and makefiles aware of the new header locations.
-Convert the include directory over to using bsdmake
syntax
-Update/add mkfiles
-Modify install(1) so that it can create symlinks
-Update makefiles to use new install(1) options
-Rename /usr/include/ibm to /usr/include/i386
-Create /usr/include/machine symlink to arch header files
-Move vm_i386.h to its new home in the /usr/include/i386
-Update source files to #include the header files at their
new homes.
-Add new gnu-includes target for building GCC headers
this change
- makes panic() variadic, doing full printf() formatting -
no more NO_NUM, and no more separate printf() statements
needed to print extra info (or something in hex) before panicing
- unifies panic() - same panic() name and usage for everyone -
vm, kernel and rest have different names/syntax currently
in order to implement their own luxuries, but no longer
- throws out the 1st argument, to make source less noisy.
the panic() in syslib retrieves the server name from the kernel
so it should be clear enough who is panicing; e.g.
panic("sigaction failed: %d", errno);
looks like:
at_wini(73130): panic: sigaction failed: 0
syslib:panic.c: stacktrace: 0x74dc 0x2025 0x100a
- throws out report() - printf() is more convenient and powerful
- harmonizes/fixes the use of panic() - there were a few places
that used printf-style formatting (didn't work) and newlines
(messes up the formatting) in panic()
- throws out a few per-server panic() functions
- cleans up a tie-in of tty with panic()
merging printf() and panic() statements to be done incrementally.
have malloc/free, alloc_contig/free_contig and mmap/munmap nicely
paired up.
memory uses malloc/free instead of mmap/munmap as it doesn't have
to be contiguous for the ramdisks (and it might help if it doesn't!).
- there are no tasks running, we don't need TASK_PRIVILEGE priviledge anymore
- as there is no ring 1 anymore, there is no need for level0() to call sensitive
code from ring 1 in ring 0
- 286 related macros removed as clean up
* Userspace change to use the new kernel calls
- _taskcall(SYSTASK...) changed to _kernel_call(...)
- int 32 reused for the kernel calls
- _do_kernel_call() to make the trap to kernel
- kernel_call() to make the actuall kernel call from C using
_do_kernel_call()
- unlike ipc call the kernel call always succeeds as kernel is
always available, however, kernel may return an error
* Kernel side implementation of kernel calls
- the SYSTEm task does not run, only the proc table entry is
preserved
- every data_copy(SYSTEM is no data_copy(KERNEL
- "locking" is an empty operation now as everything runs in
kernel
- sys_task() is replaced by kernel_call() which copies the
message into kernel, dispatches the call to its handler and
finishes by either copying the results back to userspace (if
need be) or by suspending the process because of VM
- suspended processes are later made runnable once the memory
issue is resolved, picked up by the scheduler and only at
this time the call is resumed (in fact restarted) which does
not need to copy the message from userspace as the message
is already saved in the process structure.
- no ned for the vmrestart queue, the scheduler will restart
the system calls
- no special case in do_vmctl(), all requests remove the
RTS_VMREQUEST flag
- Make open(2) more POSIX compliant
- Add a test case for dangling symlinks and open() syscall with O_CREAT and
O_EXCL on a symlink.
- Update open(2) man page to reflect change.