complete munmap implementation; single-page references made
a general munmap() implementation possible to write cleanly.
. memory: let the MIOCRAMSIZE ioctl set the imgrd device
size (but only to 0)
. let the ramdisk command set sizes to 0
. use this command to set /dev/imgrd to 0 after mounting /usr
in /etc/rc, so the boot time ramdisk is freed (about 4MB
currently)
This patch adds the sprofdiff tool, which compares two sets of profiling
output files. It sorts processes and symbols by difference in average
number of samples, placing those that took more time on the left first
and those that took more time on the right last. If multiple runs are
combined, a standard deviation is computed and this is used to compute
the significance level, which gives an indication of which differences
are likely to be due to chance.
This tool is run not on the raw profiling files, but on the output of
sprofalyze -d (a new option). Though having to use two tools and an
intermediate file seems a bit awkward, the advantage is that the
original source tree is not needed to resolve the symbols. For
comparisons, this is very useful. Also, the intermediate file is in a
text format that can easily be processed by scripts, which may be useful
for other purposes as well.
This commit removes all traces of Minix segments (the text/data/stack
memory map abstraction in the kernel) and significance of Intel segments
(hardware segments like CS, DS that add offsets to all addressing before
page table translation). This ultimately simplifies the memory layout
and addressing and makes the same layout possible on non-Intel
architectures.
There are only two types of addresses in the world now: virtual
and physical; even the kernel and processes have the same virtual
address space. Kernel and user processes can be distinguished at a
glance as processes won't use 0xF0000000 and above.
No static pre-allocated memory sizes exist any more.
Changes to booting:
. The pre_init.c leaves the kernel and modules exactly as
they were left by the bootloader in physical memory
. The kernel starts running using physical addressing,
loaded at a fixed location given in its linker script by the
bootloader. All code and data in this phase are linked to
this fixed low location.
. It makes a bootstrap pagetable to map itself to a
fixed high location (also in linker script) and jumps to
the high address. All code and data then use this high addressing.
. All code/data symbols linked at the low addresses is prefixed by
an objcopy step with __k_unpaged_*, so that that code cannot
reference highly-linked symbols (which aren't valid yet) or vice
versa (symbols that aren't valid any more).
. The two addressing modes are separated in the linker script by
collecting the unpaged_*.o objects and linking them with low
addresses, and linking the rest high. Some objects are linked
twice, once low and once high.
. The bootstrap phase passes a lot of information (e.g. free memory
list, physical location of the modules, etc.) using the kinfo
struct.
. After this bootstrap the low-linked part is freed.
. The kernel maps in VM into the bootstrap page table so that VM can
begin executing. Its first job is to make page tables for all other
boot processes. So VM runs before RS, and RS gets a fully dynamic,
VM-managed address space. VM gets its privilege info from RS as usual
but that happens after RS starts running.
. Both the kernel loading VM and VM organizing boot processes happen
using the libexec logic. This removes the last reason for VM to
still know much about exec() and vm/exec.c is gone.
Further Implementation:
. All segments are based at 0 and have a 4 GB limit.
. The kernel is mapped in at the top of the virtual address
space so as not to constrain the user processes.
. Processes do not use segments from the LDT at all; there are
no segments in the LDT any more, so no LLDT is needed.
. The Minix segments T/D/S are gone and so none of the
user-space or in-kernel copy functions use them. The copy
functions use a process endpoint of NONE to realize it's
a physical address, virtual otherwise.
. The umap call only makes sense to translate a virtual address
to a physical address now.
. Segments-related calls like newmap and alloc_segments are gone.
. All segments-related translation in VM is gone (vir2map etc).
. Initialization in VM is simpler as no moving around is necessary.
. VM and all other boot processes can be linked wherever they wish
and will be mapped in at the right location by the kernel and VM
respectively.
Other changes:
. The multiboot code is less special: it does not use mb_print
for its diagnostics any more but uses printf() as normal, saving
the output into the diagnostics buffer, only printing to the
screen using the direct print functions if a panic() occurs.
. The multiboot code uses the flexible 'free memory map list'
style to receive the list of free memory if available.
. The kernel determines the memory layout of the processes to
a degree: it tells VM where the kernel starts and ends and
where the kernel wants the top of the process to be. VM then
uses this entire range, i.e. the stack is right at the top,
and mmap()ped bits of memory are placed below that downwards,
and the break grows upwards.
Other Consequences:
. Every process gets its own page table as address spaces
can't be separated any more by segments.
. As all segments are 0-based, there is no distinction between
virtual and linear addresses, nor between userspace and
kernel addresses.
. Less work is done when context switching, leading to a net
performance increase. (8% faster on my machine for 'make servers'.)
. The layout and configuration of the GDT makes sysenter and syscall
possible.
* Display an error message upon failure to mount a device.
* Handle a special case when the source device is "none"
* pass the mount options stored in fourth field of fstab
to mount(3).
The rc script manually parses /etc/fstab to mount all file systems.
To do that it needs /bin/sed which does not exist anymore. mount(8)
now supports the -a flag which causes it to mount all file systems
listed in /etc/fstab except for '/'. File systems marked with 'noauto'
are skipped.
. new mode for sys_memset: include process so memset can be
done in physical or virtual address space.
. add a mode to mmap() that lets a process allocate uninitialized
memory.
. this allows an exec()er (RS, VFS, etc.) to request uninitialized
memory from VM and selectively clear the ranges that don't come
from a file, leaving no uninitialized memory left for the process
to see.
. use callbacks for clearing the process, clearing memory in the
process, and copying into the process; so that the libexec code
can be used from rs, vfs, and in the future, kernel (to load vm)
and vm (to load boot-time processes)
these two functions will be used to support all exec() functionality
going into a single library shared by RS and VFS and exec() knowledge
leaving VM.
. third-party mmap: allow certain processes (VFS, RS) to
do mmap() on behalf of another process
. PROCCTL: used to free and clear a process' address space
. make ramdisk buildable without ../etc having pwd.db
. add cat to release bootstrap cmds
. support running dynamically linked executables for
release bootstrap cmds
. import netbsd chroot to help
See UPDATING about upgrading clang for dynamic linking.
. allow executables on ramdisk to be dynamically linked; this means
putting a few required shared libraries and ld.elf_so on the ramdisk.
. this makes the ramdisk (usage) smaller when they are dynamic, but
bigger when they're not.
. also we can safely ditch newroot and call mount directly as that is
all newroot does.
. create proto.common to share a bunch of entries between
small/nonsmall cases
remove some old minix-userland-specific stuff
. /etc/ttytab as a file, and minix-compat function (fftyslot()),
replaced by /etc/ttys and new libc functions
. also remove minix-specific nlist(), cuserid(), fttyslot(), v8 regex
functions and <compat/regex.h>
. and remaining minix-only utilities that use them
. also unused <compat/pwd.h> and <compat/syslog.h> and
redundant <sys/sigcontext.h>
There is important information about booting non-ack images in
docs/UPDATING. ack/aout-format images can't be built any more, and
booting clang/ELF-format ones is a little different. Updating to the
new boot monitor is recommended.
Changes in this commit:
. drop boot monitor -> allowing dropping ack support
. facility to copy ELF boot files to /boot so that old boot monitor
can still boot fairly easily, see UPDATING
. no more ack-format libraries -> single-case libraries
. some cleanup of OBJECT_FMT, COMPILER_TYPE, etc cases
. drop several ack toolchain commands, but not all support
commands (e.g. aal is gone but acksize is not yet).
. a few libc files moved to netbsd libc dir
. new /bin/date as minix date used code in libc/
. test compile fix
. harmonize includes
. /usr/lib is no longer special: without ack, /usr/lib plays no
kind of special bootstrapping role any more and bootstrapping
is done exclusively through packages, so releases depend even
less on the state of the machine making them now.
. rename nbsd_lib* to lib*
. reduce mtree
With -n -b file, a.out boot images can be used for CD booting;
with the new -n -B file option, plain binary (like bootxx_cd9660)
can be used instead.
Restore working the -h and -f options while there.
And add a new -F option for 2.8MB floppy image.
Register file timestamps
Remember the path tables in the primary descriptor
Put the size of the parent directory in the \1 entry, not own size
Allow the use of -b option without -a
Notes:
* Still missing the man page
* Filenames are still trimmed to 12 characters, because of
8.3 MS-DOS inherited compatibility (ISO9660 level 1);
also note that 7.4 or 9.2 filenames are accepted though
* Final . at end of filenames without extension is still missing
* VMS-compatible ;1 version suffix is still omitted
* Limit of 65,535 directories in path tables is not checked
. also implement now-possible fsck -p option
. allows unconditional fsck -p invocation at startup,
only checking each filesystem if not marked clean
. mounting unclean is allowed but is forced readonly
. updating the superblock while mounted is now not
allowed by mfs - must be done (e.g. by fsck.mfs)
on an unmounted fs
. clean flag is unset by mfs on mounting, and set by
mfs on clean unmounting (if clean flag was set at
mount time)
Signed-off-by: Ben Gras <ben@minix3.org>
This driver can be loaded as an overlay on top of a real block
device, and can then be used to generate block-level failures for
certain transfer requests. Specifically, a rule-based system allows
the user to introduce (overt and silent) data corruption and errors.
It exposes itself through /dev/fbd, and a file system can be mounted
on top of it. The new fbdctl(8) tool can be used to control the
driver; see ``man fbdctl'' for details. It also comes with a test
set, located in test/fbdtest.
The implementation is in libblockdriver, and works transparently for
all block drivers. The new btrace(8) tool can be used to control block
tracing; see ``man btrace'' for details.
. rc script and service know to look in /usr/pkg/.. for
extra binaries and conf files
. service split into parsing config and doing RS request
so that a new utility (printconfig) can just print the
config in machine-parseable format for netconf integration
. converted all base system eth drivers/netconf
. detect both formats in /etc/rc
. generate new format in setup
. obsoletes /etc/fstab.local: everything can go in /etc/fstab
. put shutdown/reboot/halt and a copy of /usr/adm/wtmp
(/etc/wtmp) on root FS so that we can do shutdown checks before
mounting /usr
. new fstab format makes getfsent() and friends work
Import libpuffs and our port of libpuffs. The port was done as part of
GSoC 2011 FUSE project, done by Evgeniy Ivanov. The librefuse import
did not require any porting efforts. Libpuffs has been modified to
understand our VFS-FS protocol and translate between that and PUFFS. As
an example that it works, fuse-ntfs-3g from pkgsrc can be compiled and
used to mount ntfs partitions:
mount -t ntfs-3g <device> <mountpoint>
FUSE only works with the asynchronous version of VFS. See <docs/UPDATING> on
how to run AVFS.
This patch further includes some changes to mount(1) and mount(2) so it's
possible to use file systems provided by pkgsrc (note: manual modifications
to /etc/system.conf are still needed. There has been made an exception for
fuse-ntfs-3g, so it already as an entry).
. add bsd-style MLINKS to minix man set, restoring aliases
(e.g. man add64 -> int64)
. update daily cron script to run makewhatis and restore makewhatis
in man Makefile (makedb), restores functionality of man -k
. netbsd imports of man, mdocml, makewhatis, libutil, apropos
. update man.conf with manpage locations, restoring man [-s] <section>
. throws out some obsolete manpages
. move mfs-specific struct, constants to mfs/, so
mfs-specific, on-disk format structs and consts are
fully isolated from generic structs and functions
. removes de and readfs utils
Let's suppose that /usr/tmp exists and one wants /usr/tmp/a/b
If one runs "mkdir -p /usr/tmp/a/b/" (the '/' at the end is
important), then a "File exists" error comes up. Example:
$ rm -rf /usr/tmp/a
$ mkdir -p /usr/tmp/a/b/
/usr/tmp/a/b/: File exists
This breaks gcc47 installation when C++ is enabled, and this
isn't the behaviour of mkdir on NetBSD nor Linix.
This patch fixes the above issue by dropping the trailing '/'.
. ipc wants to know about processes that get
signals, so that it can break blocking ipc operations
. doing it for every single signal is wasteful
and causes the annoying 'no slot for signals' message
. this fix tells vm on a per-process basis it (ipc)
wants to be notified, i.e. only when it does any ipc calls
. move ipc config to separate config file while we're at it
The bsd signal names are out-of-order compared to the minix ones.
I found out (the hard way) that the (MINIX-descending) ordered list of
signals in <sys/signal.h> does not match the (BSD-descending) ordered
list of signals in usr/src/lib/libc/nbsd_libc/gen/sig{name,list}.c
Beyond being unfortunate, it prevents the trap command of ash to handle
correctly a named signal; a funny test case is
#!/bin/sh
trap 'echo trapping signal BUS' BUS
trap 'echo trapping signal 10 (USR1)' 10
trap # show me what is currently trapped
As a quick workaround, I disabled the use of the libc-provided
sys_sig{name,list} arrays for ash, and reverted to the hand-made array
which is used by the less capable MINIX libc. It allowed me to use
pkgsrc.
. don't install minix <termcap.h> as libterminfo
has its own (but still install it in /usr/include.ack)
. forget minix termcap functions in -lcompat_minix
. make commands use -lterminfo in netbsd libc compile mode
. speeds up mkdep (i.e. world builds) significantly
. have to keep minix /bin/sed for a while because previous
usr/etc/rc depends on it
. force mkdep to use /usr/bin/sed for speedup
. it's a good extra interface to have but doesn't
meet standardised functionality
. applications (in pkgsrc) find it and expect
full functionality the minix mmap doesn't offter
. on the whole probably better to hide these functions
(mmap and friends) until they are grown up; the base system
can use the new minix_* names
. strerror() assumes this
. remove generated libminc/errlist.c
. errno's in <sys/errno.h> have to be in sorted order
. filtering out some errno.h in Makefile lets us use near-stock
errlist.awk
* VFS and installed MFSes must be in sync before and after this change *
Use struct stat from NetBSD. It requires adding new STAT, FSTAT and LSTAT
syscalls. Libc modification is both backward and forward compatible.
Also new struct stat uses modern field sizes to avoid ABI
incompatibility, when we update uid_t, gid_t and company.
Exceptions are ino_t and off_t in old libc (though paddings added).
1. ack, a.out, minix headers (moved to /usr/include.ack),
minix libc
2. gcc/clang, elf, netbsd headers (moved to /usr/include),
netbsd libc (moved to /usr/lib)
So this obsoletes the /usr/netbsd hierarchy.
No special invocation for netbsd libc necessary - it's always used
for gcc/clang.
3 sets of libraries are built now:
. ack: all libraries that ack can compile (/usr/lib/i386/)
. clang+elf: all libraries with minix headers (/usr/lib/)
. clang+elf: all libraries with netbsd headers (/usr/netbsd/)
Once everything can be compiled with netbsd libraries and headers, the
/usr/netbsd hierarchy will be obsolete and its libraries compiled with
netbsd headers will be installed in /usr/lib, and its headers
in /usr/include. (i.e. minix libc and current minix headers set
will be gone.)
To use the NetBSD libc system (libraries + headers) before
it is the default libc, see:
http://wiki.minix3.org/en/DevelopersGuide/UsingNetBSDCode
This wiki page also documents the maintenance of the patch
files of minix-specific changes to imported NetBSD code.
Changes in this commit:
. libsys: Add NBSD compilation and create a safe NBSD-based libc.
. Port rest of libraries (except libddekit) to new header system.
. Enable compilation of libddekit with new headers.
. Enable kernel compilation with new headers.
. Enable drivers compilation with new headers.
. Port legacy commands to new headers and libc.
. Port servers to new headers.
. Add <sys/sigcontext.h> in compat library.
. Remove dependency file in tree.
. Enable compilation of common/lib/libc/atomic in libsys
. Do not generate RCSID strings in libc.
. Temporarily disable zoneinfo as they are incompatible with NetBSD format
. obj-nbsd for .gitignore
. Procfs: use only integer arithmetic. (Antoine Leca)
. Increase ramdisk size to create NBSD-based images.
. Remove INCSYMLINKS handling hack.
. Add nbsd_include/sys/exec_elf.h
. Enable ELF compilation with NBSD libc.
. Add 'make nbsdsrc' in tools to download reference NetBSD sources.
. Automate minix-port.patch creation.
. Avoid using fstavfs() as it is *extremely* slow and unneeded.
. Set err() as PRIVATE to avoid name clash with libc.
. [NBSD] servers/vm: remove compilation warnings.
. u32 is not a long in NBSD headers.
. UPDATING info on netbsd hierarchy
. commands fixes for netbsd libc
sys_umap now supports only:
- looking up the physical address of a virtual address in the address space
of the caller;
- looking up the physical address of a grant for which the caller is the
grantee.
This is enough for nearly all umap users. The new sys_umap_remote supports
lookups in arbitrary address spaces and grants for arbitrary grantees.
and minor fixes:
. add ack/clean target to lib, 'unify' clean target
. add includes as library dependency
. mk: exclude warning options clang doesn't have in non-gcc
. set -e in lib/*.sh build files
. clang compile error circumvention (disable NOASSERTS for release builds)
A sort of quick hack for dhcpd to work as a client with lwip server.
- The functionality is not changed unless --lwip switch is supplied.
dhcpd does not use broadcast udp sockets but some sort of raw
sockets and changes their behavior during their life by ioctls.
- I thought there is no need to polute lwip just to make dhcp client
work. Instead I decided to twist the client a little bit.
- It is so far the only big collision I found between inet and lwip.
pkgsrc binary packages.
rationale:
. pkg_install (which is the pkg_* tools) is entangled with pkgsrc,
not with minix, so tracking it from pkgsrc (easier than with
base system) makes more sense
. simplifies upstreaming minix specific changes for pkg_* tools
. reduce pkgsrc-in-basesystem maintenance burden
- profile --nmi | --rtc sets the profiling mode
- --rtc is default, uses BIOS RTC, cannot profile kernel the presetted
frequency values apply
- --nmi is only available in APIC mode as it uses the NMI watchdog, -f
allows any frequency in Hz
- both modes use compatible data structures
- when kernel profiles a process for the first time it saves an entry
describing the process [endpoint|name]
- every profile sample is only [endpoint|pc]
- profile utility creates a table of endpoint <-> name relations and
translates endpoints of samples into names and writing out the
results to comply with the processing tools
- "task" endpoints like KERNEL are negative thus we must cast it to
unsigned when hashing
- contributed by Bjorn Swift
- adds process accounting, for example counting the number of messages
sent, how often the process was preemted and how much time it spent
in the run queue. These statistics, along with the current cpu load,
are sent back to the user-space scheduler in the Out Of Quantum
message.
- the user-space scheduler may choose to make use of these statistics
when making scheduling decisions. For isntance the cpu load becomes
especially useful when scheduling on multiple cores.
This makes it easier to
- have non-base system drivers (get clobbered by global system.conf)
- have drivers as packages (can't touch global system.conf)
- make configs part of the drivers/servers instead of in global file
(makes system parts more self-contained)
this is to force invocations of these utils for ack to be
explicitly named such, so in the future binutils can be installed
in /usr/pkg without the g- prefix.