Change the kernel to add features to vircopy and safecopies so that
transparent copy fixing can be suppressed to avoid deadlocks; such
copies then fail with EFAULT.
Transparently making copying work from filesystems (as normally done by
the kernel & VM when copying fails because of missing/readonly memory)
is problematic: for file-mapped ranges, the same filesystem that is
blocked on the copy request may be needed to satisfy the memory range,
leading to deadlock. Ditto for VFS itself, if done with a blocking call.
This change makes copying done from a filesystem fail in such cases
with EFAULT, by having VFS add the CPF_TRY flag to the grants. If a FS
call fails with EFAULT, VFS then asks VM to make the range available
after the FS is unblocked, so that the FS can be used to satisfy the
range, if need be, in another VFS thread.
Similarly, for datacopies that VFS itself does, it uses the failable
vircopy variant, and callers use a wrapper that talks to VM if necessary
to get the copy to work.
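
The TRY-first pattern described above can be sketched as a small userland
simulation. This is not the VFS code: struct user_range, try_copy_to_user(),
vm_fix_range() and copy_try_first() are hypothetical names invented for
illustration, and the positive EFAULT from <errno.h> stands in for whatever
error convention the real servers use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a user buffer that may not be mapped yet. */
struct user_range {
    char  *addr;        /* valid only once the range is "mapped" */
    size_t len;
    int    mapped;      /* 0 = a real copy would fault here */
    char   backing[64]; /* simulated memory provided by the fix step */
};

/* Hypothetical failable copy: never blocks, returns EFAULT if unmapped. */
static int try_copy_to_user(struct user_range *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (!dst->mapped || n > dst->len)
        return EFAULT;
    memcpy(dst->addr, src, n);
    return 0;
}

/* Hypothetical stand-in for asking VM to make the range available. */
static int vm_fix_range(struct user_range *r)
{
    r->addr = r->backing;
    r->len = sizeof(r->backing);
    r->mapped = 1;
    return 0;
}

/* TRY-first wrapper: attempt the copy, fix the range on EFAULT, retry once. */
static int copy_try_first(struct user_range *dst, const void *src, size_t n)
{
    int r = try_copy_to_user(dst, src, n);
    if (r != EFAULT)
        return r;
    if ((r = vm_fix_range(dst)) != 0)
        return r;
    return try_copy_to_user(dst, src, n);
}
```

The point of the split is that the failable copy never waits on anyone; only
the wrapper decides when it is safe to go and ask for the range to be fixed.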
. kernel: add CPF_TRY flag to safecopies
. kernel: only request writable ranges to VM for the
target buffer when copying fails
. do copying in VFS TRY-first
. some fixes in VM to build SANITYCHECK mode
. add regression test for the cases where
- a FS system call needs memory mapped in a process that the
FS itself must map.
- such a range covers more than one file-mapped region.
. add 'try' mode to vircopy, physcopy
. add flags field to copy kernel call messages
. if CP_FLAG_TRY is set, do not transparently try
to fix memory ranges
. for use by VFS when accessing user buffers to avoid
deadlock
. remove some obsolete backwards compatibility assignments
. VFS: let thread scheduling work for VM requests too
Allows VFS to make calls to VM while suspending and resuming
the currently running thread. This does not currently work for
the main thread.
. VM: add fix memory range call for use by VFS
Change-Id: I295794269cea51a3163519a9cfe5901301d90b32
. use netbsd sigframe, sigcontext struct
. netbsd sigframe *contains* sigcontext; use that directly
in kernel sigsend
. drop two fields from minix x86 stackframe.h (process context)
that were unused, retadr and st
use in-sigframe sigcontext
Change-Id: Ib59d699596dc3a78163dee59f19730482fdddf11
. create signals-related struct message type to store sigset_t
directly
. create notify-specific message types, so the generic NOTIFY_ARG
doesn't exist anymore
. various related test expansions, improvements, fixes
. add a few error-checks to sigismember() calls
. rename kernel call specific signals fields to SYS_*
Change-Id: I53c18999b5eaf0cfa0cb25f5330bee9e7ad2b478
The set of processes to which a SIGKMESS signal is sent whenever new
diagnostics messages are added to the kernel's message buffer, is now
no longer hardcoded. Instead, processes can (un)register themselves
to receive such notifications, by means of sys_diagctl().
Change-Id: I9d6ac006a5d9bbfad2757587a068fc1ec3cc083e
* Renamed struct timer to struct minix_timer
* Renamed timer_t to minix_timer_t
* Ensured all the code uses the minix_timer_t typedef
* Removed ifdef around _BSD_TIMER_T
* Removed include/timers.h and merged it into include/minix/timers.h
* Resolved prototype conflict by renaming kernel's (re)set_timer
to (re)set_kernel_timer.
Change-Id: I56f0f30dfed96e1a0575d92492294cf9a06468a5
* Removed startup code patches in lib/csu regarding kernel to userland
ABI.
* Aligned stack layout on NetBSD stack layout.
* Generate valid stack pointers instead of offsets by taking into account
_minix_kerninfo->kinfo->user_sp.
* Refactored stack generation, by moving part of execve in two
functions {minix_stack_params(), minix_stack_fill()} and using them
in execve(), rs and vm.
* Changed load offset of rtld (ld.so) to:
execi.args.stack_high - execi.args.stack_size - 0xa00000
which is 10MB below the main executable stack.
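
The rtld load offset formula above can be checked with a few lines of C.
This is only an arithmetic sketch: rtld_load_offset() and RTLD_GAP are
hypothetical names, and the stack values used with it are made-up examples,
not values taken from the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Gap from the commit message: 0xa00000 bytes, i.e. 10 MB. */
#define RTLD_GAP 0xa00000u

/* Load offset for ld.so: RTLD_GAP below the bottom of the main stack. */
static uintptr_t rtld_load_offset(uintptr_t stack_high, uintptr_t stack_size)
{
    /* Guard against underflow for nonsensical inputs. */
    assert(stack_high >= stack_size + RTLD_GAP);
    return stack_high - stack_size - RTLD_GAP;
}
```

For instance, with an assumed stack_high of 0xF0000000 and a 16 MB stack
(0x1000000), the offset comes out at 0xEE600000.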
Change-Id: I839daf3de43321cded44105634102d419cb36cec
The following types are modified (old -> new):
* _BSD_USECONDS_T_ int -> unsigned int
* __socklen_t __int32_t -> __uint32_t
* blksize_t uint32_t -> int32_t
* rlim_t uint32_t -> uint64_t
On ARM:
* _BSD_CLOCK_T_ int -> unsigned int
On Intel:
* _BSD_CLOCK_T_ int -> unsigned long
bin/cat is also updated in order to fix warnings.
_BSD_TIMER_T_ has still to be aligned.
Change-Id: I2b4fda024125a19901120546c4e22e443ba5e9d7
clock_t is currently a signed type, but in NetBSD this is not the
case. As we plan on aligning our types we have to change this as this
prevents negative delta from being correctly used.
Change-Id: I9bccdee2b41626b0262471dc1900de505a1991a7
padconf is specific to arm, so it's being moved to kernel/arch/earm.
Add a test case to ensure the proper error is returned on non-ARM
systems.
Change-Id: I07ebbe64825d59bc0ef9c818d3d54891dafb4419
On the AM335X, writes to the padconf registers must be done in privileged
mode. To allow userspace drivers to dynamically change the padconf at
runtime, a kernel call has been added.
Change-Id: I4b25d2879399b1785a360912faa0e90b5c258533
. Replace 64bit functions with operators in arch_clock.c
. Replace 64bit functions with operators in proc.c
. Replace 64bit functions with operators in vbox.c
. Replace 64bit functions with operators in driver.c
. Eradicate is_zero64, make_zero64, neg64
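
As an illustration of the kind of rewrite involved, the sketch below puts
helper-style code next to its operator equivalent. The *_legacy functions are
hypothetical reconstructions; their real signatures and bodies are not given
in this message, only the names is_zero64, make_zero64 and neg64 are.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64_t;

/* Hypothetical reconstructions of the helper style being removed. */
static u64_t make_zero64_legacy(void)  { return 0; }
static int   is_zero64_legacy(u64_t v) { return v == 0; }
static u64_t neg64_legacy(u64_t v)     { return ~v + 1; } /* two's complement */

/* The same operations written with plain operators. */
static u64_t make_zero(void)  { return 0; }
static int   is_zero(u64_t v) { return v == 0; }
static u64_t negate(u64_t v)  { return -v; } /* modular arithmetic on unsigned */

/* Both spellings must agree for any value. */
static void check_equivalence(u64_t v)
{
    assert(make_zero64_legacy() == make_zero());
    assert(is_zero64_legacy(v) == is_zero(v));
    assert(neg64_legacy(v) == negate(v));
}
```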
Change-Id: Ie4e1242a73534f114725271b2e2365b2004cb7b9
Implement getrusage.
These fields of struct rusage are not supported and always set to zero at this time:
long ru_nswap; /* swaps */
long ru_inblock; /* block input operations */
long ru_oublock; /* block output operations */
long ru_msgsnd; /* messages sent */
long ru_msgrcv; /* messages received */
long ru_nvcsw; /* voluntary context switches */
long ru_nivcsw; /* involuntary context switches */
test75.c is the unit test for this new function
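
A minimal caller of getrusage() might look like the sketch below. It uses
only the standard ru_utime and ru_stime fields, which are not in the
unsupported list above; cpu_time_us() is a hypothetical helper name and this
is not code from test75.c.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Fill *user_us and *sys_us with the calling process's CPU time in
 * microseconds. Returns 0 on success, -1 if getrusage() fails. */
static int cpu_time_us(long long *user_us, long long *sys_us)
{
    struct rusage ru;

    assert(user_us != NULL && sys_us != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_us = (long long)ru.ru_utime.tv_sec * 1000000LL + ru.ru_utime.tv_usec;
    *sys_us  = (long long)ru.ru_stime.tv_sec * 1000000LL + ru.ru_stime.tv_usec;
    return 0;
}
```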
Change-Id: I3f1eb69de1fce90d087d76773b09021fc6106539
. with hz=1000, clock_t only lasts a few years.
whenever we can't express the desired realtime
in ticks because the distance with boottime is
too high, simply adjust boottime like we do for
otherwise negative values.
. fixes test 2 on ARM
This also adds the sys_settime() kernel call which allows for the adjusting
of the clock named realtime in the kernel. The existing sys_stime()
function is still needed for a separate job (setting the boottime). The
boottime is set in the readclock driver. The sys_settime() interface is
meant to be flexible and will support both clock_settime() and adjtime()
when adjtime() is implemented later.
settimeofday() was adjusted to use the clock_settime() interface.
One side note discovered during testing: uptime(1) (part of last(1))
uses wtmp to determine boottime (not Minix's times(2)). This leads `uptime`
to report odd results when you set the time to a time prior to boottime.
This isn't a new bug introduced by my changes. It's been there for a while.
In order to make it more clear that ticks should be used for timers
and realtime should be used for timestamps / displaying the date/time,
getuptime() was renamed to getticks() and getuptime2() was renamed to
getuptime().
Servers, drivers, libraries, tests, etc that use getuptime()/getuptime2()
have been updated. In instances where a realtime was calculated, the
calculation was changed to use realtime.
System calls clock_getres() and clock_gettime() were added to PM/libc.
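
From the caller's side, the two new calls are used as in the sketch below.
read_clock() is a hypothetical wrapper name; clock_getres() and
clock_gettime() are the standard POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Read the resolution and current value of one POSIX clock.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
static int read_clock(clockid_t id, struct timespec *res, struct timespec *now)
{
    assert(res != NULL && now != NULL);
    if (clock_getres(id, res) != 0)
        return -1;
    if (clock_gettime(id, now) != 0)
        return -1;
    return 0;
}
```

Calling read_clock(CLOCK_REALTIME, &res, &now) yields the wall clock time
together with the granularity at which it is reported.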
Old realtime was used for both timers (where an accurate count of
all ticks is needed) and the system time. In order to implement
adjtime(2), these duties must be separated as changing the time
of day by a small amount shouldn't affect timers in any way nor
should it change the boot time.
Following the naming of the clocks used by clock_gettime(2), the
clock named 'realtime' will represent the best guess at the
current wall clock time, and the clock named 'monotonic' will
represent the absolute time the system has been running.
Use monotonic for timers in kernel and in drivers. Use realtime
for determining time of day, dates, etc.
This commit simply renames realtime to monotonic and adds a new
tick counter named realtime. There are no functional changes in
this commit. It just lays the foundation for future work.
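
The intended separation of duties can be pictured with two counters, as in
the sketch below. This is a simplified model and not the kernel's clock
code: struct clocks and the three functions are hypothetical, and the
adjustment function only illustrates the future work this commit prepares
for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct clocks {
    uint64_t monotonic; /* ticks since boot; never adjusted */
    uint64_t realtime;  /* ticks backing wall clock time; may be adjusted */
};

/* Every clock tick advances both counters. */
static void clock_tick(struct clocks *c)
{
    assert(c != NULL);
    c->monotonic++;
    c->realtime++;
}

/* Hypothetical time-of-day adjustment: touches realtime only. */
static void clock_adjust_realtime(struct clocks *c, int64_t delta_ticks)
{
    assert(c != NULL);
    c->realtime += (uint64_t)delta_ticks; /* modular add handles negatives */
}

/* Timers compare against monotonic, so adjustments cannot disturb them. */
static int timer_expired(const struct clocks *c, uint64_t deadline)
{
    assert(c != NULL);
    return c->monotonic >= deadline;
}
```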
. set MF_CONTEXT_SET after signal handler state
is set so it doesn't get clobbered by the kernel
afterwards (i.e. by delivermsg()).
fixes at least test41.
Change-Id: I7e5e0e9311c8bbc1c0a9c7ca466ceddd9edfa03f
. kernel: signal handler args for ARM
. kernel: sanity check return address (LSB indicates thumb mode)
. libc: properly retrieve signal mask for ARM
together fix test37 on ARM.
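
On ARM, bit 0 of an interworking branch target selects Thumb state, which is
why the return address needs care. The sketch below shows only the bit
handling; is_thumb_target() and branch_target() are hypothetical helpers,
and what the kernel's sanity check actually accepts or rejects is not
spelled out in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the address asks for Thumb state (bit 0 set). */
static int is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0;
}

/* The address of the instruction itself, with the mode bit cleared. */
static uint32_t branch_target(uint32_t addr)
{
    uint32_t target = addr & ~1u;
    assert((target & 1u) == 0);
    return target;
}
```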
Change-Id: I4e00f754c50104ed85c7fdf8ec5ad54568f20a81
A few kernel and calling convention adjustments to make sigsend and
sigreturn work for arm.
. provide an arch_proc_setcontext for earm in kernel
. set LR in context of signal handler to provide a proper
return address (to __sigreturn)
. change __sigreturn to retrieve the sigcontext pointer
from the sigframe struct and pass it to _sigreturn() in r0
Change-Id: Icd135a70595382c79d11d8dd9876f6a6f1df41f8
* Updating common/lib
* Updating lib/csu
* Updating lib/libc
* Updating libexec/ld.elf_so
* Corrected test on __minix in featuretest to actually follow the
meaning of the comment.
* Cleaned up _REENTRANT-related definitions.
* Disabled -D_REENTRANT for libfetch
* Removing some unneeded __NBSD_LIBC defines and tests
Change-Id: Ic1394baef74d11b9f86b312f5ff4bbc3cbf72ce2
. restore state depends on how saving of state was done;
also remember trap style in sig context
. actually set and restore TRACEBIT with new trap styles;
have to remove it once process enters kernel though, done
in debug trap exception handler
. introduce MF_STEP that makes arch-specific code
turn on trace bit instead of setting TRACEBIT directly,
a bit more arch-friendly and avoids keeping precious
state in per-process PSW arch-dependently
. add cpufeature detection of both
. use it for both ipc and kernelcall traps, using a register
for call number
. SYSENTER/SYSCALL does not save any context, therefore userland
has to save it
. to accommodate multiple kernel entry/exit types, the entry
type is recorded in the process struct. hitherto all types
were interrupt (soft int, exception, hard int); now SYSENTER/SYSCALL
is new, with the difference that context is not fully restored
from proc struct when running the process again. this can't be
done as some information is missing.
. complication: cases in which the kernel has to fully change
process context (i.e. sigreturn). in that case the exit type
is changed from SYSENTER/SYSEXIT to soft-int (i.e. iret) and
context is fully restored from the proc struct. this does mean
the PC and SP must change, as the sysenter/sysexit userland code
will otherwise try to restore its own context. this is true in the
sigreturn case.
. override all usage by setting libc_ipc=1
. map all objects named usermapped_*.o with globally visible
pages; usermapped_glo_*.o with the VM 'global' bit on, i.e.
permanently in tlb (very scarce resource!)
. added kinfo, machine, kmessages and loadinfo for a start
. modified log, tty to make use of the shared messages struct
. some strncpy/strcpy to strlcpy conversions
. new <minix/param.h> to avoid including other minix headers
that have colliding definitions with library and commands code,
causing parse warnings
. removed some dead code / assignments
This commit removes all traces of Minix segments (the text/data/stack
memory map abstraction in the kernel) and significance of Intel segments
(hardware segments like CS, DS that add offsets to all addressing before
page table translation). This ultimately simplifies the memory layout
and addressing and makes the same layout possible on non-Intel
architectures.
There are only two types of addresses in the world now: virtual
and physical; even the kernel and processes have the same virtual
address space. Kernel and user processes can be distinguished at a
glance as processes won't use 0xF0000000 and above.
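
With a single boundary, telling user ranges from kernel ranges becomes a
plain comparison. The sketch below assumes 32-bit virtual addresses;
KERNEL_VIR_BASE and is_user_range() are hypothetical names, and only the
0xF0000000 value comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Boundary from the commit message; the macro name is an assumption. */
#define KERNEL_VIR_BASE 0xF0000000u

/* Nonzero if [start, start + len) lies entirely below the kernel. */
static int is_user_range(uint32_t start, uint32_t len)
{
    uint32_t last;

    assert(len > 0);
    last = start + (len - 1);
    if (last < start)           /* range wrapped around the address space */
        return 0;
    return last < KERNEL_VIR_BASE;
}
```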
No static pre-allocated memory sizes exist any more.
Changes to booting:
. The pre_init.c leaves the kernel and modules exactly as
they were left by the bootloader in physical memory
. The kernel starts running using physical addressing,
loaded at a fixed location given in its linker script by the
bootloader. All code and data in this phase are linked to
this fixed low location.
. It makes a bootstrap pagetable to map itself to a
fixed high location (also in linker script) and jumps to
the high address. All code and data then use this high addressing.
. All code/data symbols linked at the low addresses are prefixed by
an objcopy step with __k_unpaged_*, so that that code cannot
reference highly-linked symbols (which aren't valid yet) or vice
versa (symbols that aren't valid any more).
. The two addressing modes are separated in the linker script by
collecting the unpaged_*.o objects and linking them with low
addresses, and linking the rest high. Some objects are linked
twice, once low and once high.
. The bootstrap phase passes a lot of information (e.g. free memory
list, physical location of the modules, etc.) using the kinfo
struct.
. After this bootstrap the low-linked part is freed.
. The kernel maps in VM into the bootstrap page table so that VM can
begin executing. Its first job is to make page tables for all other
boot processes. So VM runs before RS, and RS gets a fully dynamic,
VM-managed address space. VM gets its privilege info from RS as usual
but that happens after RS starts running.
. Both the kernel loading VM and VM organizing boot processes happen
using the libexec logic. This removes the last reason for VM to
still know much about exec() and vm/exec.c is gone.
Further Implementation:
. All segments are based at 0 and have a 4 GB limit.
. The kernel is mapped in at the top of the virtual address
space so as not to constrain the user processes.
. Processes do not use segments from the LDT at all; there are
no segments in the LDT any more, so no LLDT is needed.
. The Minix segments T/D/S are gone and so none of the
user-space or in-kernel copy functions use them. The copy
functions use a process endpoint of NONE to realize it's
a physical address, virtual otherwise.
. The umap call only makes sense to translate a virtual address
to a physical address now.
. Segments-related calls like newmap and alloc_segments are gone.
. All segments-related translation in VM is gone (vir2map etc).
. Initialization in VM is simpler as no moving around is necessary.
. VM and all other boot processes can be linked wherever they wish
and will be mapped in at the right location by the kernel and VM
respectively.
Other changes:
. The multiboot code is less special: it does not use mb_print
for its diagnostics any more but uses printf() as normal, saving
the output into the diagnostics buffer, only printing to the
screen using the direct print functions if a panic() occurs.
. The multiboot code uses the flexible 'free memory map list'
style to receive the list of free memory if available.
. The kernel determines the memory layout of the processes to
a degree: it tells VM where the kernel starts and ends and
where the kernel wants the top of the process to be. VM then
uses this entire range, i.e. the stack is right at the top,
and mmap()ped bits of memory are placed below that downwards,
and the break grows upwards.
Other Consequences:
. Every process gets its own page table as address spaces
can't be separated any more by segments.
. As all segments are 0-based, there is no distinction between
virtual and linear addresses, nor between userspace and
kernel addresses.
. Less work is done when context switching, leading to a net
performance increase. (8% faster on my machine for 'make servers'.)
. The layout and configuration of the GDT makes sysenter and syscall
possible.
. sys_vircopy always uses D for both src and dst
. sys_physcopy uses PHYS_SEG if and only if corresponding
endpoint is NONE, so we can derive the mode (PHYS_SEG or D)
from the endpoint arg in the kernel, dropping the seg args
. fields in msg still filled in for backwards compatibility,
using same NONE-logic in the library
. all invocations were S or D, so can safely be dropped
to prepare for the segmentless world
. still assign D to the SCP_SEG field in the message
to make previous kernels usable
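
The NONE rule above amounts to a one-line mapping from endpoint to
addressing mode. In the sketch below, EP_NONE is a placeholder value and
enum copy_mode, mode_for_endpoint() and derive_modes() are hypothetical
names; the real NONE constant and the kernel's internal representation
differ.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the NONE endpoint; the real value is defined elsewhere. */
#define EP_NONE (-1)

enum copy_mode { COPY_VIRTUAL, COPY_PHYSICAL };

/* Physical addressing if and only if the endpoint is NONE. */
static enum copy_mode mode_for_endpoint(int endpoint)
{
    return endpoint == EP_NONE ? COPY_PHYSICAL : COPY_VIRTUAL;
}

/* Derive both sides of a copy from its endpoint arguments alone. */
static void derive_modes(int src_ep, int dst_ep,
                         enum copy_mode *src, enum copy_mode *dst)
{
    assert(src != NULL && dst != NULL);
    *src = mode_for_endpoint(src_ep);
    *dst = mode_for_endpoint(dst_ep);
}
```

Because the mode follows from the endpoint, a separate segment argument
carries no extra information and can be dropped from the call.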