/* The kernel call implemented in this file:
 *   m_type:	SYS_SAFECOPYFROM or SYS_SAFECOPYTO or SYS_VSAFECOPY
 *
 * The parameters for this kernel call are:
 *	SCP_FROM_TO	other endpoint
 *	SCP_GID		grant id
 *	SCP_OFFSET	offset within granted space
 *	SCP_ADDRESS	address in own address space
 *	SCP_BYTES	bytes to be copied
 *
 * For the vectored variant (do_vsafecopy):
 *	VSCP_VEC_ADDR	address of vector
 *	VSCP_VEC_SIZE	number of significant elements in vector
 */
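
/* Example (illustrative sketch; names are placeholders): a service that was
 * handed grant id 'gid' by endpoint 'proc_e' and wants to copy 'len' bytes
 * from its local buffer 'buf' into the granted range would request
 * SYS_SAFECOPYTO with
 *
 *	SCP_FROM_TO	= proc_e		(granting endpoint)
 *	SCP_GID		= gid			(grant id it received)
 *	SCP_OFFSET	= 0			(start of the granted range)
 *	SCP_ADDRESS	= (vir_bytes) buf	(its own buffer)
 *	SCP_BYTES	= len			(number of bytes)
 *
 * In practice callers go through a libsys wrapper such as sys_safecopyto()
 * rather than filling in the message themselves.
 */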

#include <assert.h>

#include "kernel/system.h"
#include "kernel.h"

#define MAX_INDIRECT_DEPTH 5	/* up to how many indirect grants to follow? */

#define MEM_TOP 0xFFFFFFFFUL

static int safecopy(struct proc *, endpoint_t, endpoint_t,
	cp_grant_id_t, size_t, vir_bytes, vir_bytes, int);

#define HASGRANTTABLE(gr) \
	(priv(gr) && priv(gr)->s_grant_table)

/*===========================================================================*
 *				verify_grant				     *
 *===========================================================================*/
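/* Check that grant 'grant', issued by 'granter' and used by 'grantee', allows
 * the requested 'access' to 'bytes' bytes starting at 'offset_in' within the
 * granted range. Indirect grants are followed back (up to MAX_INDIRECT_DEPTH)
 * to the grant they refer to. On success OK is returned, *offset_result is
 * set to the matching virtual offset in the granter's address space, and
 * *e_granter is set to the endpoint to copy to or from, which differs from
 * 'granter' only for magic grants. On failure EINVAL, EPERM or ELOOP is
 * returned.
 */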
int verify_grant(granter, grantee, grant, bytes, access,
	offset_in, offset_result, e_granter)
endpoint_t granter, grantee;	/* copyee, copyer */
cp_grant_id_t grant;		/* grant id */
vir_bytes bytes;		/* copy size */
int access;			/* direction (read/write) */
vir_bytes offset_in;		/* copy offset within grant */
vir_bytes *offset_result;	/* copy offset within virtual address space */
endpoint_t *e_granter;		/* new granter (magic grants) */
{
	static cp_grant_t g;
	static int proc_nr;
	static const struct proc *granter_proc;
	int depth = 0;

	do {
		/* Get granter process slot (if valid), and check range of
		 * grant id.
		 */
		if(!isokendpt(granter, &proc_nr) ) {
			printf(
			"grant verify failed: invalid granter %d\n", (int) granter);
			return(EINVAL);
		}
		if(!GRANT_VALID(grant)) {
			printf(
			"grant verify failed: invalid grant %d\n", (int) grant);
			return(EINVAL);
		}
		granter_proc = proc_addr(proc_nr);

		/* If there is no priv. structure, or no grant table in the
		 * priv. structure, or the grant table in the priv. structure
		 * is too small for the grant, return EPERM.
		 */
		if(!HASGRANTTABLE(granter_proc)) {
			printf(
			"grant verify failed: granter %d has no grant table\n",
			granter);
			return(EPERM);
		}

		if(priv(granter_proc)->s_grant_entries <= grant) {
			printf(
			"verify_grant: grant verify failed in ep %d "
			"proc %d: grant %d out of range "
			"for table size %d\n",
			granter, proc_nr, grant,
			priv(granter_proc)->s_grant_entries);
			return(EPERM);
		}

		/* Copy the grant entry corresponding to this id to see what it
		 * looks like. If it fails, hide the fact that granter has
		 * (presumably) set an invalid grant table entry by returning
		 * EPERM, just like with an invalid grant id.
		 */
		if(data_copy(granter,
			priv(granter_proc)->s_grant_table + sizeof(g)*grant,
			KERNEL, (vir_bytes) &g, sizeof(g)) != OK) {
			printf(
			"verify_grant: grant verify: data_copy failed\n");
			return EPERM;
		}

		/* Check validity. */
		if((g.cp_flags & (CPF_USED | CPF_VALID)) !=
			(CPF_USED | CPF_VALID)) {
			printf(
			"verify_grant: grant failed: invalid (%d flags 0x%lx)\n",
			grant, g.cp_flags);
			return EPERM;
		}

		/* The given grant may be an indirect grant, that is, a grant
		 * that provides permission to use a grant given to the
		 * granter (i.e., for which it is the grantee). This can lead
		 * to a chain of indirect grants which must be followed back.
		 */
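		/* For example (illustrative only): if process A creates a
		 * grant for B, and B creates an indirect grant on it for C,
		 * then the grant id presented by C resolves to B's indirect
		 * entry here, and the loop restarts with B as grantee and
		 * A's original grant.
		 */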
		if((g.cp_flags & CPF_INDIRECT)) {
			/* Stop after a few iterations. There may be a loop. */
			if (depth == MAX_INDIRECT_DEPTH) {
				printf(
				"verify grant: indirect grant verify "
				"failed: exceeded maximum depth\n");
				return ELOOP;
			}
			depth++;

			/* Verify actual grantee. */
			if(g.cp_u.cp_indirect.cp_who_to != grantee &&
				grantee != ANY &&
				g.cp_u.cp_indirect.cp_who_to != ANY) {
				printf(
				"verify_grant: indirect grant verify "
				"failed: bad grantee\n");
				return EPERM;
			}

			/* Start over with new granter, grant, and grantee. */
			grantee = granter;
			granter = g.cp_u.cp_indirect.cp_who_from;
			grant = g.cp_u.cp_indirect.cp_grant;
		}
	} while(g.cp_flags & CPF_INDIRECT);

	/* Check access of grant. */
	if(((g.cp_flags & access) != access)) {
		printf(
	"verify_grant: grant verify failed: access invalid; want 0x%x, have 0x%x\n",
			access, g.cp_flags);
		return EPERM;
	}

	if((g.cp_flags & CPF_DIRECT)) {
		/* Don't fiddle around with grants that wrap; the arithmetic
		 * below may be confused.
		 */
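		/* For nonzero cp_len, the test below is equivalent to
		 * checking whether cp_start + cp_len - 1 would exceed
		 * MEM_TOP, but written so that the comparison itself cannot
		 * overflow. E.g. cp_start = 0xFFFFF000 with cp_len = 0x2000
		 * gives MEM_TOP - cp_len + 1 = 0xFFFFE000 < cp_start, so the
		 * grant is rejected.
		 */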
		if(MEM_TOP - g.cp_u.cp_direct.cp_len + 1 <
			g.cp_u.cp_direct.cp_start) {
			printf(
		"verify_grant: direct grant verify failed: len too long\n");
			return EPERM;
		}

		/* Verify actual grantee. */
		if(g.cp_u.cp_direct.cp_who_to != grantee && grantee != ANY
			&& g.cp_u.cp_direct.cp_who_to != ANY) {
			printf(
		"verify_grant: direct grant verify failed: bad grantee\n");
			return EPERM;
		}

		/* Verify actual copy range. */
		if((offset_in+bytes < offset_in) ||
		    offset_in+bytes > g.cp_u.cp_direct.cp_len) {
			printf(
		"verify_grant: direct grant verify failed: bad size or range. "
		"granted %d bytes @ 0x%lx; wanted %d bytes @ 0x%lx\n",
				g.cp_u.cp_direct.cp_len,
				g.cp_u.cp_direct.cp_start,
				bytes, offset_in);
			return EPERM;
		}

		/* Verify successful - tell caller what address it is. */
		*offset_result = g.cp_u.cp_direct.cp_start + offset_in;
		*e_granter = granter;
	} else if(g.cp_flags & CPF_MAGIC) {
		/* Currently, it is hardcoded that only FS may do
		 * magic grants.
		 */
		if(granter != VFS_PROC_NR) {
			printf(
		"verify_grant: magic grant verify failed: granter (%d) "
		"is not FS (%d)\n", granter, VFS_PROC_NR);
			return EPERM;
		}

		/* Verify actual grantee. */
		if(g.cp_u.cp_magic.cp_who_to != grantee && grantee != ANY
			&& g.cp_u.cp_magic.cp_who_to != ANY) {
			printf(
		"verify_grant: magic grant verify failed: bad grantee\n");
			return EPERM;
		}

		/* Verify actual copy range. */
		if((offset_in+bytes < offset_in) ||
		    offset_in+bytes > g.cp_u.cp_magic.cp_len) {
			printf(
		"verify_grant: magic grant verify failed: bad size or range. "
		"granted %d bytes @ 0x%lx; wanted %d bytes @ 0x%lx\n",
				g.cp_u.cp_magic.cp_len,
				g.cp_u.cp_magic.cp_start,
				bytes, offset_in);
			return EPERM;
		}

		/* Verify successful - tell caller what address it is. */
		*offset_result = g.cp_u.cp_magic.cp_start + offset_in;
		*e_granter = g.cp_u.cp_magic.cp_who_from;
	} else {
		printf(
		"verify_grant: grant verify failed: unknown grant type\n");
		return EPERM;
	}

	return OK;
}

/*===========================================================================*
 *				safecopy				     *
 *===========================================================================*/
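/* Verify grant 'grantid' of 'granter' and then copy 'bytes' bytes between
 * offset 'g_offset' in the granted range and address 'addr' in the grantee's
 * own address space, in the direction selected by 'access' (see below).
 */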
static int safecopy(caller, granter, grantee, grantid, bytes,
	g_offset, addr, access)
struct proc * caller;
endpoint_t granter, grantee;
cp_grant_id_t grantid;
size_t bytes;
vir_bytes g_offset, addr;
int access;			/* CPF_READ for a copy from granter to grantee, CPF_WRITE
				 * for a copy from grantee to granter.
				 */
{
	static struct vir_addr v_src, v_dst;
	static vir_bytes v_offset;
	endpoint_t new_granter, *src, *dst;
	struct proc *granter_p;
	int r;
#if PERF_USE_COW_SAFECOPY
	vir_bytes size;
#endif

	if(granter == NONE || grantee == NONE) {
		printf("safecopy: nonsense processes\n");
		return EFAULT;
	}

	/* See if there is a reasonable grant table. */
	if(!(granter_p = endpoint_lookup(granter))) return EINVAL;
	if(!HASGRANTTABLE(granter_p)) {
		printf(
		"safecopy failed: granter %d has no grant table\n", granter);
		return(EPERM);
	}

	/* Decide who is src and who is dst. */
	if(access & CPF_READ) {
		src = &granter;
		dst = &grantee;
	} else {
		src = &grantee;
		dst = &granter;
	}

	/* Verify permission exists. */
	if((r=verify_grant(granter, grantee, grantid, bytes, access,
	    g_offset, &v_offset, &new_granter)) != OK) {
		printf(
		"grant %d verify to copy %d->%d by %d failed: err %d\n",
		grantid, *src, *dst, grantee, r);
		return r;
	}

	/* verify_grant() can redirect the granter to someone else,
	 * meaning the source or destination changes.
	 */
	granter = new_granter;

	/* Now it's a regular copy. */
	v_src.proc_nr_e = *src;
	v_dst.proc_nr_e = *dst;

	/* Now the offset in virtual addressing is known in 'v_offset'.
	 * Depending on the access, this is the source or destination
	 * address.
	 */
	if(access & CPF_READ) {
		v_src.offset = v_offset;
		v_dst.offset = (vir_bytes) addr;
	} else {
		v_src.offset = (vir_bytes) addr;
		v_dst.offset = v_offset;
	}

	/* Do the regular copy. */
#if PERF_USE_COW_SAFECOPY
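	/* With the copy-on-write optimization enabled, the copy is split into
	 * up to three steps: a normal copy up to the first page boundary, a
	 * COW mapping set up by VM for the whole pages in between, and a
	 * normal copy for the tail after the last page boundary. If source
	 * and destination are not equally aligned within a page, or fewer
	 * than CLICK_SIZE bytes are copied, a plain copy is done instead.
	 */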
	if(v_offset % CLICK_SIZE != addr % CLICK_SIZE || bytes < CLICK_SIZE) {
		/* Give up on COW immediately when offsets are not aligned
		 * or we are copying less than a page.
		 */
		return virtual_copy_vmcheck(caller, &v_src, &v_dst, bytes);
	}

	if((size = v_offset % CLICK_SIZE) != 0) {
		/* Normal copy for everything before the first page boundary. */
		size = CLICK_SIZE - size;
		r = virtual_copy_vmcheck(caller, &v_src, &v_dst, size);
		if(r != OK)
			return r;
		v_src.offset += size;
		v_dst.offset += size;
		bytes -= size;
	}
	if((size = bytes / CLICK_SIZE) != 0) {
		/* Use COW optimization when copying entire pages. */
		size *= CLICK_SIZE;
		r = map_invoke_vm(VMPTYPE_COWMAP,
			v_dst.proc_nr_e, v_dst.segment, v_dst.offset,
			v_src.proc_nr_e, v_src.segment, v_src.offset,
			size, 0);
		if(r != OK)
			return r;
		v_src.offset += size;
		v_dst.offset += size;
		bytes -= size;
	}
	if(bytes != 0) {
		/* Normal copy for everything after the last page boundary. */
		r = virtual_copy_vmcheck(caller, &v_src, &v_dst, bytes);
		if(r != OK)
			return r;
	}

	return OK;
#else
	return virtual_copy_vmcheck(caller, &v_src, &v_dst, bytes);
#endif
}

/*===========================================================================*
 *				do_safecopy_to				     *
 *===========================================================================*/
int do_safecopy_to(struct proc * caller, message * m_ptr)
{
	return safecopy(caller, m_ptr->SCP_FROM_TO, caller->p_endpoint,
		(cp_grant_id_t) m_ptr->SCP_GID,
		m_ptr->SCP_BYTES, m_ptr->SCP_OFFSET,
		(vir_bytes) m_ptr->SCP_ADDRESS, CPF_WRITE);
}

/*===========================================================================*
 *				do_safecopy_from			     *
 *===========================================================================*/
int do_safecopy_from(struct proc * caller, message * m_ptr)
{
	return safecopy(caller, m_ptr->SCP_FROM_TO, caller->p_endpoint,
		(cp_grant_id_t) m_ptr->SCP_GID,
		m_ptr->SCP_BYTES, m_ptr->SCP_OFFSET,
		(vir_bytes) m_ptr->SCP_ADDRESS, CPF_READ);
}

/*===========================================================================*
 *				do_vsafecopy				     *
 *===========================================================================*/
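/* Handle SYS_VSAFECOPY: fetch the caller's vector of vscp_vec elements and
 * perform a safecopy for each of them. Every element must name the caller
 * (SELF) as either v_from or v_to; the other endpoint is the granter, and
 * v_gid, v_offset, v_addr and v_bytes give the grant id, the offset within
 * the grant, the caller's local address and the number of bytes to copy.
 */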
int do_vsafecopy(struct proc * caller, message * m_ptr)
{
	static struct vscp_vec vec[SCPVEC_NR];
	static struct vir_addr src, dst;
	int r, i, els;
	size_t bytes;

	/* Set vector copy parameters. */
	src.proc_nr_e = caller->p_endpoint;
	assert(src.proc_nr_e != NONE);
	src.offset = (vir_bytes) m_ptr->VSCP_VEC_ADDR;
	dst.proc_nr_e = KERNEL;
	dst.offset = (vir_bytes) vec;

	/* No. of vector elements. */
	els = m_ptr->VSCP_VEC_SIZE;
	bytes = els * sizeof(struct vscp_vec);

	/* Obtain vector of copies. */
	if((r=virtual_copy_vmcheck(caller, &src, &dst, bytes)) != OK)
		return r;

	/* Perform safecopies. */
	for(i = 0; i < els; i++) {
		int access;
		endpoint_t granter;
		if(vec[i].v_from == SELF) {
			access = CPF_WRITE;
			granter = vec[i].v_to;
		} else if(vec[i].v_to == SELF) {
			access = CPF_READ;
			granter = vec[i].v_from;
		} else {
			printf("vsafecopy: %d: element %d/%d: no SELF found\n",
				caller->p_endpoint, i, els);
			return EINVAL;
		}

		/* Do safecopy for this element. */
		if((r=safecopy(caller, granter, caller->p_endpoint,
			vec[i].v_gid,
			vec[i].v_bytes, vec[i].v_offset,
			vec[i].v_addr, access)) != OK) {
			return r;
		}
	}

	return OK;
}