diff --git a/web/index.html b/web/index.html deleted file mode 100644 index fe6da4a..0000000 --- a/web/index.html +++ /dev/null @@ -1,164 +0,0 @@ - - -Xv6, a simple Unix-like teaching operating system - - - - -

Xv6, a simple Unix-like teaching operating system

- -

Introduction

- -Xv6 is a teaching operating system developed in the summer of 2006 for -MIT's operating systems -course, 6.828: Operating -Systems Engineering. We hope that xv6 will be useful in other -courses too. This page collects resources to aid the use of xv6 in -other courses, including a commentary on the source code itself. - -

History and Background

- -

For many years, MIT had no operating systems course. In the fall of 2002, -one was created to teach operating systems engineering. In the course lectures, -the class worked through Sixth Edition Unix (aka V6) using -John Lions's famous commentary. In the lab assignments, students wrote most of -an exokernel operating system, eventually named Jos, for the Intel x86. -Exposing students to multiple systems–V6 and Jos–helped develop a -sense of the spectrum of operating system designs. - -

-V6 presented pedagogic challenges from the start. -Students doubted the relevance of an obsolete 30-year-old operating system -written in an obsolete programming language (pre-K&R C) -running on obsolete hardware (the PDP-11). -Students also struggled to learn the low-level details of two different -architectures (the PDP-11 and the Intel x86) at the same time. -By the summer of 2006, we had decided to replace V6 -with a new operating system, xv6, modeled on V6 -but written in ANSI C and running on multiprocessor -Intel x86 machines. -Xv6's use of the x86 makes it more relevant to -students' experience than V6 was -and unifies the course around a single architecture. -Adding multiprocessor support requires handling concurrency head on with -locks and threads (instead of using special-case solutions for -uniprocessors such as -enabling/disabling interrupts) and helps relevance. -Finally, writing a new system allowed us to write cleaner versions -of the rougher parts of V6, like the scheduler and file system. -6.828 substituted xv6 for V6 in the fall of 2006. - -

Xv6 sources and text

- -The latest xv6 source is available via -
git clone git://pdos.csail.mit.edu/xv6/xv6.git
-We also distribute the sources as a printed booklet with line numbers -that keep everyone together during lectures. The booklet is available as xv6-rev6.pdf. To get the version -corresponding to this booklet, run -
git checkout -b xv6-rev6 xv6-rev6
- -

-The xv6 source code is licensed under -the traditional MIT -license; see the LICENSE file in the source distribution. To help students -read through xv6 and learn about the main ideas in operating systems we also -distribute a textbook/commentary for the latest xv6. -The line numbers in this book refer to the above source booklet. - -

-xv6 compiles using the GNU C compiler, -targeted at the x86 using ELF binaries. -On BSD and Linux systems, you can use the native compilers; -on OS X, which doesn't use ELF binaries, -you must use a cross-compiler. -Xv6 does boot on real hardware, but typically -we run it using the QEMU emulator. -Both the GCC cross compiler and QEMU -can be found on the 6.828 tools page. - -

Xv6 lecture material

- -In 6.828, the lectures in the first half of the course cover the xv6 sources and -text. The lectures in the second half consider advanced topics using research -papers; for some, xv6 serves as a useful base for making discussions concrete. -The lecture notes are available from the 6.828 schedule page. - - -

Unix Version 6

- -

6.828's xv6 is inspired by Unix V6 and by: - -

- -The following are useful to read the original code: - - -

Feedback

-If you are interested in using xv6 or have used xv6 in a course, -we would love to hear from you. -If there's anything that we can do to make xv6 easier -to adopt, we'd like to hear about it. -We'd also be interested to hear what worked well and what didn't. -

-Russ Cox (rsc@swtch.com)
-Frans Kaashoek (kaashoek@mit.edu)
-Robert Morris (rtm@mit.edu) -

-You can reach all of us at 6.828-staff@pdos.csail.mit.edu. - diff --git a/web/l-bugs.html b/web/l-bugs.html deleted file mode 100644 index 493372d..0000000 --- a/web/l-bugs.html +++ /dev/null @@ -1,187 +0,0 @@ -OS Bugs - - - - - -

OS Bugs

- -

Required reading: Bugs as deviant behavior - -

Overview

- -

Operating systems must obey many rules for correctness and -performance. Example rules: -

- -

In addition, there are standard software engineering rules, such as -using function results in consistent ways. - -

These rules are typically not checked by a compiler, even though -they could be, in principle. The goal of the -meta-level compilation project is to allow system implementors to -write system-specific compiler extensions that check the source code -for rule violations. -

The results are good: many new bugs found (500-1000) in Linux -alone. The paper for today studies these bugs and attempts to draw -lessons from them. -

Are kernel errors worse than user-level errors? That is, if we get -the kernel correct, then we won't have system crashes? -

Errors in JOS kernel

- -

What are unstated invariants in the JOS? -

- -

Could these errors have been caught by metacompilation? Would -metacompilation have caught the pipe race condition? (Probably not, -it happens in only one place.) - -

How confident are you that your code is correct? For example, -are you sure interrupts are always disabled in kernel mode? How would -you test? - -

Metacompilation

- -

A system programmer writes the rule checkers in a high-level, -state-machine language (metal). These checkers are dynamically linked -into an extensible version of g++, xg++. Xg++ applies the rule -checkers to every possible execution path of a function that is being -compiled. - -

An example rule from -the OSDI -paper: -

-sm check_interrupts {
-   decl { unsigned} flags;
-   pat enable = { sti(); } | {restore_flags(flags);} ;
-   pat disable = { cli(); };
-   
-   is_enabled: disable ==> is_disabled | enable ==> { err("double enable") };
-   ...
-
-A more complete version found 82 errors in the Linux 2.3.99 kernel. - -
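The core of such a checker is just a small state machine driven by the operations it sees along an execution path. Here is a minimal user-space sketch in C; the token-stream representation is hypothetical (a real metal checker runs inside xg++ over the compiler's execution paths), but it shows the same double-enable/double-disable rule:

```c
#include <assert.h>
#include <string.h>

enum { ENABLED, DISABLED };

/* Walk one execution path, represented here as an array of operation
   names, and count violations of the interrupt-pairing rule:
   sti while enabled is a "double enable", cli while disabled a
   "double disable".  Purely illustrative; this is not the metal
   language and not xg++. */
int check_interrupts(const char **path, int n)
{
    int state = ENABLED, errors = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(path[i], "cli") == 0) {
            if (state == DISABLED)
                errors++;            /* double disable */
            state = DISABLED;
        } else if (strcmp(path[i], "sti") == 0) {
            if (state == ENABLED)
                errors++;            /* double enable */
            state = ENABLED;
        }
    }
    return errors;
}
```

A real checker must also enumerate every path through each function, which is exactly what xg++ supplies.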

Common mistake: -

-get_free_buffer ( ... ) {
-   ....
-   save_flags (flags);
-   cli ();
-   if ((bh = sh->buffer_pool) == NULL)
-      return NULL;
-   ....
-}
-
-

(Figure 2 also lists a simple metarule.) - -

Some checkers produce false positives, because of limitations of -both static analysis and the checkers, which mostly use local -analysis. - -

How does the block checker work? The first pass is a rule -that marks functions as potentially blocking. After processing a -function, the checker emits the function's flow graph to a file -(including annotations and functions called). The second pass takes -the merged flow graph of all function calls, and produces a file with -all functions that have a path in the control-flow-graph to a blocking -function call. For the Linux kernel this results in 3,000 functions -that potentially could call sleep. Yet another checker like -check_interrupts checks if a function calls any of the 3,000 functions -with interrupts disabled. Etc. -
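The second pass is essentially a reachability computation over the merged call graph. A toy version in C (the graph, the function indices, and the edges are all made up for illustration) marks every function with a path to a blocking call:

```c
#include <assert.h>

/* Hypothetical call graph: calls[i][j] != 0 means function i calls
   function j.  Index 0 stands for a blocking leaf such as sleep. */
#define NFUNC 5
int calls[NFUNC][NFUNC];
int may_block[NFUNC];

/* Iterate to a fixed point: a function may block if it calls any
   function already known to possibly block. */
void propagate(int blocking_leaf)
{
    may_block[blocking_leaf] = 1;
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < NFUNC; i++)
            for (int j = 0; j < NFUNC; j++)
                if (!may_block[i] && calls[i][j] && may_block[j]) {
                    may_block[i] = 1;
                    changed = 1;
                }
    }
}
```

With sleep as the only blocking leaf, anything that transitively calls it gets flagged; a checker like check_interrupts can then consult the may_block set.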

This paper

- -

Writing rules is painful. First, you have to write them. Second, -how do you decide what to check? Was it easy to enumerate all -conventions for JOS? - -

Insight: infer programmer "beliefs" from code and cross-check -for contradictions. If cli is always followed by sti, -except in one case, perhaps something is wrong. This simplifies -life because we can write generic checkers instead of checkers -that specifically check for sti, and perhaps we get lucky -and find other temporal ordering conventions. - -

Do we know which case is wrong? The 999 times or the 1 time that -sti is absent? (No, this method cannot figure out what the correct -sequence is, but it can flag that something is weird, which in practice -is useful.) The method just detects inconsistencies. -

Is every inconsistency an error? No, some inconsistencies don't -indicate an error. If a call to function f is often followed -by a call to function g, does that imply that f should always be -followed by g? (No!) -

Solution: MUST beliefs and MAYBE beliefs. MUST beliefs are -invariants that must hold; any inconsistency indicates an error. If a -pointer is dereferenced, then the programmer MUST believe that the -pointer is pointing to something that can be dereferenced (i.e., the -pointer is definitely not zero). MUST beliefs can be checked using -"internal inconsistencies". -

An aside: can zero pointers be detected at runtime? -(Sure, unmap the page at address zero.) Why is metacompilation still -valuable? (At runtime you will find only the null pointers that your -test code dereferenced; not all possible dereferences of null -pointers.) An even more convincing example for metacompilation is -tracking user pointers that the kernel dereferences. (Is this a MUST -belief?) -

MAYBE beliefs are invariants that are suggested by the code, but -they may be coincidences. MAYBE beliefs are ranked by statistical -analysis, and perhaps augmented with input about function names -(e.g., alloc and free are important). Is it computationally feasible -to check every MAYBE belief? Could there be much noise? -

What errors won't this approach catch? - -

Paper discussion

- -

This paper is best discussed by studying every code fragment. Most -code fragments are pieces of code from Linux distributions; these -mistakes are real! - -

Section 3.1. what is the error? how does metacompilation catch -it? - -

Figure 1. what is the error? is there one? - -

Code fragments from 6.1. what is the error? how does metacompilation catch -it? - -

Figure 3. what is the error? how does metacompilation catch -it? - -

Section 8.3. what is the error? how does metacompilation catch -it? - - - diff --git a/web/l-coordination.html b/web/l-coordination.html deleted file mode 100644 index 79b578b..0000000 --- a/web/l-coordination.html +++ /dev/null @@ -1,354 +0,0 @@ -L9 - - - - - -

Coordination and more processes

- -

Required reading: remainder of proc.c, sys_exec, sys_sbrk, - sys_wait, sys_exit, and sys_kill. - -

Overview

- -

Big picture: more programs than processors. How to share the - limited number of processors among the programs? Last lecture - covered basic mechanism: threads and the distinction between process - and thread. Today expand: how to coordinate the interactions - between threads explicitly, and some operations on processes. - -

Sequence coordination. This is a different type of coordination - than mutual-exclusion coordination (whose goal is to make - actions atomic so that threads don't interfere). The goal of - sequence coordination is for threads to coordinate the sequences in - which they run. -

For example, a thread may want to wait until another thread - terminates. One way to do so is to have the thread run periodically, - let it check if the other thread terminated, and if not give up the - processor again. This is wasteful, especially if there are many - threads. - -

With primitives for sequence coordination one can do better. The - thread could tell the thread manager that it is waiting for an event - (e.g., another thread terminating). When the other thread - terminates, it explicitly wakes up the waiting thread. This is more - work for the programmer, but more efficient. - -

Sequence coordination often interacts with mutual-exclusion - coordination, as we will see below. - -

The operating system literature has a rich set of primitives for - sequence coordination. We study a very simple version of condition - variables in xv6: sleep and wakeup, with a single lock. -

xv6 code examples

- -

Sleep and wakeup - usage

- -Let's consider implementing a producer/consumer queue -(like a pipe) that can be used to hold a single non-null pointer: - -
-struct pcq {
-    void *ptr;
-};
-
-void*
-pcqread(struct pcq *q)
-{
-    void *p;
-
-    while((p = q->ptr) == 0)
-        ;
-    q->ptr = 0;
-    return p;
-}
-
-void
-pcqwrite(struct pcq *q, void *p)
-{
-    while(q->ptr != 0)
-        ;
-    q->ptr = p;
-}
-
- -

Easy and correct, at least assuming there is at most one -reader and at most one writer at a time. - -

Unfortunately, the while loops are inefficient. -Instead of polling, it would be great if there were -primitives saying ``wait for some event to happen'' -and ``this event happened''. -That's what sleep and wakeup do. - -

Second try: - -

-void*
-pcqread(struct pcq *q)
-{
-    void *p;
-
-    if(q->ptr == 0)
-        sleep(q);
-    p = q->ptr;
-    q->ptr = 0;
-    wakeup(q);  /* wake pcqwrite */
-    return p;
-}
-
-void
-pcqwrite(struct pcq *q, void *p)
-{
-    if(q->ptr != 0)
-        sleep(q);
-    q->ptr = p;
-    wakeup(q);  /* wake pcqread */
-}
-
- -That's better, but there is still a problem. -What if the wakeup happens between the check in the if -and the call to sleep? - -

Add locks: - -

-struct pcq {
-    void *ptr;
-    struct spinlock lock;
-};
-
-void*
-pcqread(struct pcq *q)
-{
-    void *p;
-
-    acquire(&q->lock);
-    if(q->ptr == 0)
-        sleep(q, &q->lock);
-    p = q->ptr;
-    q->ptr = 0;
-    wakeup(q);  /* wake pcqwrite */
-    release(&q->lock);
-    return p;
-}
-
-void
-pcqwrite(struct pcq *q, void *p)
-{
-    acquire(&q->lock);
-    if(q->ptr != 0)
-        sleep(q, &q->lock);
-    q->ptr = p;
-    wakeup(q);  /* wake pcqread */
-    release(&q->lock);
-}
-
- -This is okay, and now safer for multiple readers and writers, -except that wakeup wakes up everyone who is asleep on chan, -not just one process. -So some of the processes that wake up from sleep might not -be cleared to read or write from the queue. We have to go back to looping: -
-struct pcq {
-    void *ptr;
-    struct spinlock lock;
-};
-
-void*
-pcqread(struct pcq *q)
-{
-    void *p;
-
-    acquire(&q->lock);
-    while(q->ptr == 0)
-        sleep(q, &q->lock);
-    p = q->ptr;
-    q->ptr = 0;
-    wakeup(q);  /* wake pcqwrite */
-    release(&q->lock);
-    return p;
-}
-
-void
-pcqwrite(struct pcq *q, void *p)
-{
-    acquire(&q->lock);
-    while(q->ptr != 0)
-        sleep(q, &q->lock);
-    q->ptr = p;
-    wakeup(q);  /* wake pcqread */
-    release(&q->lock);
-}
-
- -The difference between this and our original is that -the body of the while loop is a much more efficient way to pause. -
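As an aside, the same pattern can be written in user space with POSIX condition variables, where pthread_cond_wait plays the role of sleep(chan, lk): it atomically releases the mutex and sleeps, then reacquires the mutex before returning. A sketch, not xv6 code:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct pcq {
    void *ptr;
    pthread_mutex_t lock;
    pthread_cond_t cond;    /* plays the role of the xv6 channel */
};

void pcq_init(struct pcq *q)
{
    q->ptr = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
}

void *pcqread(struct pcq *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->ptr == NULL)                     /* loop, as in xv6 */
        pthread_cond_wait(&q->cond, &q->lock); /* release + sleep, atomically */
    void *p = q->ptr;
    q->ptr = NULL;
    pthread_cond_broadcast(&q->cond);          /* wake any writer */
    pthread_mutex_unlock(&q->lock);
    return p;
}

void pcqwrite(struct pcq *q, void *p)
{
    pthread_mutex_lock(&q->lock);
    while (q->ptr != NULL)
        pthread_cond_wait(&q->cond, &q->lock);
    q->ptr = p;
    pthread_cond_broadcast(&q->cond);          /* wake any reader */
    pthread_mutex_unlock(&q->lock);
}
```

Note that the while loops survive here too: pthread_cond_broadcast, like xv6's wakeup, wakes every waiter, so each must recheck the condition.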

Now we've figured out how to use it, but we -still need to figure out how to implement it. - -

Sleep and wakeup - implementation

-

-Simple implementation: - -

-void
-sleep(void *chan, struct spinlock *lk)
-{
-    struct proc *p = curproc[cpu()];
-    
-    release(lk);
-    p->chan = chan;
-    p->state = SLEEPING;
-    sched();
-}
-
-void
-wakeup(void *chan)
-{
-    for(each proc p) {
-        if(p->state == SLEEPING && p->chan == chan)
-            p->state = RUNNABLE;
-    }	
-}
-
- -

What's wrong? What if the wakeup runs right after -the release(lk) in sleep? -It still misses the sleep. - -

Move the lock down: -

-void
-sleep(void *chan, struct spinlock *lk)
-{
-    struct proc *p = curproc[cpu()];
-    
-    p->chan = chan;
-    p->state = SLEEPING;
-    release(lk);
-    sched();
-}
-
-void
-wakeup(void *chan)
-{
-    for(each proc p) {
-        if(p->state == SLEEPING && p->chan == chan)
-            p->state = RUNNABLE;
-    }	
-}
-
- -

This almost works. Recall from last lecture that we also need -to acquire the proc_table_lock before calling sched, to -protect p->jmpbuf. - -

-void
-sleep(void *chan, struct spinlock *lk)
-{
-    struct proc *p = curproc[cpu()];
-    
-    p->chan = chan;
-    p->state = SLEEPING;
-    acquire(&proc_table_lock);
-    release(lk);
-    sched();
-}
-
- -

The problem is that now we're using lk to protect -access to the p->chan and p->state variables -but other routines besides sleep and wakeup -(in particular, proc_kill) will need to use them and won't -know which lock protects them. -So instead of protecting them with lk, let's use proc_table_lock: - -

-void
-sleep(void *chan, struct spinlock *lk)
-{
-    struct proc *p = curproc[cpu()];
-    
-    acquire(&proc_table_lock);
-    release(lk);
-    p->chan = chan;
-    p->state = SLEEPING;
-    sched();
-}
-void
-wakeup(void *chan)
-{
-    acquire(&proc_table_lock);
-    for(each proc p) {
-        if(p->state == SLEEPING && p->chan == chan)
-            p->state = RUNNABLE;
-    }
-    release(&proc_table_lock);
-}
-
- -

One could probably make things work with lk as above, -but the relationship between data and locks would be -more complicated with no real benefit. Xv6 takes the easy way out -and says that elements in the proc structure are always protected -by proc_table_lock. - -

Use example: exit and wait

- -

If proc_wait decides there are children to be waited for, -it calls sleep at line 2462. -When a process exits, proc_exit scans the process table -to find the parent and wakes it at line 2408. -

Which lock protects sleep and wakeup from missing each other? -Proc_table_lock. Have to tweak sleep again to avoid double-acquire: - -

-if(lk != &proc_table_lock) {
-    acquire(&proc_table_lock);
-    release(lk);
-}
-
- -

New feature: kill

- -

Proc_kill marks a process as killed (line 2371). -When the process finally exits the kernel to user space, -or if a clock interrupt happens while it is in user space, -it will be destroyed (line 2886, 2890, 2912). - -

Why wait until the process ends up in user space? - -

What if the process is stuck in sleep? It might take a long -time to get back to user space. -Don't want to have to wait for it, so make sleep wake up early -(line 2373). - -

This means all callers of sleep should check -whether they have been killed, but none do. -Bug in xv6. - -

System call handlers

- -

Sheet 32 - -

Fork: discussed copyproc in earlier lectures. -Sys_fork (line 3218) just calls copyproc -and marks the new proc runnable. -Does fork create a new process or a new thread? -Is there any shared context? - -

Exec: we'll talk about exec later, when we talk about file systems. - -

Sbrk: Saw growproc earlier. Why setupsegs before returning? diff --git a/web/l-fs.html b/web/l-fs.html deleted file mode 100644 index ed911fc..0000000 --- a/web/l-fs.html +++ /dev/null @@ -1,222 +0,0 @@ -L10 - - - - - -

File systems

- -

Required reading: iread, iwrite, and wdir, and code related to - these calls in fs.c, bio.c, ide.c, file.c, and sysfile.c - -

Overview

- -

The next 3 lectures are about file systems: -

- -

Users want to store their data durably so that it survives when -the user turns off the computer. The primary media for doing so are -magnetic disks, flash memory, and tapes. We focus on magnetic disks -(e.g., through the IDE interface in xv6). -

To allow users to remember where they stored a file, they can -assign a symbolic name to a file, which appears in a directory. - -

The data in a file can be organized in a structured way or not. -The structured variant is often called a database. UNIX uses the -unstructured variant: files are streams of bytes. Any particular -structure is likely to be useful to only a small class of -applications, and other applications will have to work hard to fit -their data into one of the pre-defined structures. Besides, if you -want structure, you can easily write a user-mode library program that -imposes that format on any file. The end-to-end argument in action. -(Databases have special requirements and support an important class of -applications, and thus have a specialized plan.) - -

The API for a minimal file system consists of: open, read, write, -seek, close, and stat. Dup duplicates a file descriptor. For example: -

-  fd = open("x", O_RDWR);
-  read (fd, buf, 100);
-  write (fd, buf, 512);
-  close (fd)
-
- -

Maintaining the file offset behind the read/write interface is an - interesting design decision. The alternative is that the state of a - read operation should be maintained by the process doing the reading - (i.e., that the pointer should be passed as an argument to read). - This argument is compelling in view of the UNIX fork() semantics, - which clones a process that shares the file descriptors of its - parent. A read by the parent of a shared file descriptor (e.g., - stdin) changes the read pointer seen by the child. On the other - hand, the alternative would make it difficult to get "(data; ls) > x" - right. -
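The shared-offset behavior can be observed from user space: dup (like fork) produces a second descriptor referring to the same open-file object, so a read through one moves the offset seen by the other. A small POSIX demonstration (the temp-file path is an mkstemp template, not a real file name):

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns the offset seen through fd2 after reading 5 bytes
   through fd: 5 if the two descriptors share one offset. */
long shared_offset(void)
{
    char path[] = "/tmp/offsetXXXXXX";
    int fd = mkstemp(path);           /* create and open a temp file */
    write(fd, "hello world", 11);
    lseek(fd, 0, SEEK_SET);

    int fd2 = dup(fd);                /* same open-file object */
    char buf[5];
    read(fd, buf, 5);                 /* advances the shared offset */

    long off = lseek(fd2, 0, SEEK_CUR);
    close(fd2);
    close(fd);
    unlink(path);
    return off;
}
```

A descriptor obtained from a second open() of the same file, by contrast, gets its own offset.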

The Unix API doesn't specify that the effects of a write are on the - disk before the write returns. That is up to the implementation - of the file system, within certain bounds. Choices include (and - they aren't mutually exclusive): -

- -

A design issue is the semantics of a file system operation that - requires multiple disk writes. In particular, what happens if the - logical update requires writing multiple disk blocks and the power - fails during the update? For example, creating a new file - requires allocating an inode (which requires updating the list of - free inodes on disk) and writing a directory entry to record the - allocated inode under the name of the new file (which may require - allocating a new block and updating the directory inode). If the - power fails during the operation, the list of free inodes and blocks - may be inconsistent with the blocks and inodes in use. Again, it is - up to the implementation of the file system to keep the on-disk data - structures consistent: -

- -

Another design issue is the semantics of concurrent writes to -the same data item. What is the order of two updates that happen at -the same time? For example, two processes open the same file and write -to it. Modern Unix operating systems allow the application to lock a -file to get exclusive access. If file locking is not used and if the -file descriptor is shared, then the bytes of the two writes will get -into the file in some order (this happens often for log files). If -the file descriptor is not shared, the end result is not defined. For -example, one write may overwrite the other one (e.g., if they are -writing to the same part of the file.) -

An implementation issue is performance, because writing to magnetic -disk is relatively expensive compared to computing. Three primary ways -to improve performance are: careful file system layout that induces -few seeks, an in-memory cache of frequently-accessed blocks, and -overlapping I/O with computation so that file operations don't have to -wait for completion and so that the disk driver has more -data to write, which allows disk scheduling. (We will talk about -performance in detail later.) -

xv6 code examples

- -

xv6 implements a minimal Unix file system interface. xv6 doesn't -pay attention to file system layout. It overlaps computation and I/O, -but doesn't do any disk scheduling. Its cache is write-through, which -simplifies keeping on-disk data structures consistent, but is bad for -performance. -
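A write-through cache in miniature (toy sizes, and arrays standing in for the IDE device): every bwrite updates both the cached copy and the "disk", so memory is never ahead of the disk, at the cost of one disk write per update.

```c
#include <assert.h>
#include <string.h>

enum { NBLOCKS = 8, BSIZE = 16 };   /* toy sizes, not xv6's */

char disk[NBLOCKS][BSIZE];          /* stand-in for the IDE device */
char cache[NBLOCKS][BSIZE];         /* in-memory buffer cache */

/* Write-through: update the cache and the disk together, so the
   on-disk structures never lag the in-memory ones. */
void bwrite(int bno, const char *data)
{
    memcpy(cache[bno], data, BSIZE);
    memcpy(disk[bno], data, BSIZE);  /* no delayed write */
}

const char *bread(int bno)
{
    return cache[bno];               /* a real cache would also miss */
}
```

A write-back cache would delay the second memcpy, improving performance but opening the consistency window discussed above.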

On disk, files are represented by an inode (struct dinode in fs.h) -and blocks. Small files have up to 12 block addresses in their inode; -large files use the last address in the inode as a disk address -for a block with 128 disk addresses (512/4). The size of a file is -thus limited to 12 * 512 + 128 * 512 bytes. What would you change to -support larger files? (Ans: e.g., double indirect blocks.) -
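The size limit works out as follows (a quick check of the arithmetic above):

```c
#include <assert.h>

/* 12 direct addresses plus one indirect block of 512/4 = 128
   addresses, each naming a 512-byte block. */
enum { BSIZE = 512, NDIRECT = 12, NINDIRECT = BSIZE / 4 };

unsigned max_file_size(void)
{
    return (NDIRECT + NINDIRECT) * BSIZE;   /* (12 + 128) * 512 */
}
```

So an xv6 file tops out at 71,680 bytes (70 KB).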

Directories are files with a bit of structure to them. The file -consists of records of type struct dirent. Each entry contains the -name of a file (or directory) and its corresponding inode number. -How many files can appear in a directory? -
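One way to answer the question, assuming xv6's 16-byte directory entry (a 2-byte inode number plus a 14-byte name) and the 12-direct-plus-128-indirect block limit above:

```c
#include <assert.h>

struct dirent {                /* assumed layout: 2 + 14 = 16 bytes */
    unsigned short inum;
    char name[14];
};

unsigned max_dir_entries(void)
{
    unsigned maxsize = (12 + 512 / 4) * 512;   /* max file size: 71680 */
    return maxsize / sizeof(struct dirent);    /* 16-byte entries */
}
```

A directory, being an ordinary file, is bounded by the same maximum file size, which caps the number of entries it can hold.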

In memory, files are represented by struct inode in fsvar.h. What is -the role of the additional fields in struct inode? -

What is xv6's disk layout? How does xv6 keep track of free blocks - and inodes? See balloc()/bfree() and ialloc()/ifree(). Is this - layout a good one for performance? What are other options? - -

Let's assume that an application created a file x that contains - 512 bytes, and that the application now calls read(fd, buf, - 100); that is, it is requesting to read 100 bytes into buf. - Furthermore, let's assume that the inode for x is i. Let's pick - up what happens by investigating readi(), line 4483. -

- -

Now let's suppose that the process is writing 512 bytes at the end - of file x. How many disk writes will happen? -

- -

Lots of code to implement reading and writing of files. How about - directories? -

-

Reading and writing of directories is trivial. - - diff --git a/web/l-interrupt.html b/web/l-interrupt.html deleted file mode 100644 index 363af5e..0000000 --- a/web/l-interrupt.html +++ /dev/null @@ -1,174 +0,0 @@ - -Lecture 6: Interrupts & Exceptions - - -

Interrupts & Exceptions

- -

-Required reading: xv6 trapasm.S, trap.c, syscall.c, usys.S. -
-You will need to consult -IA32 System -Programming Guide chapter 5 (skip 5.7.1, 5.8.2, 5.12.2). - -

Overview

- -

-Big picture: kernel is trusted third-party that runs the machine. -Only the kernel can execute privileged instructions (e.g., -changing MMU state). -The processor enforces this protection through the ring bits -in the code segment. -If a user application needs to carry out a privileged operation -or other kernel-only service, -it must ask the kernel nicely. -How can a user program change to the kernel address space? -How can the kernel transfer to a user address space? -What happens when a device attached to the computer -needs attention? -These are the topics for today's lecture. - -

-There are three kinds of events that must be handled -by the kernel, not user programs: -(1) a system call invoked by a user program, -(2) an illegal instruction or other kind of bad processor state (memory fault, etc.), -and -(3) an interrupt from a hardware device. -

-Although these three events are different, they all use the same -mechanism to transfer control to the kernel. -This mechanism consists of three steps that execute as one atomic unit. -(a) change the processor to kernel mode; -(b) save the old processor state somewhere (usually the kernel stack); -and (c) change the processor state to the values set up as -the “official kernel entry values.” -The exact implementation of this mechanism differs -from processor to processor, but the idea is the same. -

-We'll work through examples of these today in lecture. -You'll see all three in great detail in the labs as well. - -

-A note on terminology: sometimes we'll -use interrupt (or trap) to mean both interrupts and exceptions. - -

-Setting up traps on the x86 -

- -

-See handout Table 5-1, Figure 5-1, Figure 5-2. - -

-xv6 Sheet 07: struct gatedesc and SETGATE. - -

-xv6 Sheet 28: tvinit and idtinit. -Note setting of gate for T_SYSCALL - -

-xv6 Sheet 29: vectors.pl (also see generated vectors.S). - -

-System calls -

- -

-xv6 Sheet 16: init.c calls open("console"). -How is that implemented? - -

-xv6 usys.S (not in book). -(No saving of registers. Why?) - -

-Breakpoint 0x1b:"open", -step past int instruction into kernel. - -

-See handout Figure 9-4 [sic]. - -

-xv6 Sheet 28: in vectors.S briefly, then in alltraps. -Step through to call trap, examine registers and stack. -How will the kernel find the argument to open? - -

-xv6 Sheet 29: trap, on to syscall. - -

-xv6 Sheet 31: syscall looks at eax, -calls sys_open. - -

-(Briefly) -xv6 Sheet 52: sys_open uses argstr and argint -to get its arguments. How do they work? - -

-xv6 Sheet 30: fetchint, fetcharg, argint, -argptr, argstr. - -
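The key idea behind fetchint and friends is a bounds check before the kernel dereferences a user-supplied address. A toy version (a flat array standing in for the user address space; the name is borrowed, and this is not xv6's exact code):

```c
#include <assert.h>
#include <string.h>

enum { MEM_SIZE = 4096 };
static char mem[MEM_SIZE];    /* stand-in for the process's memory */

/* Fetch a 4-byte int from user address addr into *ip.
   Reject addresses that would read past the process's memory. */
int fetchint(unsigned addr, int *ip)
{
    if (addr > MEM_SIZE - sizeof(int))
        return -1;                        /* out of bounds: fail */
    memcpy(ip, mem + addr, sizeof(int));  /* safe copy, no alignment trap */
    return 0;
}
```

Without the check, a system call argument like an out-of-range pointer would let a user program make the kernel read (or, in the write path, corrupt) arbitrary memory.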

-What happens if a user program divides by zero -or accesses unmapped memory? -Exception. Same path as system call until trap. - -

-What happens if kernel divides by zero or accesses unmapped memory? - -

-Interrupts -

- -

-Like system calls, except: -devices generate them at any time, -there are no arguments in CPU registers, -nothing to return to, -usually can't ignore them. - -

-How do they get generated? -Device essentially phones up the -interrupt controller and asks to talk to the CPU. -Interrupt controller then buzzes the CPU and -tells it, “keyboard on line 1.” -Interrupt controller is essentially the CPU's -administrative assistant, -managing the phone lines on the CPU's behalf. -

-Have to set up interrupt controller. - -

-(Briefly) xv6 Sheet 63: pic_init sets up the interrupt controller, -irq_enable tells the interrupt controller to let the given -interrupt through. - -

-(Briefly) xv6 Sheet 68: pit8253_init sets up the clock chip, -telling it to interrupt on IRQ_TIMER 100 times/second. -console_init sets up the keyboard, enabling IRQ_KBD. - -

-In Bochs, set breakpoint at 0x8:"vector0" -and continue, loading kernel. -Step through clock interrupt, look at -stack, registers. - -

-Was the processor executing in kernel or user mode -at the time of the clock interrupt? -Why? (Have any user-space instructions executed at all?) - -

-Can the kernel get an interrupt at any time? -Why or why not? cli and sti, -irq_enable. - - - diff --git a/web/l-lock.html b/web/l-lock.html deleted file mode 100644 index eea8217..0000000 --- a/web/l-lock.html +++ /dev/null @@ -1,322 +0,0 @@ -L7 - - - - - -

Locking

- -

Required reading: spinlock.c - -

Why coordinate?

- -

Mutual-exclusion coordination is an important topic in operating -systems, because many operating systems run on -multiprocessors. Coordination techniques protect variables that are -shared among multiple threads and updated concurrently. These -techniques allow programmers to implement atomic sections so that one -thread can safely update the shared variables without having to worry -about another thread intervening. For example, processes in xv6 may -run concurrently on different processors and in kernel mode share -kernel data structures. We must ensure that these updates happen -correctly. -

List and insert example: -

-
-struct List {
-  int data;
-  struct List *next;
-};
-
-List *list = 0;
-
-insert(int data) {
-  List *l = new List;
-  l->data = data;
-  l->next = list;  // A
-  list = l;        // B
-}
-
- -

What needs to be atomic? The two statements labeled A and B should -always be executed together, as an indivisible fragment of code. If -two processors execute A and B interleaved, then we end up with an -incorrect list. To see that this is the case, draw out the list after -the sequence A1 (statement A executed by processor 1), A2 (statement A -executed by processor 2), B2, and B1. -
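Replaying that interleaving in a single thread shows the damage (this is a simulation of the schedule A1, A2, B2, B1, not an actual multiprocessor run):

```c
#include <assert.h>
#include <stddef.h>

struct List {
    int data;
    struct List *next;
};

/* Two "processors" each insert one node; the statements are
   interleaved as A1, A2, B2, B1.  Node n2 ends up unreachable. */
struct List *race_outcome(void)
{
    static struct List n1 = {1, NULL}, n2 = {2, NULL};
    struct List *list = NULL;

    n1.next = list;    /* A1: processor 1 reads list == NULL */
    n2.next = list;    /* A2: processor 2 also reads NULL */
    list = &n2;        /* B2 */
    list = &n1;        /* B1: overwrites B2; n2 is lost */
    return list;
}
```

The final list holds only the first processor's node; the second insert has silently vanished.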

How could this erroneous sequence happen? The variable list -lives in physical memory shared among multiple processors, connected -by a bus. The accesses to the shared memory will be ordered in some -total order by the bus/memory system. If the programmer doesn't -coordinate the execution of the statements A and B, any order can -happen, including the erroneous one. -

The erroneous case is called a race condition. The problem with -races is that they are difficult to reproduce. For example, if you -put print statements in to debug the incorrect behavior, you might -change the timing and the race might not happen anymore. -

Atomic instructions

- -

The programmer must be able to express that A and B should be executed -as a single atomic unit. We generally use a concept like locks -to mark an atomic region, acquiring the lock at the beginning of the -section and releasing it at the end: -

 
-void acquire(int *lock) {
-   while (TSL(lock) != 0) ; 
-}
-
-void release (int *lock) {
-  *lock = 0;
-}
-
- -

Acquire and release, of course, need to be atomic too, which can, -for example, be done with a hardware atomic TSL (test-and-set-lock) -instruction: -

The semantics of TSL are: -

-   R <- [mem]   // load content of mem into register R
-   [mem] <- 1   // store 1 in mem.
-
- -

In a hardware implementation, the bus arbiter guarantees that both -the load and the store are executed without any other loads/stores coming -in between. -

We can use locks to implement an atomic insert, or we can use -TSL directly: -

-int insert_lock = 0;
-
-insert(int data) {
-
-  /* acquire the lock: */
-  while(TSL(&insert_lock) != 0)
-    ;
-
-  /* critical section: */
-  List *l = new List;
-  l->data = data;
-  l->next = list;
-  list = l;
-
-  /* release the lock: */
-  insert_lock = 0;
-}
-
- -

It is the programmer's job to make sure that locks are respected. If
-a programmer writes another function that manipulates the list, the
-programmer must make sure that the new function acquires and
-releases the appropriate locks. If the programmer doesn't, race
-conditions occur.
-
-

This code assumes that stores commit to memory in program order and
-that all stores by other processors started before insert got the lock
-are observable by this processor. That is, after the other processor
-released a lock, all the previous stores are committed to memory. If
-a processor executes instructions out of order, this assumption won't
-hold and we must, for example, insert a barrier instruction that makes the
-assumption true.
-
-
-
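A sketch of this barrier idea using C11 atomics (our naming): the release fence plays the role of the barrier described above, forcing the data store to commit before the flag store, and the acquire fence on the reading side pairs with it:

```c
#include <stdatomic.h>
#include <pthread.h>

/* Sketch (assumed names): publish data under a flag, with explicit
 * fences standing in for the barrier instruction from the text. */
static int payload;              /* ordinary shared data */
static atomic_int ready;         /* the flag guarding the data */

static void *producer(void *arg) {
  (void)arg;
  payload = 42;                                           /* store data */
  atomic_thread_fence(memory_order_release);              /* barrier */
  atomic_store_explicit(&ready, 1, memory_order_relaxed); /* publish */
  return 0;
}

int consume(void) {
  pthread_t t;
  payload = 0;
  atomic_store(&ready, 0);
  pthread_create(&t, 0, producer, 0);
  while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
    ;                                           /* spin until published */
  atomic_thread_fence(memory_order_acquire);    /* pairs with release */
  int v = payload;                              /* guaranteed to see 42 */
  pthread_join(t, 0);
  return v;
}
```

Without the fences, an out-of-order processor could make the flag visible before the data, which is exactly the failure mode described above.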

Example: Locking on x86

- -

Here is one way we can implement acquire and release using the x86 -xchgl instruction: - -

-struct Lock {
-  unsigned int locked;
-};
-
-acquire(Lock *lck) {
-  while(TSL(&(lck->locked)) != 0)
-    ;
-}
-
-release(Lock *lck) {
-  lck->locked = 0;
-}
-
-int
-TSL(int *addr)
-{
-  register int content = 1;
-  // xchgl content, *addr
-  // xchgl exchanges the values of its two operands, while
-  // locking the memory bus to exclude other operations.
-  asm volatile ("xchgl %0,%1" :
-                "=r" (content),
-                "=m" (*addr) :
-                "0" (content),
-                "m" (*addr));
-  return(content);
-}
-
- -

the instruction "XCHG %eax, (content)" works as follows: -

    -
  1. freeze other CPUs' memory activity -
  2. temp := content -
  3. content := %eax -
  4. %eax := temp -
  5. un-freeze other CPUs -
- -

Steps 1 and 5 make XCHG special: it is "locked", using special signal
- lines on the inter-CPU bus and bus arbitration.
-
-

This implementation doesn't scale to a large number of processors; - in a later lecture we will see how we could do better. - -

Lock granularity

- -

Acquire/release is ideal for short atomic sections: increment a
-counter, search the i-node cache, allocate a free buffer.
-
-

What are spin locks not so great for? Long atomic sections may
- waste waiters' CPU time, and it is unwise to sleep while holding locks. In
- xv6 we try to avoid long atomic sections by careful coding (can
- you find an example?). xv6 doesn't release the processor when
- holding a lock, but it has an additional set of coordination primitives
- (sleep and wakeup), which we will study later.
-
-

A single list_lock protects all lists; inserts to different lists
- block each other. A lock per list would waste less time spinning, so you
- might want "fine-grained" locks, one for every object. BUT acquire/release
- are expensive (500 cycles on my 3 GHz machine) because they need to
- talk off-chip.
-
-

Also, "correctness" is not that simple with fine-grained locks if you
- need to maintain global invariants; e.g., "every buffer must be on
- exactly one of the free list and the device list". Per-list locks are
- irrelevant for this invariant. So you might want "large-grained" locks,
- which reduce overhead but also reduce concurrency.
-
-

This tension is hard to get right. One often starts out with - "large-grained locks" and measures the performance of the system on - some workloads. When more concurrency is desired (to get better - performance), an implementor may switch to a more fine-grained - scheme. Operating system designers fiddle with this all the time. - -
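One possible fine-grained variant, sketched with POSIX mutexes (the names are ours): each list carries its own lock, so inserts to different lists proceed in parallel while inserts to the same list still serialize:

```c
#include <pthread.h>

/* Sketch of per-list ("fine-grained") locking: the lock lives inside
 * each list, instead of one global list_lock covering all lists. */
struct node { int data; struct node *next; };

struct locked_list {
  pthread_mutex_t lock;     /* fine-grained: one lock per list */
  struct node *head;
};

void list_insert(struct locked_list *lst, struct node *n) {
  pthread_mutex_lock(&lst->lock);   /* only this list is blocked; */
  n->next = lst->head;              /* inserts to other lists run */
  lst->head = n;                    /* in parallel */
  pthread_mutex_unlock(&lst->lock);
}

int list_len(struct locked_list *lst) {
  int n = 0;
  pthread_mutex_lock(&lst->lock);
  for (struct node *p = lst->head; p; p = p->next) n++;
  pthread_mutex_unlock(&lst->lock);
  return n;
}
```

Note the trade-off from the text: this lock says nothing about cross-list invariants ("a buffer is on exactly one list"), which still need a larger-grained lock.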

Recursive locks and modularity

- -

When designing a system we desire clean abstractions and good
- modularity. We would like a caller not to have to know how a callee
- implements a particular function. Locks make achieving modularity
- more complicated. For example, what should we do when the caller holds a
- lock and then calls a function that also needs the lock to perform
- its job?
-
-

There are no transparent solutions that allow the caller and callee
- to be unaware of which locks they use. One transparent, but
- unsatisfactory, option is recursive locks: if a callee asks for a
- lock that its caller has, then we allow the callee to proceed.
- Unfortunately, this solution is not ideal either.
-
-
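A minimal sketch of how such a recursive lock could be implemented on top of a POSIX mutex and condition variable (the names and structure are ours): track the owning thread and a depth count, so a thread can re-acquire a lock it already holds.

```c
#include <pthread.h>

/* Sketch of a recursive lock: owner + depth let the owning thread
 * re-acquire without deadlocking; other threads wait for depth == 0. */
struct rlock {
  pthread_mutex_t m;      /* protects owner/depth */
  pthread_cond_t  idle;   /* signaled when depth drops to 0 */
  pthread_t owner;
  int depth;
};

void rlock_init(struct rlock *l) {
  pthread_mutex_init(&l->m, 0);
  pthread_cond_init(&l->idle, 0);
  l->depth = 0;
}

void rlock_acquire(struct rlock *l) {
  pthread_mutex_lock(&l->m);
  if (l->depth > 0 && pthread_equal(l->owner, pthread_self())) {
    l->depth++;                       /* re-entry by owner: just count */
  } else {
    while (l->depth > 0)              /* held by someone else: wait */
      pthread_cond_wait(&l->idle, &l->m);
    l->owner = pthread_self();
    l->depth = 1;
  }
  pthread_mutex_unlock(&l->m);
}

int rlock_release(struct rlock *l) {  /* returns remaining depth */
  pthread_mutex_lock(&l->m);
  int d = --l->depth;
  if (d == 0) pthread_cond_signal(&l->idle);
  pthread_mutex_unlock(&l->m);
  return d;
}
```

The mechanics are easy; the objection in the text is about what a successful inner acquire *means*, not about implementing it.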

Consider the following. If lock x protects the internals of some
- struct foo, then if the caller acquires lock x, it knows that the
- internals of foo are in a sane state and it can fiddle with them.
- The caller must then restore them to a sane state before releasing
- lock x, but until then anything goes.
-
-

This assumption doesn't hold with recursive locking. After - acquiring lock x, the acquirer knows that either it is the first to - get this lock, in which case the internals are in a sane state, or - maybe some caller holds the lock and has messed up the internals and - didn't realize when calling the callee that it was going to try to - look at them too. So the fact that a function acquired the lock x - doesn't guarantee anything at all. In short, locks protect against - callers and callees just as much as they protect against other - threads. - -

Since transparent solutions aren't ideal, it is better to consider - locks part of the function specification. The programmer must - arrange that a caller doesn't invoke another function while holding - a lock that the callee also needs. - -

Locking in xv6

- -

xv6 runs on a multiprocessor and is programmed to allow multiple
-threads of computation to run concurrently. In xv6 an interrupt might
-run on one processor while a process in kernel mode runs on another
-processor, sharing a kernel data structure with the interrupt routine.
-xv6 uses locks, implemented using an atomic instruction, to coordinate
-concurrent activities.
-
-

Let's check out why xv6 needs locks by following what happens when -we start a second processor: -

- -

Why hold proc_table_lock during a context switch? It protects -p->state; the process has to hold some lock to avoid a race with -wakeup() and yield(), as we will see in the next lectures. - -

Why not a lock per proc entry? It might be expensive in whole-table
-scans (in wait, wakeup, scheduler). proc_table_lock also
-protects some larger invariants; for example, it might be hard to get
-proc_wait() right with just per-entry locks. Right now the check to
-see if there are any exited children and the sleep are atomic -- but
-that would be hard with per-entry locks. One could have both, but
-that would probably be neither clean nor fast.
-
-

Of course, there is only one processor searching the proc table if
-acquire is implemented correctly. Let's check out acquire in
-spinlock.c:
-

- -

- -

Locking in JOS

- -

JOS is meant to run on single-CPU machines, and the plan can be
-simple. The simple plan is disabling/enabling interrupts in the
-kernel (the IF flag in the EFLAGS register). Thus, in the kernel,
-threads release the processor only when they want to and can ensure
-that they don't release the processor during a critical section.
-
-

In user mode, JOS runs with interrupts enabled, but Unix user -applications don't share data structures. The data structures that -must be protected, however, are the ones shared in the library -operating system (e.g., pipes). In JOS we will use special-case -solutions, as you will find out in lab 6. For example, to implement -pipe we will assume there is one reader and one writer. The reader -and writer never update each other's variables; they only read each -other's variables. Carefully programming using this rule we can avoid -races. diff --git a/web/l-mkernel.html b/web/l-mkernel.html deleted file mode 100644 index 2984796..0000000 --- a/web/l-mkernel.html +++ /dev/null @@ -1,262 +0,0 @@ -Microkernel lecture - - - - - -

Microkernels

- -

Required reading: Improving IPC by kernel design - -

Overview

- -

This lecture looks at the microkernel organization. In a -microkernel, services that a monolithic kernel implements in the -kernel are running as user-level programs. For example, the file -system, UNIX process management, pager, and network protocols each run -in a separate user-level address space. The microkernel itself -supports only the services that are necessary to allow system services -to run well in user space; a typical microkernel has at least support -for creating address spaces, threads, and inter process communication. - -

The potential advantages of a microkernel are simplicity of the
-kernel (small), isolation of operating system components (each runs in
-its own user-level address space), and flexibility (we can have a file
-server and a database server). One potential disadvantage is
-performance loss, because what in a monolithic kernel requires a
-single system call may in a microkernel require multiple system calls
-and context switches.
-
-

One way in which microkernels differ from each other is the exact
-kernel API they implement. For example, Mach (a system developed at
-CMU, which influenced a number of commercial operating systems) has
-the following system calls: processes (create, terminate, suspend,
-resume, priority, assign, info, threads), threads (fork, exit, join,
-detach, yield, self), ports and messages (a port is a unidirectional
-communication channel with a message queue and supporting primitives
-to send, destroy, etc), and regions/memory objects (allocate,
-deallocate, map, copy, inherit, read, write).
-
-

Some microkernels are more "microkernel" than others. For example,
-some microkernels implement the pager in user space but the basic
-virtual memory abstractions in the kernel (e.g., Mach); others are
-more extreme, and implement most of the virtual memory in user space
-(L4). Yet others are less extreme: many servers run in their own
-address space, but in kernel mode (Chorus).
-
-

All microkernels support multiple threads per address space. xv6
-and Unix until recently didn't; why? Because, in Unix, system services
-are typically implemented in the kernel, and those are the primary
-programs that need multiple threads to handle events concurrently
-(waiting for disk and processing new I/O requests). In microkernels,
-these services are implemented in user-level address spaces and so
-they need a mechanism to deal with handling operations concurrently.
-(Of course, one can argue that if fork is efficient enough, there is
-no need to have threads.)
-
-

L3/L4

- -

L3 is a predecessor to L4. L3 provides data persistence, DOS
-emulation, and an ELAN runtime system. L4 is a reimplementation of L3,
-but without the data persistence. L4KA is a project at
-sourceforge.net, and you can download the code for the latest
-incarnation of L4 from there.
-
-

L4 is a "second-generation" microkernel, with 7 calls: IPC (of
-which there are several types), id_nearest (find a thread with an ID
-close to the given ID), fpage_unmap (unmap pages, mapping is done as a
-side-effect of IPC), thread_switch (hand processor to specified
-thread), lthread_ex_regs (manipulate thread registers),
-thread_schedule (set scheduling policies), task_new (create a new
-address space with some default number of threads). These calls
-provide address spaces, tasks, threads, interprocess communication,
-and unique identifiers. An address space is a set of mappings.
-Multiple threads may share mappings, and a thread may grant mappings to
-another thread (through IPC). A task is the set of threads sharing an
-address space.
-
-

A thread is the execution abstraction; it belongs to an address
-space and has a UID, a register set, a page fault handler, and an
-exception handler. The UID of a thread is its task number plus the
-number of the thread within that task.
-
-

IPC passes data by value or by reference to another address space.
-It also provides for sequence coordination. It is used for
-communication between clients and servers, to pass interrupts to a
-user-level exception handler, and to pass page faults to an external
-pager. In L4, device drivers are implemented as user-level
-processes with the device mapped into their address space.
-Linux runs as a user-level process.
-
-

L4 provides quite a range of message types: inline-by-value,
-strings, and virtual memory mappings. The send and receive descriptors
-specify how many of each, if any.
-
-

In addition, there is a system call for timeouts and controlling
-thread scheduling.
-
-

L3/L4 paper discussion

-
-
-Why must the parent directory be locked? If two processes try to
-create the same name in the same directory, only one should succeed
-and the other one should receive an error (file exists).
-
-

Link, unlink, chdir, mount, umount could have taken file
-descriptors instead of their path arguments. In fact, this would get
-rid of some possible race conditions (some of which have security
-implications, TOCTTOU). However, this would require that the current
-working directory be remembered by the process, and UNIX didn't have
-good ways of maintaining static state shared among all processes
-belonging to a given user. The easiest way to create shared state
-is to place it in the kernel.
-
-

We have one piece of code in xv6 that we haven't studied: exec. - With all the ground work we have done this code can be easily - understood (see sheet 54). - - diff --git a/web/l-okws.txt b/web/l-okws.txt deleted file mode 100644 index fa940d0..0000000 --- a/web/l-okws.txt +++ /dev/null @@ -1,249 +0,0 @@ - -Security -------------------- -I. 2 Intro Examples -II. Security Overview -III. Server Security: Offense + Defense -IV. Unix Security + POLP -V. Example: OKWS -VI. How to Build a Website - -I. Intro Examples --------------------- -1. Apache + OpenSSL 0.9.6a (CAN 2002-0656) - - SSL = More security! - - unsigned int j; - p=(unsigned char *)s->init_buf->data; - j= *(p++); - s->session->session_id_length=j; - memcpy(s->session->session_id,p,j); - - - the result: an Apache worm - -2. SparkNotes.com 2000: - - New profile feature that displays "public" information about users - but bug that made e-mail addresses "public" by default. - - New program for getting that data: - - http://www.sparknotes.com/getprofile.cgi?id=1343 - -II. Security Overview ----------------------- - -What Is Security? - - Protecting your system from attack. - - What's an attack? - - Stealing data - - Corrupting data - - Controlling resources - - DOS - - Why attack? - - Money - - Blackmail / extortion - - Vendetta - - intellectual curiosity - - fame - -Security is a Big topic - - - Server security -- today's focus. There's some machine sitting on the - Internet somewhere, with a certain interface exposed, and attackers - want to circumvent it. - - Why should you trust your software? - - - Client security - - Clients are usually servers, so they have many of the same issues. - - Slight simplification: people across the network cannot typically - initiate connections. - - Has a "fallible operator": - - Spyware - - Drive-by-Downloads - - - Client security turns out to be much harder -- GUI considerations, - look inside the browser and the applications. 
- - Systems community can more easily handle server security. - - We think mainly of servers. - -III. Server Security: Offense and Defense ------------------------------------------ - - Show picture of a Web site. - - Attacks | Defense ----------------------------------------------------------------------------- - 1. Break into DB from net | 1. FW it off - 2. Break into WS on telnet | 2. FW it off - 3. Buffer overrun in Apache | 3. Patch apache / use better lang? - 4. Buffer overrun in our code | 4. Use better lang / isolate it - 5. SQL injection | 5. Better escaping / don't interpret code. - 6. Data scraping. | 6. Use a sparse UID space. - 7. PW sniffing | 7. ??? - 8. Fetch /etc/passwd and crack | 8. Don't expose /etc/passwd - PW | - 9. Root escalation from apache | 9. No setuid programs available to Apache -10. XSS |10. Filter JS and input HTML code. -11. Keystroke recorded on sys- |11. Client security - admin's desktop (planetlab) | -12. DDOS |12. ??? - -Summary: - - That we want private data to be available to right people makes - this problem hard in the first place. Internet servers are there - for a reason. - - Security != "just encrypt your data;" this in fact can sometimes - make the problem worse. - - Best to prevent break-ins from happening in the first place. - - If they do happen, want to limit their damage (POLP). - - Security policies are difficult to express / package up neatly. - -IV. Design According to POLP (in Unix) ---------------------------------------- - - Assume any piece of a system can be compromised, by either bad - programming or malicious attack. - - Try to limit the damage done by such a compromise (along the lines - of the 4 attack goals). - - - -What's the goal on Unix? - - Keep processes from communicating that don't have to: - - limit FS, IPC, signals, ptrace - - Strip away unneeded privilege - - with respect to network, FS. - - Strip away FS access. - -How on Unix? 
- - setuid/setgid - - system call interposition - - chroot (away from setuid executables, /etc/passwd, /etc/ssh/..) - - - -How do you write chroot'ed programs? - - What about shared libraries? - - /etc/resolv.conf? - - Can chroot'ed programs access the FS at all? What if they need - to write to the FS or read from the FS? - - Fd's are *capabilities*; can pass them to chroot'ed services, - thereby opening new files on its behalf. - - Unforgeable - can only get them from the kernel via open/socket, etc. - -Unix Shortcomings (round 1) - - It's bad to run as root! - - Yet, need root for: - - chroot - - setuid/setgid to a lower-privileged user - - create a new user ID - - Still no guarantee that we've cut off all channels - - 200 syscalls! - - Default is to give most/all privileges. - - Can "break out" of chroot jails? - - Can still exploit race conditions in the kernel to escalate privileges. - -Sidebar - - setuid / setuid misunderstanding - - root / root misunderstanding - - effective vs. real vs. saved set-user-ID - -V. OKWS -------- -- Taking these principles as far as possible. -- C.f. Figure 1 From the paper.. -- Discussion of which privileges are in which processes - - - -- Technical details: how to launch a new service -- Within the launcher (running as root): - - - - // receive FDs from logger, pubd, demux - fork (); - chroot ("/var/okws/run"); - chdir ("/coredumps/51001"); - setgid (51001); - setuid (51001); - exec ("login", fds ... ); - -- Note no chroot -- why not? -- Once launched, how does a service get new connections? -- Note the goal - minimum tampering with each other in the - case of a compromise. - -Shortcoming of Unix (2) -- A lot of plumbing involved with this system. FDs flying everywhere. -- Isolation still not fine enough. If a service gets taken over, - can compromise all users of that service. - -VI. Reflections on Building Websites ---------------------------------- -- OKWS interesting "experiment" -- Need for speed; also, good gzip support. 
-- If you need compiled code, it's a good way to go.
-- RPC-like system a must for backend communication
-- Connection-pooling for free
-
-Biggest difficulties:
-- Finding good C++ programmers.
-- Compile times.
-- The DB is still always the problem.
-
-Hard to find good alternatives
-- Python / Perl - you might spend a lot of time writing C code /
-  integrating with lower level languages.
-- Have to worry about DB pooling.
-- Java -- most viable, and is getting better. Scary that you can't peer
-  inside.
-- .Net / C#-based system might be the way to go.
-
-
-=======================================================================
-
-Extra Material:
-
-Capabilities (From the Eros Paper in SOSP 1999)
-
- - "Unforgeable pair made up of an object ID and a set of authorized
-   operations (an interface) on that object."
- - c.f. Dennis and van Horn. "Programming semantics for multiprogrammed
-   computations," Communications of the ACM 9(3):143-154, Mar 1966.
- - Thus:
-
- - Examples:
-     "Process X can write to file at inode Y"
-     "Process P can read from file at inode Z"
-   Familiar example: Unix file descriptors
-
- - Why are they secure?
-   - Capabilities are "unforgeable"
-   - Processes can get them only through authorized interfaces
-   - Capabilities are only given to processes authorized to hold them
-
- - How do you get them?
-   - From the kernel (e.g., open)
-   - From other applications (e.g., FD passing)
-
- - How do you use them?
-   - read (fd), write(fd).
-
- - How do you revoke them once granted?
-   - In Unix, you do not.
-   - In some systems, a central authority ("reference monitor") can revoke.
-
- - How do you store them persistently?
-   - Can have circular dependencies (unlike an FS).
-   - What happens when the system starts up?
-   - Revert to checkpointed state.
-   - Often capability systems chose a single-level store. 
- - - Capability systems, a historical prospective: - - KeyKOS, Eros, Cyotos (UP research) - - Never saw any applications - - IBM Systems (System 38, later AS/400, later 'i Series') - - Commercially viable - - Problems: - - All bets are off when a capability is sent to the wrong place. - - Firewall analogy? diff --git a/web/l-plan9.html b/web/l-plan9.html deleted file mode 100644 index a3af3d5..0000000 --- a/web/l-plan9.html +++ /dev/null @@ -1,249 +0,0 @@ - - -Plan 9 - - - -

Plan 9

- -

Required reading: Plan 9 from Bell Labs

- -

Background

- -

Had moved away from the ``one computing system'' model of -Multics and Unix.

- -

Many computers (`workstations'), self-maintained, not a coherent whole.

- -

Pike and Thompson had been batting around ideas about a system glued together
-by a single protocol as early as 1984.
-Various small experiments involving individual pieces (file server, OS, computer)
-were tried throughout the 1980s.

- -

Ordered the hardware for the ``real thing'' in beginning of 1989, -built up WORM file server, kernel, throughout that year.

- -

Some time in early fall 1989, Pike and Thompson were
-trying to figure out a way to fit the window system in.
-On the way home from dinner, both independently realized that they
-needed to be able to mount a user-space file descriptor,
-not just a network address.

- -

Around Thanksgiving 1989, spent a few days rethinking the whole -thing, added bind, new mount, flush, and spent a weekend -making everything work again. The protocol at that point was -essentially identical to the 9P in the paper.

- -

In May 1990, tried to use system as self-hosting. -File server kept breaking, had to keep rewriting window system. -Dozen or so users by then, mostly using terminal windows to -connect to Unix.

- -

Paper written and submitted to UKUUG in July 1990.

- -

Because it was an entirely new system, could take the -time to fix problems as they arose, in the right place.

- - -

Design Principles

- -

Three design principles:

- -

-1. Everything is a file.
-2. There is a standard protocol for accessing files.
-3. Private, malleable name spaces (bind, mount). -

- -

Everything is a file.

- -

Everything is a file (more everything than Unix: networks, graphics).

- -
-% ls -l /net
-% lp /dev/screen
-% cat /mnt/wsys/1/text
-
- -

Standard protocol for accessing files

- -

9P is the only protocol the kernel knows: other protocols -(NFS, disk file systems, etc.) are provided by user-level translators.

- -

Only one protocol, so easy to write filters and other -converters. Iostats puts itself between the kernel -and a command.

- -
-% iostats -xvdfdf /bin/ls
-
- -

Private, malleable name spaces

- -

Each process has its own private name space that it -can customize at will. -(Full disclosure: can arrange groups of -processes to run in a shared name space. Otherwise how do -you implement mount and bind?)

- -

Iostats remounts the root of the name space -with its own filter service.

- -

The window system mounts a file system that it serves -on /mnt/wsys.

- -

The network is actually a kernel device (no 9P involved) -but it still serves a file interface that other programs -use to access the network. -Easy to move out to user space (or replace) if necessary: -import network from another machine.

- -

Implications

- -

Everything is a file + can share files => can share everything.

- -

Per-process name spaces help move toward ``each process has its own -private machine.''

- -

One protocol: easy to build custom filters to add functionality -(e.g., reestablishing broken network connections). - -

File representation for networks, graphics, etc.

- -

Unix sockets are file descriptors, but you can't use the -usual file operations on them. Also far too much detail that -the user doesn't care about.

- -

In Plan 9: -

dial("tcp!plan9.bell-labs.com!http");
-
-(Protocol-independent!)

- -

Dial more or less does:
-write to /net/cs: tcp!plan9.bell-labs.com!http -read back: /net/tcp/clone 204.178.31.2!80 -write to /net/tcp/clone: connect 204.178.31.2!80 -read connection number: 4 -open /net/tcp/4/data -

- -

Details don't really matter. Two important points: -protocol-independent, and ordinary file operations -(open, read, write).

- -

Networks can be shared just like any other files.

- -

Similar story for graphics, other resources.

- -

Conventions

- -

Per-process name spaces mean that even full path names are ambiguous -(/bin/cat means different things on different machines, -or even for different users).

- -

Convention binds everything together. -On a 386, bind /386/bin /bin. - -

In Plan 9, always know where the resource should be -(e.g., /net, /dev, /proc, etc.), -but not which one is there.

- -

Can break conventions: on a 386, bind /alpha/bin /bin, just won't -have usable binaries in /bin anymore.

- -

Object-oriented in the sense of having objects (files) that all -present the same interface and can be substituted for one another -to arrange the system in different ways.

- -

Very little ``type-checking'': bind /net /proc; ps. -Great benefit (generality) but must be careful (no safety nets).

- - -

Other Contributions

- -

Portability

- -

Plan 9 still is the most portable operating system. -Not much machine-dependent code, no fancy features -tied to one machine's MMU, multiprocessor from the start (1989).

- -

Many other systems are still struggling with converting to SMPs.

- -

Has run on MIPS, Motorola 68000, Nextstation, Sparc, x86, PowerPC, Alpha, others.

- -

All the world is not an x86.

- -

Alef

- -

New programming language: convenient, but difficult to maintain. -Retired when author (Winterbottom) stopped working on Plan 9.

- -

Good ideas transferred to C library plus conventions.

- -

All the world is not C.

- -

UTF-8

- -

Thompson invented UTF-8. Pike and Thompson -converted Plan 9 to use it over the first weekend of September 1992, -in time for X/Open to choose it as the Unicode standard byte format -at a meeting the next week.

- -

UTF-8 is now the standard character encoding for Unicode on -all systems and interoperating between systems.

- -

Simple, easy to modify base for experiments

- -

Whole system source code is available, simple, easy to -understand and change. -There's a reason it only took a couple days to convert to UTF-8.

- -
-  49343  file server kernel
-
- 181611  main kernel
-  78521    ipaq port (small kernel)
-  20027      TCP/IP stack
-  15365      ipaq-specific code
-  43129      portable code
-
-1326778  total lines of source code
-
- -

Dump file system

- -

Snapshot idea might well have been ``in the air'' at the time. -(OldFiles in AFS appears to be independently derived, -use of WORM media was common research topic.)

- -

Generalized Fork

- -

Picked up by other systems: FreeBSD, Linux.

- -

Authentication

- -

No global super-user. -Newer, more Plan 9-like authentication described in later paper.

- -

New Compilers

- -

Much faster than gcc, simpler.

- -

8s to build acme for Linux using gcc; 1s to build acme for Plan 9 using 8c (but running on Linux)

- -

IL Protocol

- -

Now retired. -For better or worse, TCP has all the installed base. -IL didn't work very well on asymmetric or high-latency links -(e.g., cable modems).

- -

Idea propagation

- -

Many ideas have propagated out to varying degrees.

- -

Linux even has bind and user-level file servers now (FUSE), -but still not per-process name spaces.

- - - diff --git a/web/l-scalablecoord.html b/web/l-scalablecoord.html deleted file mode 100644 index da72c37..0000000 --- a/web/l-scalablecoord.html +++ /dev/null @@ -1,202 +0,0 @@ -Scalable coordination - - - - - -

Scalable coordination

- -

Required reading: Mellor-Crummey and Scott, Algorithms for Scalable - Synchronization on Shared-Memory Multiprocessors, TOCS, Feb 1991. - -

Overview

- -

Shared memory machines are a bunch of CPUs sharing physical memory.
-Typically each processor also maintains a cache (for performance),
-which introduces the problem of keeping caches coherent. If processor 1
-writes a memory location whose value processor 2 has cached, then
-processor 2's cache must be updated in some way. How?
-

    - -
  • Bus-based schemes. Any CPU can access ("dance with") any memory
-equally (a "dance hall" architecture). Use "snoopy" protocols: each CPU's cache
-listens to the memory bus. With a write-through architecture, invalidate the
-copy when you see a write. Or can have an "ownership" scheme with a write-back
-cache (e.g., Pentium caches have MESI bits---modified, exclusive,
-shared, invalid). If the E bit is set, the CPU caches exclusively and can do
-write back. But the bus places limits on scalability.
-
  • More scalability w. NUMA schemes (non-uniform memory access). Each -CPU comes with fast "close" memory. Slower to access memory that is -stored with another processor. Use a directory to keep track of who is -caching what. For example, processor 0 is responsible for all memory -starting with address "000", processor 1 is responsible for all memory -starting with "001", etc. - -
  • COMA - cache-only memory architecture. Each CPU has local RAM, -treated as cache. Cache lines migrate around to different nodes based -on access pattern. Data only lives in cache, no permanent memory -location. (These machines aren't too popular any more.) - -
- - -

Scalable locks

- -

This paper is about cost and scalability of locking; what if you -have 10 CPUs waiting for the same lock? For example, what would -happen if xv6 runs on an SMP with many processors? - -

What's the cost of a simple spinning acquire/release? Algorithm 1 -*without* the delays, which is like xv6's implementation of acquire -and release (xv6 uses XCHG instead of test_and_set): -

-  each of the 10 CPUs gets the lock in turn
-  meanwhile, remaining CPUs in XCHG on lock
-  lock must be X in cache to run XCHG
-    otherwise all might read, then all might write
-  so bus is busy all the time with XCHGs!
-  can we avoid constant XCHGs while lock is held?
-
- -

test-and-test-and-set -

-  only run expensive TSL if not locked
-  spin on ordinary load instruction, so cache line is S
-  acquire(l)
-    while(1){
-      while(l->locked != 0) { }
-      if(TSL(&l->locked) == 0)
-        return;
-    }
-
- -

suppose 10 CPUs are waiting, let's count cost in total bus - transactions -

-  CPU1 gets lock in one cycle
-    sets lock's cache line to I in other CPUs
-  9 CPUs each use bus once in XCHG
-    then everyone has the line S, so they spin locally
-  CPU1 releases the lock
-  CPU2 gets the lock in one cycle
-  8 CPUs each use bus once...
-  So 10 + 9 + 8 + ... = 50 transactions, O(n^2) in # of CPUs!
-  Look at "test-and-test-and-set" in Figure 6
-
-

Can we have n CPUs acquire a lock in O(n) time? - -

What is the point of the exponential backoff in Algorithm 1? -

-  Does it buy us O(n) time for n acquires?
-  Is there anything wrong with it?
-  may not be fair
-  exponential backoff may increase delay after release
-
- -

What's the point of the ticket locks, Algorithm 2? -

-  one interlocked instruction to get my ticket number
-  then I spin on now_serving with ordinary load
-  release() just increments now_serving
-
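
The ticket lock of Algorithm 2 can be sketched in C11 as follows (the names are ours, not the paper's): one fetch-and-add per acquire, ordinary loads while spinning, and a plain store on release.

```c
#include <stdatomic.h>

typedef struct {
  atomic_uint next_ticket;  // one interlocked fetch-and-add per acquire
  atomic_uint now_serving;  // waiters spin on this with ordinary loads
} ticketlock;

void ticket_acquire(ticketlock *l) {
  unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                          memory_order_relaxed);
  while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
    ;  // spin
}

void ticket_release(ticketlock *l) {
  // no atomic RMW needed: only the lock holder ever writes now_serving
  atomic_store_explicit(&l->now_serving,
      atomic_load_explicit(&l->now_serving, memory_order_relaxed) + 1,
      memory_order_release);
}
```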
- -

why is that good? -

-  + fair
-  + no exponential backoff overshoot
-  + no spinning with atomic instructions (just ordinary loads of now_serving)
-
- -

but what's the cost, in bus transactions? -

-  while lock is held, now_serving is S in all caches
-  release makes it I in all caches
-  then each waiter uses a bus transaction to get the new value
-  so still O(n^2)
-
- -

What's the point of the array-based queuing locks, Algorithm 3? -

-    a lock has an array of "slots"
-    waiter allocates a slot, spins on that slot
-    release wakes up just next slot
-  so O(n) bus transactions to get through n waiters: good!
-  anderson lines in Figure 4 and 6 are flat-ish
-    they only go up because the lock data structures are protected by a simpler lock
-  but O(n) space *per lock*!
-
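
A sketch of the array-based queuing lock (Algorithm 3) in C11; the names, and the fixed slot count, are ours. Each waiter spins on its own slot, and release touches only the next slot, so a release invalidates one cache line instead of all of them.

```c
#include <stdatomic.h>

#define NSLOT 16  // must be >= max number of waiters: O(n) space per lock

typedef struct {
  atomic_uint next_slot;
  // each waiter spins on its own slot; real implementations pad each
  // slot to its own cache line, which is what makes this scalable
  atomic_int can_go[NSLOT];
} arraylock;

void array_init(arraylock *l) {
  atomic_init(&l->next_slot, 0);
  for (int i = 0; i < NSLOT; i++)
    atomic_init(&l->can_go[i], i == 0);  // slot 0 starts open
}

unsigned array_acquire(arraylock *l) {
  unsigned slot = atomic_fetch_add_explicit(&l->next_slot, 1,
                                            memory_order_relaxed) % NSLOT;
  while (!atomic_load_explicit(&l->can_go[slot], memory_order_acquire))
    ;  // spin on my slot only
  atomic_store_explicit(&l->can_go[slot], 0, memory_order_relaxed);
  return slot;  // caller passes this to array_release
}

void array_release(arraylock *l, unsigned slot) {
  // wake up just the next waiter
  atomic_store_explicit(&l->can_go[(slot + 1) % NSLOT], 1,
                        memory_order_release);
}
```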
- -

Algorithm 5 (MCS), the new algorithm of the paper, uses -compare_and_swap: -

-int compare_and_swap(addr, v1, v2) {
-  int ret = 0;
-  // stop all memory activity and ignore interrupts
-  if (*addr == v1) {
-    *addr = v2;
-    ret = 1;
-  }
-  // resume other memory activity and take interrupts
-  return ret;
-}
-
- -

What's the point of the MCS lock, Algorithm 5? -

-  constant space per lock, rather than O(n)
-  one "qnode" per thread, used for whatever lock it's waiting for
-  lock holder's qnode points to start of list
-  lock variable points to end of list
-  acquire adds your qnode to end of list
-    then you spin on your own qnode
-  release wakes up next qnode
-
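
The MCS list manipulation above can be sketched in C11 (names ours; the paper uses fetch_and_store plus compare_and_swap, which correspond to atomic_exchange and atomic_compare_exchange here). The lock itself is a single tail pointer, so space per lock is constant, and each waiter spins on its own qnode.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct qnode {
  _Atomic(struct qnode *) next;
  atomic_int locked;
} qnode;

typedef struct { _Atomic(qnode *) tail; } mcslock;

void mcs_acquire(mcslock *l, qnode *me) {
  atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
  atomic_store_explicit(&me->locked, 1, memory_order_relaxed);
  // atomically append my qnode to the end of the queue
  qnode *pred = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
  if (pred != NULL) {
    atomic_store_explicit(&pred->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
      ;  // spin on my own qnode, not on the shared lock word
  }
}

void mcs_release(mcslock *l, qnode *me) {
  qnode *succ = atomic_load_explicit(&me->next, memory_order_acquire);
  if (succ == NULL) {
    // no known successor: try to swing the tail back to empty
    qnode *expect = me;
    if (atomic_compare_exchange_strong(&l->tail, &expect, NULL))
      return;
    // a successor is arriving; wait for it to link itself in
    while ((succ = atomic_load_explicit(&me->next,
                                        memory_order_acquire)) == NULL)
      ;
  }
  atomic_store_explicit(&succ->locked, 0, memory_order_release);  // wake next
}
```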
- -

Wait-free or non-blocking data structures

- -


 The previous implementations all block threads when there is
- contention for a lock. Other atomic hardware operations allow one
- to build wait-free data structures. For example, one
- can implement an insert of an element into a shared list that doesn't
- block any thread. Such versions are called wait-free. -

A linked list with locks is as follows: -

-Lock list_lock;
-
-insert(int x) {
-  element *n = new Element;
-  n->x = x;
-
-  acquire(&list_lock);
-  n->next = list;
-  list = n;
-  release(&list_lock);
-}
-
- -

A wait-free implementation is as follows: -

-insert (int x) {
-  element *n = new Element;
-  n->x = x;
-  do {
-     n->next = list;
-  } while (compare_and_swap (&list, n->next, n) == 0);
-}
-
-
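
The wait-free insert above, written out as a runnable sketch with C11 atomics (the global `list` and the names are ours). Note that `atomic_compare_exchange_weak` reloads the expected value on failure, which is exactly the retry the do/while loop needs.

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct element { int x; struct element *next; } element;

_Atomic(element *) list = NULL;  // shared list head

void insert(int x) {
  element *n = malloc(sizeof *n);
  n->x = x;
  element *old = atomic_load(&list);
  do {
    n->next = old;  // on CAS failure, 'old' is reloaded with the new head
  } while (!atomic_compare_exchange_weak(&list, &old, n));
}
```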

How many bus transactions with 10 CPUs inserting one element in the -list? Could you do better? - -

This - paper by Fraser and Harris compares lock-based implementations - versus corresponding non-blocking implementations of a number of data - structures. - -


 It is not possible to make every operation wait-free, and there are
- times when we will need an implementation of acquire and release.
- Research on non-blocking data structures is active; the last word
- on this topic hasn't been said yet. - - diff --git a/web/l-schedule.html b/web/l-schedule.html deleted file mode 100644 index d87d7da..0000000 --- a/web/l-schedule.html +++ /dev/null @@ -1,340 +0,0 @@ -Scheduling - - - - - -

Scheduling

- -

Required reading: Eliminating receive livelock - -

Notes based on prof. Morris's lecture on scheduling (6.824, fall'02). - -

Overview

- -
    - -
  • What is scheduling? The OS policies and mechanisms to allocate
-resources to entities. A good scheduling policy ensures that the most
-important entity gets the resources it needs. This topic was
-popular in the days of time sharing, when there was a shortage of
-resources. It seemed irrelevant in the era of PCs and workstations, when
-resources were plenty. Now the topic is back from the dead to handle
-massive Internet servers with paying customers. The Internet exposes
-web sites to international abuse and overload, which can lead to
-resource shortages. Furthermore, some customers are more important
-than others (e.g., the ones that buy a lot). - 
  • Key problems: -
      -
    • Gap between desired policy and available mechanism. The desired
-policies often include elements that are not implementable with the
-mechanisms available to the operating system. Furthermore, often
-there are many conflicting goals (low latency, high throughput, and
-fairness), and the scheduler must make a trade-off between the goals. - 
    • Interaction between different schedulers. One has to take a
-systems view. Just optimizing the CPU scheduler may do little for
-the overall desired policy. -
    - -
  • Resources you might want to schedule: CPU time, physical memory, -disk and network I/O, and I/O bus bandwidth. - -
  • Entities that you might want to give resources to: users, -processes, threads, web requests, or MIT accounts. - -
  • Many polices for resource to entity allocation are possible: -strict priority, divide equally, shortest job first, minimum guarantee -combined with admission control. - -
  • General plan for scheduling mechanisms -
      -
    1. Understand where scheduling is occurring. -
    2. Expose scheduling decisions, allow control. -
    3. Account for resource consumption, to allow intelligent control. -
    - -
  • Simple example from 6.828 kernel. The policy for scheduling -environments is to give each one equal CPU time. The mechanism used to -implement this policy is a clock interrupt every 10 msec and then -selecting the next environment in a round-robin fashion. - -

    But this only works if processes are compute-bound. What if a -process gives up some of its 10 ms to wait for input? Do we have to -keep track of that and give it back? - -

    How long should the quantum be? is 10 msec the right answer? -Shorter quantum will lead to better interactive performance, but -lowers overall system throughput because we will reschedule more, -which has overhead. - -


     What if the environment computes for 1 msec and sends an IPC to
-the file server environment? Shouldn't the file server get more CPU
-time because it operates on behalf of all the other environments? - 


     Potential improvements for the 6.828 kernel: track "recent" CPU use
-(e.g., over the last second) and always run the environment with the least
-recent CPU use. (Still, if you sleep long enough you lose.) Other
-solution: directed yield; specify on the yield to which environment
-you are donating the remainder of the quantum (e.g., to the file
-server so that it can compute on the environment's behalf). - 

  • Pitfall: Priority Inversion -
    -  Assume policy is strict priority.
    -  Thread T1: low priority.
    -  Thread T2: medium priority.
    -  Thread T3: high priority.
    -  T1: acquire(l)
    -  context switch to T3
    -  T3: acquire(l)... must wait for T1 to release(l)...
    -  context switch to T2
    -  T2 computes for a while
    -  T3 is indefinitely delayed despite high priority.
    -  Can solve if T3 lends its priority to holder of lock it is waiting for.
    -    So T1 runs, not T2.
    -  [this is really a multiple scheduler problem.]
    -  [since locks schedule access to locked resource.]
    -
    - -
  • Pitfall: Efficiency. Efficiency often conflicts with fairness (or -any other policy). Long time quantum for efficiency in CPU scheduling -versus low delay. Shortest seek versus FIFO disk scheduling. -Contiguous read-ahead vs data needed now. For example, scheduler -swaps out my idle emacs to let gcc run faster with more phys mem. -What happens when I type a key? These don't fit well into a "who gets -to go next" scheduler framework. Inefficient scheduling may make -everybody slower, including high priority users. - -
  • Pitfall: Multiple Interacting Schedulers. Suppose you want your -emacs to have priority over everything else. Give it high CPU -priority. Does that mean nothing else will run if emacs wants to run? -Disk scheduler might not know to favor emacs's disk I/Os. Typical -UNIX disk scheduler favors disk efficiency, not process prio. Suppose -emacs needs more memory. Other processes have dirty pages; emacs must -wait. Does disk scheduler know these other processes' writes are high -prio? - -
  • Pitfall: Server Processes. Suppose emacs uses X windows to -display. The X server must serve requests from many clients. Does it -know that emacs' requests should be given priority? Does the OS know -to raise X's priority when it is serving emacs? Similarly for DNS, -and NFS. Does the network know to give emacs' NFS requests priority? - -
- -

In short, scheduling is a system problem. There are many -schedulers; they interact. The CPU scheduler is usually the easy -part. The hardest part is system structure. For example, the -existence of interrupts is bad for scheduling. Conflicting -goals may limit effectiveness. - -

Case study: modern UNIX

- -

Goals: -

    -
  • Simplicity (e.g. avoid complex locking regimes). -
  • Quick response to device interrupts. -
  • Favor interactive response. -
- -


 UNIX has a number of execution environments. We care about
-scheduling transitions among them. Some transitions aren't possible,
-some can't be controlled. The execution environments are: - 

    -
  • Process, user half -
  • Process, kernel half -
  • Soft interrupts: timer, network -
  • Device interrupts -
- -

The rules are: -

    -
  • User is pre-emptible. -
  • Kernel half and software interrupts are not pre-emptible. -
  • Device handlers may not make blocking calls (e.g., sleep) -
  • Effective priorities: intr > soft intr > kernel half > user -
- - - -

Rules are implemented as follows: - -

    - -
  • UNIX: Process User Half. Runs in process address space, on -per-process stack. Interruptible. Pre-emptible: interrupt may cause -context switch. We don't trust user processes to yield CPU. -Voluntarily enters kernel half via system calls and faults. - -
  • UNIX: Process Kernel Half. Runs in kernel address space, on
-per-process kernel stack. Executes system calls and faults for its
-process. Interruptible (but can defer interrupts in critical
-sections). Not pre-emptible. Only yields voluntarily, when waiting
-for an event. E.g. disk I/O done. This simplifies concurrency
-control; locks are often not required. No user process runs if any kernel
-half wants to run. Many processes' kernel halves may be sleeping in the
-kernel. - 
  • UNIX: Device Interrupts. Hardware interrupts the CPU to ask
-for attention: a disk read/write completed, or a network packet arrived.
-Runs in kernel space, on a special interrupt stack. An interrupt routine
-cannot block; it must return. Interrupts are interruptible. They nest
-on the one interrupt stack. Interrupts are not pre-emptible, and
-cannot really yield. The real-time clock is a device and interrupts
-every 10ms (or whatever). Process scheduling decisions can be made
-when an interrupt returns (e.g. wake up the process waiting for this
-event). You want interrupt processing to be fast, since it has
-priority. Don't do any more work than you have to. You're blocking
-processes and other interrupts. Typically, an interrupt does the
-minimal work necessary to keep the device happy, and then calls wakeup
-on a thread. - 
  • UNIX: Soft Interrupts. (Didn't exist in xv6.) Used when device
-handling is expensive but there is no obvious process context in which to
-run. Examples include IP forwarding and TCP input processing. Runs in
-kernel space, on the interrupt stack. Interruptible. Not pre-emptible,
-can't really yield. Triggered by a hardware interrupt. Called when the
-outermost hardware interrupt returns. Periodic scheduling decisions
-are made in the timer s/w interrupt, scheduled by the hardware timer
-interrupt (i.e., if the current process has run long enough, switch). -
- -

Is this good software structure? Let's talk about receive -livelock. - -

Paper discussion

- -
    - -
  • What is the application that the paper is addressing? IP forwarding.
-What functionality does a network interface offer to the driver? -
      -
    • Read packets -
    • Poke hardware to send packets -
    • Interrupts when packet received/transmit complete -
    • Buffer many input packets -
    - -
  • What devices in the 6.828 kernel are interrupt driven? Which ones
-are polled? Is this ideal? - 
  • Explain Figure 6-1. Why does it go up? What determines how high
-the peak is? Why does it go down? What determines how fast it goes
-down? Answer: -
    -  (fraction of packets discarded) × (work invested per discarded packet)
    -  ----------------------------------------------------------------------
    -                   (total work the CPU is capable of)
    -
    - -
  • Suppose I wanted to test an NFS server for livelock. -
    -  Run client with this loop:
    -    while(1){
    -      send NFS READ RPC;
    -      wait for response;
    -    }
    -
    -What would I see? Is the NFS server probably subject to livelock? -(No--offered load subject to feedback). - -
  • What other problems are we trying to address? -
      -
    • Increased latency for packet delivery and forwarding (e.g., start -disk head moving when first NFS read request comes) -
    • Transmit starvation -
    • User-level CPU starvation -
    - -
  • Why not tell the O/S scheduler to give interrupts lower priority? -Non-preemptible. -Could you fix this by making interrupts faster? (Maybe, if coupled -with some limit on input rate.) - -
  • Why not completely process each packet in the interrupt handler? -(I.e. forward it?) Other parts of kernel don't expect to run at high -interrupt-level (e.g., some packet processing code might invoke a function -that sleeps). Still might want an output queue - -
  • What about using polling instead of interrupts? Solves overload -problem, but killer for latency. - -
  • What's the paper's solution? -
      -
    • No IP input queue. -
    • Input processing and device input polling in kernel thread. -
    • Device receive interrupt just wakes up thread. And leaves -interrupts *disabled* for that device. -
    • Thread does all input processing, then re-enables interrupts. -
    -

    Why does this work? What happens when packets arrive too fast? -What happens when packets arrive slowly? - -

  • Explain Figure 6-3. -
      -
    • Why does "Polling (no quota)" work badly? (Input still starves -xmit complete processing.) -
    • Why does it immediately fall to zero, rather than gradually decreasing? -(xmit complete processing must be very cheap compared to input.) -
    - -
  • Explain Figure 6-4. -
      - -
    • Why does "Polling, no feedback" behave badly? There's a queue in -front of screend. We can still give 100% to input thread, 0% to -screend. - -
    • Why does "Polling w/ feedback" behave well? Input thread yields -when queue to screend fills. - -
    • What if screend hangs, what about other consumers of packets? -(e.g., can you ssh to machine to fix screend?) Fortunately screend -typically is only application. Also, re-enable input after timeout. - -
    - -
  • Why are the two solutions different? -
      -
    1. Polling thread with quotas. -
    2. Feedback from full queue. -
    -(I believe they should have used #2 for both.) - -
  • If we apply the proposed fixes, does the phenomenon totally go
- away? (e.g. for web server, waits for disk, &c.) -
      -
    • Can the net device throw away packets without slowing down host? -
    • Problem: We want to drop packets for applications with big queues.
-But that requires work to determine which application a packet belongs to.
-Solution: NI-LRP (have the network interface sort packets) -
    - -
  • What about latency question? (Look at figure 14 p. 243.) -
      -
    • 1st packet looks like an improvement over non-polling. But 2nd
-packet transmitted later with polling. Why? (No new packets added to
-xmit buffer until xmit interrupt) -
    • Why? In traditional BSD, to -amortize cost of poking device. Maybe better to poke a second time -anyway. -
    - -
  • What if processing has more complex structure? -
      -
    • Chain of processing stages with queues? Does feedback work? - What happens when a late stage is slow? -
    • Split at some point, multiple parallel paths? Not so great; one
- slow path blocks all paths. -
    - -
  • Can we formulate any general principles from paper? -
      -
    • Don't spend time on new work before completing existing work. -
    • Or give new work lower priority than partially-completed work. -
    - -
diff --git a/web/l-threads.html b/web/l-threads.html deleted file mode 100644 index 8587abb..0000000 --- a/web/l-threads.html +++ /dev/null @@ -1,316 +0,0 @@ -L8 - - - - - -

Threads, processes, and context switching

- -

Required reading: proc.c (focus on scheduler() and sched()), -setjmp.S, and sys_fork (in sysproc.c) - -

Overview

- - -

Big picture: more programs than processors. How to share the -limited number of processors among the programs? - -

Observation: most programs don't need the processor continuously, -because they frequently have to wait for input (from user, disk, -network, etc.) - -

Idea: when one program must wait, it releases the processor, and -gives it to another program. - -


 Mechanism: a thread of computation, an active computation. A
-thread is an abstraction that contains the minimal state that is
-necessary to stop an active computation and resume it at some point
-later. What that state is depends on the processor. On x86, it is the
-processor registers (see setjmp.S). - 


 Address spaces and threads: address spaces and threads are in
-principle independent concepts. One can switch from one thread to
-another thread in the same address space, or one can switch from one
-thread to another thread in another address space. Example: in xv6,
-one switches address spaces by switching segmentation registers (see
-setupsegs). Does xv6 ever switch from one thread to another in the
-same address space? (Answer: yes, xv6 switches, for example, from the
-scheduler, proc[0], to the kernel part of init, proc[1].) In the JOS
-kernel we switch from the kernel thread to a user thread, but we don't
-necessarily switch address spaces. - 

Process: one address space plus one or more threads of computation. -In xv6 all user programs contain one thread of computation and -one address space, and the concepts of address space and threads of -computation are not separated but bundled together in the concept of a -process. When switching from the kernel program (which has multiple -threads) to a user program, xv6 switches threads (switching from a -kernel stack to a user stack) and address spaces (the hardware uses -the kernel segment registers and the user segment registers). - -

xv6 supports the following operations on processes: -

    -
  • fork; create a new process, which is a copy of the parent. -
  • exec; execute a program -
  • exit: terminate process -
  • wait: wait for a process to terminate -
  • kill: kill process -
  • sbrk: grow the address space of a process. -
-This interface doesn't separate threads and address spaces. For
-example, with this interface one cannot create additional threads in
-the same address space. Modern Unixes provide additional primitives
-(called pthreads, POSIX threads) to create additional threads in a
-process and coordinate their activities. - 
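
A minimal POSIX-threads illustration of the point (the function names `worker` and `run_two_threads` are ours): two threads of computation share one address space, so they see the same global `counter`, and a mutex coordinates their updates.

```c
#include <pthread.h>

static int counter;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < 100000; i++) {
    pthread_mutex_lock(&m);    // coordinate access to shared memory
    counter++;
    pthread_mutex_unlock(&m);
  }
  return NULL;
}

int run_two_threads(void) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, worker, NULL);  // two threads, one address space
  pthread_create(&t2, NULL, worker, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  return counter;  // 200000: both threads updated the same memory
}
```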

Scheduling. The thread manager needs a method for deciding which -thread to run if multiple threads are runnable. The xv6 policy is to -run the processes round robin. Why round robin? What other methods -can you imagine? - -

Preemptive scheduling. To force a thread to release the processor -periodically (in case the thread never calls sleep), a thread manager -can use preemptive scheduling. The thread manager uses the clock chip -to generate periodically a hardware interrupt, which will cause -control to transfer to the thread manager, which then can decide to -run another thread (e.g., see trap.c). - -

xv6 code examples

- -


 Thread switching is implemented in xv6 using setjmp and longjmp,
-which take a jumpbuf as an argument. setjmp saves its context in a
-jumpbuf for later use by longjmp. longjmp restores the context saved
-by the last setjmp. It then causes execution to continue as if the
-call to setjmp had just returned 1. -

    -
  • setjmp saves: ebx, ecx, edx, esi, edi, esp, ebp, and eip. -
  • longjmp restores them, and puts 1 in eax! -
- -
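
The C library's setjmp/longjmp behave analogously to xv6's (this is a sketch with the standard library primitives, not xv6's setjmp.S): setjmp saves the register context and returns 0; longjmp restores that context, so setjmp appears to return a second time, now with the value passed to longjmp.

```c
#include <setjmp.h>

static jmp_buf ctx;

int demo(void) {
  int r = setjmp(ctx);  // returns 0 on the direct call
  if (r == 0) {
    longjmp(ctx, 1);    // "switch" back: setjmp now returns 1
    return -1;          // never reached
  }
  return r;             // reached after the longjmp, r == 1
}
```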

Example of thread switching: proc[0] switches to scheduler: -

    -
  • 1359: proc[0] calls iget, which calls sleep, which calls sched. -
  • 2261: The stack before the call to setjmp in sched is: -
    -CPU 0:
    -eax: 0x10a144   1089860
    -ecx: 0x6c65746e 1818588270
    -edx: 0x0        0
    -ebx: 0x10a0e0   1089760
    -esp: 0x210ea8   2166440
    -ebp: 0x210ebc   2166460
    -esi: 0x107f20   1081120
    -edi: 0x107740   1079104
    -eip: 0x1023c9  
    -eflags 0x12      
    -cs:  0x8       
    -ss:  0x10      
    -ds:  0x10      
    -es:  0x10      
    -fs:  0x10      
    -gs:  0x10      
    -   00210ea8 [00210ea8]  10111e
    -   00210eac [00210eac]  210ebc
    -   00210eb0 [00210eb0]  10239e
    -   00210eb4 [00210eb4]  0001
    -   00210eb8 [00210eb8]  10a0e0
    -   00210ebc [00210ebc]  210edc
    -   00210ec0 [00210ec0]  1024ce
    -   00210ec4 [00210ec4]  1010101
    -   00210ec8 [00210ec8]  1010101
    -   00210ecc [00210ecc]  1010101
    -   00210ed0 [00210ed0]  107740
    -   00210ed4 [00210ed4]  0001
    -   00210ed8 [00210ed8]  10cd74
    -   00210edc [00210edc]  210f1c
    -   00210ee0 [00210ee0]  100bbc
    -   00210ee4 [00210ee4]  107740
    -
    -
  • 2517: stack at beginning of setjmp: -
    -CPU 0:
    -eax: 0x10a144   1089860
    -ecx: 0x6c65746e 1818588270
    -edx: 0x0        0
    -ebx: 0x10a0e0   1089760
    -esp: 0x210ea0   2166432
    -ebp: 0x210ebc   2166460
    -esi: 0x107f20   1081120
    -edi: 0x107740   1079104
    -eip: 0x102848  
    -eflags 0x12      
    -cs:  0x8       
    -ss:  0x10      
    -ds:  0x10      
    -es:  0x10      
    -fs:  0x10      
    -gs:  0x10      
    -   00210ea0 [00210ea0]  1023cf   <--- return address (sched)
    -   00210ea4 [00210ea4]  10a144
    -   00210ea8 [00210ea8]  10111e
    -   00210eac [00210eac]  210ebc
    -   00210eb0 [00210eb0]  10239e
    -   00210eb4 [00210eb4]  0001
    -   00210eb8 [00210eb8]  10a0e0
    -   00210ebc [00210ebc]  210edc
    -   00210ec0 [00210ec0]  1024ce
    -   00210ec4 [00210ec4]  1010101
    -   00210ec8 [00210ec8]  1010101
    -   00210ecc [00210ecc]  1010101
    -   00210ed0 [00210ed0]  107740
    -   00210ed4 [00210ed4]  0001
    -   00210ed8 [00210ed8]  10cd74
    -   00210edc [00210edc]  210f1c
    -
    -
  • 2519: What is saved in jmpbuf of proc[0]? -
  • 2529: return 0! -
  • 2534: What is in jmpbuf of cpu 0? The stack is as follows: -
    -CPU 0:
    -eax: 0x0        0
    -ecx: 0x6c65746e 1818588270
    -edx: 0x108aa4   1084068
    -ebx: 0x10a0e0   1089760
    -esp: 0x210ea0   2166432
    -ebp: 0x210ebc   2166460
    -esi: 0x107f20   1081120
    -edi: 0x107740   1079104
    -eip: 0x10286e  
    -eflags 0x46      
    -cs:  0x8       
    -ss:  0x10      
    -ds:  0x10      
    -es:  0x10      
    -fs:  0x10      
    -gs:  0x10      
    -   00210ea0 [00210ea0]  1023fe
    -   00210ea4 [00210ea4]  108aa4
    -   00210ea8 [00210ea8]  10111e
    -   00210eac [00210eac]  210ebc
    -   00210eb0 [00210eb0]  10239e
    -   00210eb4 [00210eb4]  0001
    -   00210eb8 [00210eb8]  10a0e0
    -   00210ebc [00210ebc]  210edc
    -   00210ec0 [00210ec0]  1024ce
    -   00210ec4 [00210ec4]  1010101
    -   00210ec8 [00210ec8]  1010101
    -   00210ecc [00210ecc]  1010101
    -   00210ed0 [00210ed0]  107740
    -   00210ed4 [00210ed4]  0001
    -   00210ed8 [00210ed8]  10cd74
    -   00210edc [00210edc]  210f1c
    -
    -
  • 2547: return 1! stack looks as follows: -
    -CPU 0:
    -eax: 0x1        1
    -ecx: 0x108aa0   1084064
    -edx: 0x108aa4   1084068
    -ebx: 0x10074    65652
    -esp: 0x108d40   1084736
    -ebp: 0x108d5c   1084764
    -esi: 0x10074    65652
    -edi: 0xffde     65502
    -eip: 0x102892  
    -eflags 0x6       
    -cs:  0x8       
    -ss:  0x10      
    -ds:  0x10      
    -es:  0x10      
    -fs:  0x10      
    -gs:  0x10      
    -   00108d40 [00108d40]  10231c
    -   00108d44 [00108d44]  10a144
    -   00108d48 [00108d48]  0010
    -   00108d4c [00108d4c]  0021
    -   00108d50 [00108d50]  0000
    -   00108d54 [00108d54]  0000
    -   00108d58 [00108d58]  10a0e0
    -   00108d5c [00108d5c]  0000
    -   00108d60 [00108d60]  0001
    -   00108d64 [00108d64]  0000
    -   00108d68 [00108d68]  0000
    -   00108d6c [00108d6c]  0000
    -   00108d70 [00108d70]  0000
    -   00108d74 [00108d74]  0000
    -   00108d78 [00108d78]  0000
    -   00108d7c [00108d7c]  0000
    -
    -
  • 2548: where will longjmp return? (answer: 10231c, in scheduler) -
  • 2233:Scheduler on each processor selects in a round-robin fashion the - first runnable process. Which process will that be? (If we are - running with one processor.) (Ans: proc[0].) -
  • 2229: what will be saved in cpu's jmpbuf? -
  • What is in proc[0]'s jmpbuf? -
  • 2548: return 1. Stack looks as follows: -
    -CPU 0:
    -eax: 0x1        1
    -ecx: 0x6c65746e 1818588270
    -edx: 0x0        0
    -ebx: 0x10a0e0   1089760
    -esp: 0x210ea0   2166432
    -ebp: 0x210ebc   2166460
    -esi: 0x107f20   1081120
    -edi: 0x107740   1079104
    -eip: 0x102892  
    -eflags 0x2       
    -cs:  0x8       
    -ss:  0x10      
    -ds:  0x10      
    -es:  0x10      
    -fs:  0x10      
    -gs:  0x10      
    -   00210ea0 [00210ea0]  1023cf   <--- return to sleep
    -   00210ea4 [00210ea4]  108aa4
    -   00210ea8 [00210ea8]  10111e
    -   00210eac [00210eac]  210ebc
    -   00210eb0 [00210eb0]  10239e
    -   00210eb4 [00210eb4]  0001
    -   00210eb8 [00210eb8]  10a0e0
    -   00210ebc [00210ebc]  210edc
    -   00210ec0 [00210ec0]  1024ce
    -   00210ec4 [00210ec4]  1010101
    -   00210ec8 [00210ec8]  1010101
    -   00210ecc [00210ecc]  1010101
    -   00210ed0 [00210ed0]  107740
    -   00210ed4 [00210ed4]  0001
    -   00210ed8 [00210ed8]  10cd74
    -   00210edc [00210edc]  210f1c
    -
    -
- -


 Why switch from proc[0] to the processor stack, and then to
- proc[0]'s stack? Why not instead run the scheduler on the kernel
- stack of the last process that ran on that cpu? - 

    - -
  • If the scheduler wanted to use the process stack, then it couldn't - have any stack variables live across process scheduling, since - they'd be different depending on which process just stopped running. - -
  • Suppose process p goes to sleep on CPU1, so CPU1 is idling in - scheduler() on p's stack. Someone wakes up p. CPU2 decides to run - p. Now p is running on its stack, and CPU1 is also running on the - same stack. They will likely scribble on each others' local - variables, return pointers, etc. - -
  • The same thing happens if CPU1 tries to reuse the process's page -tables to avoid a TLB flush. If the process gets killed and cleaned -up by the other CPU, now the page tables are wrong. I think some OSes -actually do this (with appropriate ref counting). - -
- -


 How is preemptive scheduling implemented in xv6? Answer: see trap.c
- lines 2905 through 2917, and the implementation of yield() on sheet
- 22. - 

How long is a timeslice for a user process? (possibly very short; - very important lock is held across context switch!) - - - - - diff --git a/web/l-vm.html b/web/l-vm.html deleted file mode 100644 index ffce13e..0000000 --- a/web/l-vm.html +++ /dev/null @@ -1,462 +0,0 @@ - - -Virtual Machines - - - - -

Virtual Machines

- -

Required reading: Disco

- -

Overview

- -

What is a virtual machine? IBM definition: a fully protected and -isolated copy of the underlying machine's hardware.

- -


 Another view is that it provides another example of a kernel API.
-In contrast to other kernel APIs (unix, microkernel, and exokernel),
-the virtual machine operating system exports as the kernel API the
-processor API (e.g., the x86 interface). Thus, each program running
-in user space sees the services offered by a processor, and each
-program sees its own processor. Of course, we don't want to make a
-system call for each instruction, and in fact one of the main
-challenges in virtual machine operating systems is to design the
-system in such a way that the physical processor executes the virtual
-processor API directly, at processor speed. - 

-Virtual machines can be useful for a number of reasons: -

    - -
  1. Run multiple operating systems on a single piece of hardware. For
-example, in one process you run Linux, and in another you run
-Windows/XP. If the kernel API is identical to the x86 (and faithfully
-emulates x86 instructions, state, protection levels, and page tables),
-then the virtual machine operating system can run Linux and Windows/XP
-as guest operating systems without modification. - 
      -
    • Run "older" programs on the same hardware (e.g., run one x86 -virtual machine in real mode to execute old DOS apps). - -
    • Or run applications that require different operating system. -
    - -
  2. Fault isolation: like processes on UNIX but more complete, because
-the guest operating system runs on the virtual machine in user space.
-Thus, faults in the guest OS cannot affect any other software. - 
  3. Customizing the apparent hardware: virtual machine may have -different view of hardware than is physically present. - -
  4. Simplify deployment/development of software for scalable -processors (e.g., Disco). - -
-

- -

If your operating system isn't a virtual machine operating system, -what are the alternatives? Processor simulation (e.g., bochs) or -binary emulation (WINE). Simulation runs instructions purely in -software and is slow (e.g., 100x slow down for bochs); virtualization -gets out of the way whenever possible and can be efficient. - -


 Simulation gives portability whereas virtualization focuses on
-performance. However, this means that you need to model your hardware
-very carefully in software. Binary emulation focuses on just getting
-the system-call interface of a particular operating system right. Binary
-emulation can be hard because it is targeted at a particular
-operating system (and even that can change between revisions). -

- -

To provide each process with its own virtual processor that exports -the same API as the physical processor, what features must -the virtual machine operating system virtualize? -

    -
  1. CPU: instructions -- trap all privileged instructions
  2. -
  3. Memory: address spaces -- map "physical" pages managed
-by the guest OS to machine pages, handle translation, etc.
  4. -
  5. Devices: any I/O communication needs to be trapped and passed - through/handled appropriately.
  6. -
-

-The software that implements the virtualization is typically called -the monitor, instead of the virtual machine operating system. - -

Virtual machine monitors (VMM) can be implemented in two ways: -

    -
  1. Run VMM directly on hardware: like Disco.
  2. -
  3. Run VMM as an application (though still running as root, with - integration into OS) on top of a host OS: like VMware. Provides - additional hardware support at low development cost in - VMM. Intercept CPU-level I/O requests and translate them into - system calls (e.g. read()).
  4. -
-

- -

The three primary functions of a virtual machine monitor are: -

    -
  • virtualize processor (CPU, memory, and devices) -
  • dispatch events (e.g., forward page fault trap to guest OS). -
  • allocate resources (e.g., divide real memory in some way between -the physical memory of each guest OS). -
- -

Virtualization in detail

- -

Memory virtualization

- -

-Understanding memory virtualization. Let's consider the MIPS example -from the paper. Ideally, we'd be able to intercept and rewrite all -memory address references. (e.g., by intercepting virtual memory -calls). Why can't we do this on the MIPS? (There are addresses that -don't go through address translation --- but we don't want the virtual -machine to directly access memory!) What does Disco do to get around -this problem? (Relink the kernel outside this address space.) -

- -

-Having gotten around that problem, how do we handle things in general? -

-
// Disco's TLB miss handler.
// Called when a memory reference for virtual address
// 'va' is made, but there is no VA->MA (virtual -> machine)
// mapping in the CPU's TLB.
void tlb_miss_handler (VA va)
{
  // See if we have a mapping in our "shadow" TLB (which includes
  // the "main" TLB).
  tlb_entry *t = tlb_lookup (thiscpu->l2tlb, va);
  if (t && defined (thiscpu->pmap[t->pa]))   // is there an MA for this PA?
    tlbwrite (va, thiscpu->pmap[t->pa], t->otherdata);
  else if (t)
    ; // get a machine page, copy the physical page into it, and tlbwrite
  else
    ; // trap to the virtual CPU/OS's handler
}
-
// Disco's procedure which emulates the MIPS
// instruction that writes to the TLB.
//
// va -- virtual address
// pa -- physical address (NOT an MA, machine address!)
// otherdata -- permissions and such
void emulate_tlbwrite_instruction (VA va, PA pa, otherdata)
{
  tlb_insert (thiscpu->l2tlb, va, pa, otherdata); // cache it
  if (!defined (thiscpu->pmap[pa])) { // fill in pmap dynamically
    MA ma = allocate_machine_page ();
    thiscpu->pmap[pa] = ma; // See 4.2.2
    thiscpu->pmapbackmap[ma] = pa;
    thiscpu->memmap[ma] = va; // See 4.2.3 (for TLB shootdowns)
  }
  tlbwrite (va, thiscpu->pmap[pa], otherdata);
}
-
// Disco's procedure which emulates the MIPS
// instruction that reads the TLB.
tlb_entry *emulate_tlbread_instruction (VA va)
{
  // Must return a TLB entry that has a "physical" address;
  // this is recorded in our secondary TLB cache.
  // (We don't have to read from the hardware TLB, since
  // all writes to the hardware TLB are mediated by Disco.
  // Thus we can always keep the l2tlb up to date.)
  return tlb_lookup (thiscpu->l2tlb, va);
}
-
- -

CPU virtualization

- -

Requirements: -

    -
  1. Results of executing non-privileged instructions in privileged
     and user mode must be equivalent. (Why? Because the virtual
     "privileged" system will not be running in true "privileged"
     mode.)
  2. There must be a way to protect the VM from the real machine.
     (Some sort of memory protection/address translation, for fault
     isolation.)
  3. There must be a way to detect and transfer control to the VMM
     when the VM tries to execute a sensitive instruction (e.g., a
     privileged instruction, or one that could expose the
     "virtualness" of the VM). It must be possible to emulate these
     instructions in software. Architectures can be classified as
     completely virtualizable (i.e., there are protection mechanisms
     that cause traps for all such instructions), partly
     virtualizable (insufficient or incomplete trap mechanisms), or
     not virtualizable at all (e.g., no MMU).
-

- -

The MIPS didn't quite meet the second criterion, as discussed
above. But it does have a supervisor mode, between user mode and
kernel mode, in which any privileged instruction will trap.

- -

What might the VMM trap handler look like?

-
void privilege_trap_handler (addr) {
  instruction, args = decode_instruction (addr)
  switch (instruction) {
  case foo:
    emulate_foo (thiscpu, args, ...);
    break;
  case bar:
    emulate_bar (thiscpu, args, ...);
    break;
  case ...:
    ...
  }
}
-
-

The emulate_foo bits will have to evaluate the
state of the virtual CPU and compute the appropriate "fake" answer.

- -

What sort of state is needed in order to appropriately emulate all
of these things?

- all user registers
- CPU-specific registers (e.g., on x86: %crN, debugging, FP...)
- page tables (or TLB)
- interrupt tables

This is needed for each virtual processor.

- -

Device I/O virtualization

- -

We intercept all communication to the I/O devices: reads/writes to
reserved memory addresses cause page faults into special handlers,
which emulate or pass through the I/O as appropriate.

- -

In a system like Disco, the sequence would look something like:

    -
  1. VM executes an instruction to access I/O.
  2. Trap generated by the CPU (based on memory or privilege
     protection) transfers control to the VMM.
  3. VMM emulates the I/O instruction, saving information about where
     it came from (for demultiplexing the asynchronous reply from the
     hardware later).
  4. VMM reschedules a VM.
-

- -

Interrupts will require some additional work:

    -
  1. Interrupt occurs on the real machine, transferring control to
     the VMM handler.
  2. VMM determines the VM that ought to receive this interrupt.
  3. VMM causes a simulated interrupt to occur in the VM, and
     reschedules a VM.
  4. VM runs its interrupt handler, which may involve other I/O
     instructions that need to be trapped.
-

- -

The above can be slow! So sometimes you want the guest operating
system to be aware that it is a guest, and allow it to avoid the slow
path: for example, special device drivers, or replacing instructions
that would cause traps with ordinary memory read/write instructions.

- -

Intel x86/vmware

- -

VMware, unlike Disco, runs as an application on a host OS and
cannot modify the guest OS. Furthermore, it must virtualize the x86
instead of the MIPS processor. Both of these differences make for good
design challenges.

The first challenge is that the monitor runs in user space, yet it
must dispatch traps and execute privileged instructions, which
both require kernel privileges. To address this challenge, the
monitor downloads a piece of code, a kernel module, into the host
OS. Most modern operating systems are constructed as a core kernel,
extended with downloadable kernel modules.
Privileged users can insert kernel modules at run-time.

The monitor downloads a kernel module that reads the IDT, copies
it, and overwrites the hard-wired entries with addresses of stubs in
the just-downloaded kernel module. When a trap happens, the kernel
module inspects the PC, and forwards the trap either to the monitor
running in user space or to the guest OS. If the trap is caused
because the guest OS executed a privileged instruction, the monitor
can emulate that privileged instruction by asking the kernel module to
perform it (perhaps after modifying the arguments to the
instruction).

The second challenge is virtualizing the x86
 instructions. Unfortunately, the x86 doesn't meet the requirements
 for CPU virtualization: some sensitive instructions don't trap. If
 you run the CPU in ring 3, most x86 instructions will be fine,
 because most privileged instructions will result in a trap, which
 can then be forwarded to VMware for emulation. For example,
 consider a guest OS loading the root of a page table into CR3. This
 results in a trap (the guest OS runs in user space), which is
 forwarded to the monitor, which can emulate the load to CR3 as
 follows:

// addr is a physical address
void emulate_lcr3 (thiscpu, addr)
{
  thiscpu->cr3 = addr;
  Pte *fakepdir = lookup (addr, oldcr3cache);
  if (!fakepdir) {
    fakepdir = ppage_alloc ();
    store (oldcr3cache, addr, fakepdir);
    // May wish to scan through the supplied page directory to see if
    // we have to fix up anything in particular.
    // Exact settings will depend on how we want to handle
    // problem cases below and on our own MM.
  }
  asm ("movl fakepdir,%cr3");
  // Must make sure our page-fault handler is in sync with what we do here.
}
-
- -

To virtualize the x86, the monitor must intercept any modifications
to the page table, substitute appropriate responses, and update
things like the accessed/dirty bits. The monitor can arrange for this
to happen by making all page-table pages inaccessible, so that it can
emulate loads and stores to page-table pages. This setup allows the
monitor to virtualize the memory interface of the x86.

- -

Unfortunately, not all instructions that must be virtualized result -in traps: -

    -
  • pushf/popf: FL_IF is handled differently, for example. In user
    mode, setting FL_IF is just ignored.
  • Anything (push, pop, mov) that reads or writes %cs, which
    contains the privilege level.
  • Setting the interrupt-enable bit in EFLAGS has different
    semantics in user space and kernel space. In user space, it is
    ignored; in kernel space, the bit is set.
  • And some others... (17 instructions in total).
These instructions are unprivileged (i.e., they don't cause a trap
when executed by a guest OS) but expose physical processor state.
They could reveal details of the virtualization that should not be
revealed. For example, if the guest OS sets the interrupt-enable bit
for its virtual x86, the virtualized EFLAGS should reflect that the
bit is set, even though the guest OS is running in user space.

How can we virtualize these instructions? One approach is to decode
the instruction stream provided by the user and look for bad
instructions. When we find them, replace them with an interrupt
(INT 3) that will allow the VMM to handle them
correctly. This might look something like:

- -
void initcode () {
  scan_for_nonvirtualizable (thiscpu, 0x7c00);
}

void scan_for_nonvirtualizable (thiscpu, startaddr) {
  addr  = startaddr;
  instr = disassemble (addr);
  while (instr is not branch or bad) {
    addr += len (instr);
    instr = disassemble (addr);
  }
  // remember that we wanted to execute this instruction.
  replace (addr, "int 3");
  record (thiscpu->rewrites, addr, instr);
}

void breakpoint_handler (tf) {
  oldinstr = lookup (thiscpu->rewrites, tf->eip);
  if (oldinstr is branch) {
    newcs:neweip = evaluate branch
    scan_for_nonvirtualizable (thiscpu, newcs:neweip)
    return;
  } else { // something non-virtualizable
    // dispatch to appropriate emulation
  }
}
-
-

All pages must be scanned in this way. Fortunately, most pages are
probably okay and don't really need any special handling, so after
scanning them once, we can just remember that the page is okay and let
it run natively.

- -

What if a guest OS generates instructions, writes them to memory,
and then wants to execute them? We must detect self-modifying code
(e.g., we must simulate buffer-overflow attacks correctly). When a
write happens to a physical page that is in a code segment, we must
trap the write and then rescan the affected portions of the page.

- -

What about self-examining code? We need to protect it
somehow --- possibly by playing tricks with instruction/data TLB
caches, or by introducing a private segment for code (%cs) that is
different from the segment used for reads/writes (%ds).

- -

Some Disco paper notes

- -

Disco has some I/O-specific optimizations.

-
    -
  • Disk reads only need to happen once and can be shared between
    virtual machines via copy-on-write virtual memory tricks.
  • Network cards do not need to be fully virtualized --- intra-VM
    communication doesn't need a real network card backing it.
  • Special handling for NFS so that all VMs "share" a buffer cache.
- -

Disco's developers clearly had access to IRIX source code.

-
    -
  • Need to deal with the KSEG0 segment of MIPS memory by relinking
    the kernel at a different address.
  • Ensuring page-alignment of network writes (for the purposes of
    doing memory-map tricks).
- -

Performance?

-
    -
  • Evaluated in simulation.
  • Where are the overheads? Where do they come from?
  • Does it run better than NUMA IRIX?
- -

Premise: are virtual machines the preferred approach to extending
operating systems? Have scalable multiprocessors materialized?

- -

Related papers

- -

John Scott Robin, Cynthia E. Irvine. Analysis of the -Intel Pentium's Ability to Support a Secure Virtual Machine -Monitor.

- -

Jeremy Sugerman, Ganesh Venkitachalam, Beng-Hong Lim. Virtualizing -I/O Devices on VMware Workstation's Hosted Virtual Machine -Monitor. In Proceedings of the 2001 Usenix Technical Conference.

- -

Kevin Lawton, Drew Northup. Plex86 Virtual -Machine.

- -

Xen -and the Art of Virtualization, Paul Barham, Boris -Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf -Neugebauer, Ian Pratt, Andrew Warfield, SOSP 2003

- -

A comparison of
software and hardware techniques for x86 virtualization. Keith Adams
and Ole Agesen, ASPLOS 2006.

- - - - - diff --git a/web/l-xfi.html b/web/l-xfi.html deleted file mode 100644 index 41ba434..0000000 --- a/web/l-xfi.html +++ /dev/null @@ -1,246 +0,0 @@ - - -XFI - - - -

XFI

- -

Required reading: XFI: software guards for system address spaces. - -

Introduction

- -

Problem: how to use untrusted code (an "extension") in a trusted -program? -

    -
  • Use untrusted jpeg codec in Web browser -
  • Use an untrusted driver in the kernel -
- -

What are the dangers? -

    -
  • No fault isolation: the extension modifies trusted code
    unintentionally
  • No protection: extension causes a security hole -
      -
    • Extension has a buffer overrun problem -
    • Extension calls trusted program's functions -
    • Extension calls a trusted program's function that it is allowed
      to call, but supplies "bad" arguments
    • Extension calls privileged hardware instructions (when
      extending the kernel)
    • Extension reads data out of the trusted program that it
      shouldn't.
    -
- -

Possible solution approaches:

    - -
  • Run extension in its own address space with minimal - privileges. Rely on hardware and operating system protection - mechanism. - -
  • Restrict the language in which the extension is written: -
      - -
    • Packet-filter language. The language is limited in its
      capabilities, and it is easy to guarantee "safe" execution.
    • Type-safe language. Language runtime and compiler guarantee "safe" -execution. -
    - -
  • Software-based sandboxing. - -
- -

Software-based sandboxing

- -

Sandboxer. A compiler or binary-rewriter sandboxes all unsafe - instructions in an extension by inserting additional instructions. - For example, every indirect store is preceded by a few instructions - that compute and check the target of the store at runtime. - -

Verifier. When the extension is loaded into the trusted program, the
 verifier checks whether the extension is appropriately sandboxed
 (e.g., are all indirect stores sandboxed? does it call any privileged
 instructions?). If not, the extension is rejected. If yes, the
 extension is loaded and can run. As the extension runs, the
 instructions that sandbox unsafe instructions check whether each
 unsafe instruction is used in a safe way.

The verifier must be trusted, but the sandboxer doesn't have to
 be. We can do without the verifier if the trusted program can
 establish that the extension has been sandboxed by a trusted
 sandboxer.

The paper refers to this setup as an instance of proof-carrying code.

Software fault isolation

- -

SFI
by Wahbe et al. explored how to use sandboxing to fault-isolate
extensions; that is, to use sandboxing to ensure that stores and jumps
stay within a specified memory range (i.e., they don't overwrite or
jump into addresses in the trusted program unchecked). They
implemented SFI for a RISC processor, which simplifies things, since
memory can be written only by store instructions (other instructions
modify registers). In addition, they assumed that there were plenty
of registers, so that a few could be dedicated to sandboxing code.

The extension is loaded into a specific range (called a segment) - within the trusted application's address space. The segment is - identified by the upper bits of the addresses in the - segment. Separate code and data segments are necessary to prevent an - extension overwriting its code. - -

An unsafe instruction on the MIPS is an instruction that jumps or
 stores to an address that cannot be statically verified to be within
 the correct segment. Most control-transfer operations, such as
 program-counter-relative branches, can be statically verified.
 Stores to static variables often use an immediate addressing mode
 and can be statically verified. Indirect jumps and indirect stores
 are unsafe.

To sandbox those instructions the sandboxer could generate the - following code for each unsafe instruction: -

  DR0 <- target address
  R0  <- DR0 >> shift-register;  // load segment ID of target into R0
  CMP R0, segment-register;      // compare it to this segment's ID
  BNE fault-isolation-error      // if not equal, branch to trusted error code
  STORE using DR0
-
In this code, DR0, shift-register, and segment-register
are dedicated: they cannot be used by the extension code. The
verifier must check that the extension doesn't use these registers.
R0 is a scratch register, but doesn't have to be dedicated. The
dedicated registers are necessary, because otherwise the extension
could load DR0 and jump directly to the STORE instruction, skipping
the check.

This implementation costs 4 registers, and 4 additional instructions - for each unsafe instruction. One could do better, however: -

  DR0 <- target address & and-mask-register // mask segment ID from target
  DR0 <- DR0 | segment-register             // insert this segment's ID
  STORE using DR0
-
This code just sets the right segment-ID bits. It doesn't catch
illegal addresses; it just ensures that illegal addresses stay within
the segment, harming the extension but no other code. Even if the
extension jumps to the second instruction of this sandbox sequence,
nothing bad will happen (because DR0 will already contain the correct
segment ID).

Optimizations include: -

    -
  • use guard zones for store value, offset(reg) -
  • treat SP as dedicated register (sandbox code that initializes it) -
  • etc. -
- -

XFI

- -

XFI extends SFI in several ways: -

    -
  • Handles fault isolation and protection -
  • Uses control-flow integrity (CFI) to get good performance
  • Doesn't use dedicated registers -
  • Uses two stacks (a scoped stack and an allocation stack); only
    the allocation stack can be corrupted by buffer-overrun attacks.
    The scoped stack cannot be corrupted via computed memory
    references.
  • Uses a binary rewriter. -
  • Works for the x86 -
- -

The x86 is challenging, because of its limited number of registers
 and its variable-length instructions. The SFI technique won't work
 with the x86 instruction set. For example, if the binary contains:

-  25 CD 80 00 00   # AND eax, 0x80CD
-
and an adversary can arrange to jump to the second byte, then the
adversary can invoke a system call on Linux, because the INT 0x80
system-call instruction has the binary representation CD 80. Thus,
XFI must control the execution flow.

XFI policy goals: -

    -
  • Memory-access constraints (like SFI) -
  • Interface restrictions (extension has fixed entry and exit points) -
  • Scoped-stack integrity (calling stack is well formed) -
  • Simplified instructions semantics (remove dangerous instructions) -
  • System-environment integrity (ensure certain machine model - invariants, such as x86 flags register cannot be modified) -
  • Control-flow integrity: execution must follow a static, expected - control-flow graph. (enter at beginning of basic blocks) -
  • Program-data integrity (certain global variables in extension - cannot be accessed via computed memory addresses) -
- -

The binary rewriter inserts guards to ensure these properties. The
 verifier checks that the appropriate guards are in place. The
 primary mechanisms used are:

    -
  • CFI guards on computed control-flow transfers (see figure 2) -
  • Two stacks -
  • Guards on computed memory accesses (see figure 3)
  • The module header has a section that contains access permissions
    for regions
  • Binary rewriter, which performs intra-procedure analysis, and - generates guards, code for stack use, and verification hints -
  • Verifier checks specific conditions per basic block. hints specify - the verification state for the entry to each basic block, and at - exit of basic block the verifier checks that the final state implies - the verification state at entry to all possible successor basic - blocks. (see figure 4) -
- -

Can XFI protect against the attack discussed in the last lecture?

-  unsigned int j;
-  p=(unsigned char *)s->init_buf->data;
-  j= *(p++);
-  s->session->session_id_length=j;
-  memcpy(s->session->session_id,p,j);
-
-Where will j be located? - -

How about the following one from the paper Beyond stack smashing: - recent advances in exploiting buffer overruns? -

void f2b(void * arg, size_t len) {
  char buf[100];
  long val = ..;
  long *ptr = ..;
  extern void (*f)();

  memcpy(buf, arg, len);
  *ptr = val;
  f();
  ...
  return;
}
-
What code can (*f)() call? Code that the attacker inserted?
Code in libc?

How about an attack that uses ptr in the above code to
 overwrite a method's address in a class's dispatch table with the
 address of a support function?

How about data-only attacks? For example, the attacker
 overwrites pw_uid in the heap with 0 before the following
 code executes (when downloading /etc/passwd and then uploading it
 with a modified entry).

FILE *getdatasock( ... ) {
  seteuid(0);
  setsockopt ( ...);
  ...
  seteuid(pw->pw_uid);
  ...
}
-
- -

How much does XFI slow down applications? How many more
 instructions are executed? (See Tables 1-4.)

OS overview

- -

Overview

- -
    -
  • Goal of course: - -
      -
    • Understand operating systems in detail by designing and
      implementing a minimal OS
    • Hands-on experience with building systems ("Applying 6.033") -
    - -
  • What is an operating system? -
      -
    • a piece of software that turns the hardware into something useful -
    • layered picture: hardware, OS, applications -
    • Three main functions: fault isolate applications, abstract hardware, -manage hardware -
    - -
  • Examples: -
      -
    • OS-X, Windows, Linux, *BSD, ... (desktop, server) -
    • PalmOS Windows/CE (PDA) -
    • Symbian, JavaOS (Cell phones) -
    • VxWorks, pSOS (real-time) -
    • ... -
    - -
  • OS Abstractions -
      -
    • processes: fork, wait, exec, exit, kill, getpid, brk, nice, sleep, -trace -
    • files: open, close, read, write, lseek, stat, sync -
    • directories: mkdir, rmdir, link, unlink, mount, umount -
    • users + security: chown, chmod, getuid, setuid -
    • interprocess communication: signals, pipe -
    • networking: socket, accept, snd, recv, connect -
    • time: gettimeofday -
    • terminal: -
    - -
  • Sample Unix System calls (mostly POSIX) -
      -
    • int read(int fd, void*, int) -
    • int write(int fd, void*, int) -
    • off_t lseek(int fd, off_t, int [012]) -
    • int close(int fd) -
    • int fsync(int fd) -
    • int open(const char*, int flags [, int mode]) -
        -
      • O_RDONLY, O_WRONLY, O_RDWR, O_CREAT -
      -
    • mode_t umask(mode_t cmask) -
    • int mkdir(char *path, mode_t mode); -
    • DIR *opendir(char *dirname) -
    • struct dirent *readdir(DIR *dirp) -
    • int closedir(DIR *dirp) -
    • int chdir(char *path) -
    • int link(char *existing, char *new) -
    • int unlink(char *path) -
    • int rename(const char*, const char*) -
    • int rmdir(char *path) -
    • int stat(char *path, struct stat *buf) -
    • int mknod(char *path, mode_t mode, dev_t dev) -
    • int fork() -
        -
      • returns child PID in parent, 0 in child; this is the only
        difference
      -
    • int getpid() -
    • int waitpid(int pid, int* stat, int opt) -
        -
      • pid==-1: any; opt==0||WNOHANG -
      • returns pid or error -
      -
    • void _exit(int status) -
    • int kill(int pid, int signal) -
    • int sigaction(int sig, struct sigaction *, struct sigaction *) -
    • int sleep (int sec) -
    • int execve(char* prog, char** argv, char** envp) -
    • void *sbrk(int incr) -
    • int dup2(int oldfd, int newfd) -
    • int fcntl(int fd, F_SETFD, int val) -
    • int pipe(int fds[2]) -
        -
      • writes on fds[1] will be read on fds[0]
      • when the last fds[1] is closed, a read on fds[0] returns EOF
      • when the last fds[0] is closed, a write on fds[1] raises
        SIGPIPE/fails with EPIPE
      -
    • int fchown(int fd, uid_t owner, gid_t group)
    • int fchmod(int fd, mode_t mode) -
    • int socket(int domain, int type, int protocol) -
    • int accept(int socket_fd, struct sockaddr*, int* namelen) -
        -
      • returns new fd -
      -
    • int listen(int fd, int backlog) -
    • int connect(int fd, const struct sockaddr*, int namelen) -
    • void* mmap(void* addr, size_t len, int prot, int flags, int fd, - off_t offset) -
    • int munmap(void* addr, size_t len) -
    • int gettimeofday(struct timeval*) -
    -
- -

See the reference page for links to -the early Unix papers. - -

Class structure

- -
    -
  • Lab: minimal OS for x86 in an exokernel style (50%) -
      -
    • kernel interface: hardware + protection -
    • libOS implements fork, exec, pipe, ... -
    • applications: file system, shell, .. -
    • development environment: gcc, bochs -
    • lab 1 is out -
    - -
  • Lecture structure (20%) -
      -
    • homework -
    • 45min lecture -
    • 45min case study -
    - -
  • Two quizzes (30%) -
      -
    • mid-term -
    • final's exam week -
    - -
- -

Case study: the shell (simplified)

- -
    -
  • interactive command execution and a programming language -
  • Nice example that uses various OS abstractions. See Unix -paper if you are unfamiliar with the shell. -
  • Final lab is a simple shell. -
  • Basic structure: -
    -      
    -       while (1) {
    -	    printf ("$");
    -	    readcommand (command, args);   // parse user input
    -	    if ((pid = fork ()) == 0) {  // child?
    -	       exec (command, args, 0);
    -	    } else if (pid > 0) {   // parent?
    -	       wait (0);   // wait for child to terminate
    -	    } else {
    -	       perror ("Failed to fork\n");
    -            }
    -        }
    -
    -

    The split of creating a process with a new program into fork and
    exec is mostly a historical accident. See the assigned paper for
    today.

  • Example: -
    -        $ ls
    -
    -
  • why call "wait"? to wait for the child to terminate and collect -its exit status. (if child finishes, child becomes a zombie until -parent calls wait.) -
  • I/O: file descriptors. Child inherits open file descriptors -from parent. By convention: -
      -
    • file descriptor 0 for input (e.g., keyboard). read_command: -
      -     read (0, buf, bufsize)
      -
      -
    • file descriptor 1 for output (e.g., terminal) -
      -     write (1, "hello\n", strlen("hello\n"))
      -
      -
    • file descriptor 2 for error (e.g., terminal) -
    -
  • How does the shell implement: -
    -     $ls > tmp1
    -
    -just before exec insert: -
    -    	   close (1);
    -	   fd = open ("tmp1", O_CREAT|O_WRONLY);   // fd will be 1!
    -
    -

    The kernel will return the first free file descriptor, 1 in this case. -

  • How does the shell implement sharing an output file: -
    -     $ls 2> tmp1 > tmp1
    -
    -replace last code with: -
    -
    -	   close (1);
    -	   close (2);
    -	   fd1 = open ("tmp1", O_CREAT|O_WRONLY);   // fd will be 1!
    -	   fd2 = dup (fd1);
    -
Both file descriptors share the file offset.
  • how do programs communicate? -
    -        $ sort file.txt | uniq | wc
    -
    -or -
    -	$ sort file.txt > tmp1
    -	$ uniq tmp1 > tmp2
    -	$ wc tmp2
    -	$ rm tmp1 tmp2
    -
    -or -
    -        $ kill -9
    -
    -
  • A pipe is an one-way communication channel. Here is an example -where the parent is the writer and the child is the reader: -
    -
    -	int fdarray[2];
    -	
    -	if (pipe(fdarray) < 0) panic ("error");
    -	if ((pid = fork()) < 0) panic ("error");
    -	else if (pid > 0) {
    -	  close(fdarray[0]);
    -	  write(fdarray[1], "hello world\n", 12);
    -        } else {
    -	  close(fdarray[1]);
    -	  n = read (fdarray[0], buf, MAXBUF);
    -	  write (1, buf, n);
    -        }
    -
    -
  • How does the shell implement pipelines (i.e., cmd 1 | cmd 2 |..)? -We want to arrange that the output of cmd 1 is the input of cmd 2. -The way to achieve this goal is to manipulate stdout and stdin. -
  • The shell creates a process for each command in the pipeline,
    hooks up their stdin and stdout correctly, and waits for the last
    process of the pipeline to exit. A sketch of the core
    modifications to our shell for setting up a pipe is:
    	    
	    int fdarray[2];

	    if (pipe(fdarray) < 0) panic ("error");
	    if ((pid = fork ()) == 0) {  // child (left end of pipe)
	       close (1);
	       tmp = dup (fdarray[1]);   // fdarray[1] is the write end, tmp will be 1
	       close (fdarray[0]);       // close read end
	       close (fdarray[1]);       // close original write end
	       exec (command1, args1, 0);
	    } else if (pid > 0) {        // parent (right end of pipe)
	       close (0);
	       tmp = dup (fdarray[0]);   // fdarray[0] is the read end, tmp will be 0
	       close (fdarray[0]);       // close original read end
	       close (fdarray[1]);       // close write end
	       exec (command2, args2, 0);
	    } else {
	       printf ("Unable to fork\n");
            }
    -
    -
  • Why close the read end and the write end? Multiple reasons: it
    maintains the invariant that every process starts with 3 file
    descriptors, and it ensures that reading from an empty pipe
    blocks the reader, while reading from a pipe whose write end is
    closed returns end of file.
  • How do you background jobs? -
    -        $ compute &
    -
    -
  • How does the shell implement "&", backgrounding? (Don't call wait -immediately). -
  • More details in the shell lecture later in the term. - - - - diff --git a/web/l13.html b/web/l13.html deleted file mode 100644 index af0f405..0000000 --- a/web/l13.html +++ /dev/null @@ -1,245 +0,0 @@ -High-performance File Systems - - - - - -

    High-performance File Systems

    - -

    Required reading: soft updates. - -

    Overview

    - -

A key problem in designing file systems is how to obtain good
performance on file system operations while providing consistency.
By consistency, we mean that file system invariants are maintained
on disk. These invariants include that if a file is created, it
appears in its directory, etc. If the file system data structures are
consistent, then it is possible to rebuild the file system to a
correct state after a failure.

    To ensure consistency of on-disk file system data structures, - modifications to the file system must respect certain rules: -

      - -
  • Never point to a structure before it is initialized. An inode
    must be initialized before a directory entry references it. A
    block must be initialized before an inode references it.
    • Never reuse a structure before nullifying all pointers to it. An -inode pointer to a disk block must be reset before the file system can -reallocate the disk block. - -
  • Never reset the last pointer to a live structure before a new
    pointer is set. When renaming a file, the file system should not
    remove the old name for an inode until after the new name has
    been written.
The paper calls these dependencies update dependencies.

xv6 ensures these rules by writing every block synchronously, and
 by ordering the writes appropriately. By synchronously, we mean
 that a process waits until the current disk write has completed
 before continuing with execution.

      - -
    • What happens if power fails after 4776 in mknod1? Did we lose the - inode for ever? No, we have a separate program (called fsck), which - can rebuild the disk structures correctly and can mark the inode on - the free list. - -
  • Does the order of writes in mknod1 matter? Say, what if we wrote
    the directory entry first and then wrote the allocated inode to
    disk? This violates the update rules and is not a good plan: if a
    failure happens after the directory write, then on recovery we
    have a directory pointing to an unallocated inode, which may now
    be allocated by another process for another file!
• Can we turn the writes (i.e., the ones invoked by iupdate and wdir) into delayed writes without creating problems? No, because the buffer cache might write them back to disk in an incorrect order; it has no information to decide in what order to write them.
    - -

xv6 is a nice example of the tension between consistency and performance. To get consistency, xv6 uses synchronous writes, but these writes are slow, because they proceed at the rate of a seek instead of the disk's maximum data transfer rate. The bandwidth to a disk is reasonably high for large transfers (around 50 MB/s), but latency is high, because of the cost of moving the disk arm(s) (the seek latency is about 10 msec).
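To see the gap concretely, a back-of-envelope cost model using the figures above (10 ms per seek, 50 MB/s transfer; both are assumed, era-specific ballpark numbers):

```c
#include <assert.h>

/* Rough cost model: a synchronous write pays one seek per block; a
 * batched sequential write pays one seek plus the transfer time. */

static double sync_ms(int nblocks, double seek_ms) {
    return nblocks * seek_ms;                     /* one seek per block */
}

static double batched_ms(int nblocks, int blocksize,
                         double seek_ms, double mb_per_s) {
    double bytes = (double)nblocks * blocksize;
    return seek_ms + 1000.0 * bytes / (mb_per_s * 1e6);  /* one seek + transfer */
}
```

For 1000 dirty 512-byte blocks, synchronous writes cost about 10 seconds, while one batched transfer costs about 20 ms: roughly a 500x difference.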

This tension is an implementation-dependent one. The Unix API doesn't require writes to be synchronous: updates don't have to appear on disk until a sync, fsync, or open with O_SYNC. Thus, in principle, the UNIX API allows delayed writes, which are good for performance:

      -
    • Batch many writes together in a big one, written at the disk data - rate. -
• Absorb writes to the same block.
    • Schedule writes to avoid seeks. -
    - -
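The second point, write absorption, can be sketched as a toy buffer cache (invented names; a model, not a real UNIX buffer cache): repeated updates dirty a block once, and the flusher issues a single disk write per dirty block.

```c
#include <assert.h>
#include <string.h>

enum { NBUF = 16, BSIZE = 512 };
static char cache[NBUF][BSIZE];
static int dirty[NBUF];
static int disk_writes;              /* real writes that reached the disk */

/* delayed write: update the cached copy and mark the block dirty */
static void bwrite(int b, const char *data) {
    strncpy(cache[b], data, BSIZE - 1);
    dirty[b] = 1;
}

/* flush: one disk write per dirty block, however many updates it absorbed */
static void flush_all(void) {
    for (int b = 0; b < NBUF; b++)
        if (dirty[b]) {
            disk_writes++;
            dirty[b] = 0;
        }
}
```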

    Thus the question: how to delay writes and achieve consistency? - The paper provides an answer. - -

    This paper

    - -

The paper surveys some of the existing techniques and introduces a new one to achieve both performance and consistency.

    - -

Possible techniques:

      - -
    • Equip system with NVRAM, and put buffer cache in NVRAM. - -
    • Logging. Often used in UNIX file systems for metadata updates. -LFS is an extreme version of this strategy. - -
• Flusher-enforced ordering. All writes are delayed. The flusher is aware of dependencies between blocks, but this alone doesn't work, because circular dependencies must be broken by writing blocks out.
    - -

Soft updates is the solution explored in this paper. It doesn't require NVRAM, and it performs as well as the naive strategy of keeping all dirty blocks in main memory. Compared to logging, it is unclear whether soft updates is better. The default BSD file system uses soft updates, but most Linux file systems use logging.

Soft updates is a sophisticated variant of flusher-enforced ordering. Instead of maintaining dependencies at the block level, it maintains dependencies at the file-structure level (per inode, per directory, etc.), reducing circular dependencies. Furthermore, it breaks any remaining circular dependencies by undoing changes before writing the block and then redoing them after the write completes.

    Pseudocode for create: -

    -create (f) {
    -   allocate inode in block i  (assuming inode is available)
    -   add i to directory data block d  (assuming d has space)
-   mark d as dependent on i, and create undo/redo record
    -   update directory inode in block di
-   mark di as dependent on d
    -}
    -
    - -

    Pseudocode for the flusher: -

    -flushblock (b)
    -{
    -  lock b;
    -  for all dependencies that b is relying on
    -    "remove" that dependency by undoing the change to b
    -    mark the dependency as "unrolled"
    -  write b 
    -}
    -
    -write_completed (b) {
    -  remove dependencies that depend on b
    -  reapply "unrolled" dependencies that b depended on
    -  unlock b
    -}
    -
    - -

Apply the flush algorithm to the create example:

      -
• A list of two dependencies: directory->inode, inode->directory.
• Let's say the syncer picks the directory block first.
• Undo the directory->inode changes (i.e., unroll them).
• Write the directory block.
• Remove met dependencies (i.e., remove the inode->directory dependency).
• Perform the redo operation (i.e., reapply the unrolled changes).
• Select the inode block and write it.
• Remove met dependencies (i.e., remove the directory->inode dependency).
• Select the directory block (it is dirty again!).
• Write it.
    - -
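The walkthrough can be run as a toy simulation of the flushblock/write_completed pseudocode (two blocks, invented names; a sketch, not the BSD implementation). The syncer ends up writing directory, inode, directory, matching the trace:

```c
#include <assert.h>

enum { DIR = 0, INO = 1, NBLK = 2 };

static int dirty[NBLK];
static int dep[NBLK][NBLK];    /* dep[a][b]: a's changes must not hit disk before b */
static int writes[8], nwrites; /* log of blocks actually written */

/* create leaves both blocks dirty with the circular dependency above */
static void setup_create(void) {
    dirty[DIR] = dirty[INO] = 1;
    dep[DIR][INO] = 1;         /* directory -> inode */
    dep[INO][DIR] = 1;         /* inode -> directory */
}

static void flushblock(int b) {
    int unrolled = 0;
    for (int o = 0; o < NBLK; o++)
        if (dep[b][o])
            unrolled = 1;      /* undo changes that rely on o; mark "unrolled" */
    writes[nwrites++] = b;     /* write b with the rolled-back contents */
    dirty[b] = 0;
    /* write_completed(b): */
    for (int o = 0; o < NBLK; o++)
        dep[o][b] = 0;         /* dependencies that depended on b are now met */
    if (unrolled)
        dirty[b] = 1;          /* reapplying unrolled changes re-dirties b */
}

/* syncer: prefer a dirty block with no outstanding dependencies */
static int pick(void) {
    for (int b = 0; b < NBLK; b++) {
        if (!dirty[b]) continue;
        int pending = 0;
        for (int o = 0; o < NBLK; o++) pending |= dep[b][o];
        if (!pending) return b;
    }
    for (int b = 0; b < NBLK; b++)
        if (dirty[b]) return b;
    return -1;
}

static void syncer(void) {
    int b;
    while ((b = pick()) >= 0)
        flushblock(b);
}
```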

A file operation that is important for file-system consistency is rename. Rename conceptually works as follows:

    -rename (from, to)
    -   unlink (to);
    -   link (from, to);
    -   unlink (from);
    -
    - -

Rename is often used by programs to make a new version of a file the current version. Committing to a new version must happen atomically. Unfortunately, without transaction-like support, atomicity is impossible to guarantee, so a typical file system provides weaker semantics for rename: if to already exists, an instance of to will always exist, even if the system should crash in the middle of the operation. Does the above implementation of rename guarantee these semantics? (Answer: no.)

If rename is implemented as unlink, link, unlink, then it is difficult to guarantee even the weak semantics. Modern UNIXes therefore provide rename as a single file system call:

-   update dir block for to to point to from's inode // write block
    -   update dir block for from to free entry // write block
    -
    -

fsck may need to correct refcounts in the inode if the file system fails during rename. For example, a crash after the first write followed by fsck should set the refcount to 2, since both from and to are pointing at the inode.

These semantics are sufficient, however, for an application to ensure atomicity. Before the call, there is a from and perhaps a to. If the call is successful, following the call there is only a to. If there is a crash, there may be both a from and a to, in which case the caller knows the previous attempt failed and must retry. The subtlety is that if you now follow the two links, the "to" name may link to either the old file or the new file. If it links to the new file, that means there was a crash and you just detected that the rename operation was composite. On the other hand, the retry procedure can be the same for either case (do the rename again), so it isn't necessary to discover how it failed. The function follows the golden rule of recoverability, and it is idempotent, so it lays all the needed groundwork for use as part of a true atomic action.
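The retry pattern described here is the classic write-then-rename idiom. A minimal sketch using POSIX rename(), whose atomic-replacement guarantee is exactly the weak semantics above (the paths are invented, and a robust version would fsync the data before renaming):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Commit a new version of `path` atomically: write it under a temporary
 * name, then rename over the real name. POSIX rename() atomically
 * replaces an existing `to`, so a crash leaves either the old or the
 * new version under `path`, never a torn mix. */
static int write_new_version(const char *path, const char *data) {
    char tmp[512];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    fputs(data, f);
    fclose(f);                  /* (a robust version would fsync here) */
    return rename(tmp, path);   /* the atomic commit point */
}

static int read_version(const char *path, char *buf, int n) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int k = (int)fread(buf, 1, n - 1, f);
    buf[k] = '\0';
    fclose(f);
    return k;
}
```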

With soft updates, rename becomes:

    -rename (from, to) {
    -   i = namei(from);
-   add to "to"'s directory data block td a reference to inode i
-   mark td as dependent on block i
-   update directory inode in block tdi
-   mark tdi as dependent on td
-   remove from "from"'s directory data block fd the reference to inode i
    -   mark fd as dependent on tdi
    -   update directory inode in block fdi
    -   mark fdi as dependent on fd
    -}
    -
    -

    No synchronous writes! - -

What needs to be done on recovery? (Inspect every statement in rename and see what inconsistencies could exist on disk; e.g., the inode refcnt could be too high.) None of these inconsistencies requires fixing before the file system can operate; they can be fixed by a background file-system repairer.

    Paper discussion

    - -

Do soft updates perform any useless writes? (A useless write is a write that will be immediately overwritten.) (Answer: yes.) Fix the syncer to be careful about which block it starts with. Fix cache replacement to select the LRU block with no pending dependencies.

    Can a log-structured file system implement rename better? (Answer: -yes, since it can get the refcnts right). - -

    Discuss all graphs. - - - diff --git a/web/l14.txt b/web/l14.txt deleted file mode 100644 index d121dff..0000000 --- a/web/l14.txt +++ /dev/null @@ -1,247 +0,0 @@ -Why am I lecturing about Multics? - Origin of many ideas in today's OSes - Motivated UNIX design (often in opposition) - Motivated x86 VM design - This lecture is really "how Intel intended x86 segments to be used" - -Multics background - design started in 1965 - very few interactive time-shared systems then: CTSS - design first, then implementation - system stable by 1969 - so pre-dates UNIX, which started in 1969 - ambitious, many years, many programmers, MIT+GE+BTL - -Multics high-level goals - many users on same machine: "time sharing" - perhaps commercial services sharing the machine too - remote terminal access (but no recognizable data networks: wired or phone) - persistent reliable file system - encourage interaction between users - support joint projects that share data &c - control access to data that should not be shared - -Most interesting aspect of design: memory system - idea: eliminate memory / file distinction - file i/o uses LD / ST instructions - no difference between memory and disk files - just jump to start of file to run program - enhances sharing: no more copying files to private memory - this seems like a really neat simplification! - -GE 645 physical memory system - 24-bit phys addresses - 36-bit words - so up to 75 megabytes of physical memory!!! 
- but no-one could afford more than about a megabyte - -[per-process state] - DBR - DS, SDW (== address space) - KST - stack segment - per-segment linkage segments - -[global state] - segment content pages - per-segment page tables - per-segment branch in directory segment - AST - -645 segments (simplified for now, no paging or rings) - descriptor base register (DBR) holds phy addr of descriptor segment (DS) - DS is an array of segment descriptor words (SDW) - SDW: phys addr, length, r/w/x, present - CPU has pairs of registers: 18 bit offset, 18 bit segment # - five pairs (PC, arguments, base, linkage, stack) - early Multics limited each segment to 2^16 words - thus there are lots of them, intended to correspond to program modules - note: cannot directly address phys mem (18 vs 24) - 645 segments are a lot like the x86! - -645 paging - DBR and SDW actually contain phy addr of 64-entry page table - each page is 1024 words - PTE holds phys addr and present flag - no permission bits, so you really need to use the segments, not like JOS - no per-process page table, only per-segment - so all processes using a segment share its page table and phys storage - makes sense assuming segments tend to be shared - paging environment doesn't change on process switch - -Multics processes - each process has its own DS - Multics switches DBR on context switch - different processes typically have different number for same segment - -how to use segments to unify memory and file system? 
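The 645 translation path sketched above (DBR -> descriptor segment -> SDW -> per-segment page table) can be modeled in a few lines of C; structures and field widths are simplified for illustration and do not match the real hardware layout:

```c
#include <assert.h>

enum { PAGE = 1024, NPTE = 64, NSEG = 256 };

struct pagetable { unsigned phys_page[NPTE]; };        /* PTE: phys page number */
struct sdw { struct pagetable *pt; unsigned length; }; /* segment descriptor word */
struct ds { struct sdw sdw[NSEG]; };                   /* descriptor segment, via DBR */

/* translate (segment #, 18-bit offset) to a physical word address */
static unsigned translate(struct ds *dbr, unsigned seg, unsigned offset) {
    struct sdw *s = &dbr->sdw[seg];
    assert(offset < s->length);                  /* segment bounds check */
    unsigned page = offset / PAGE, word = offset % PAGE;
    return s->pt->phys_page[page] * PAGE + word;
}

/* build one segment whose page 0 maps to physical page 7, then translate */
static unsigned demo(void) {
    static struct pagetable pt;
    static struct ds ds;
    pt.phys_page[0] = 7;
    ds.sdw[3].pt = &pt;
    ds.sdw[3].length = 2048;
    return translate(&ds, 3, 5);                 /* segment 3, offset 5 */
}
```

Note that the page table is reached through the SDW, not through a per-process table: that is why all processes sharing a segment share its page table and physical storage.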
- don't want to have to use 18-bit seg numbers as file names - we want to write programs using symbolic names - names should be hierarchical (for users) - so users can have directories and sub-directories - and path names - -Multics file system - tree structure, directories and files - each file and directory is a segment - dir seg holds array of "branches" - name, length, ACL, array of block #s, "active" - unique ROOT directory - path names: ROOT > A > B - note there are no inodes, thus no i-numbers - so "real name" for a file is the complete path name - o/s tables have path name where unix would have i-number - presumably makes renaming and removing active files awkward - no hard links - -how does a program refer to a different segment? - inter-segment variables contain symbolic segment name - A$E refers to segment A, variable/function E - what happens when segment B calls function A$E(1, 2, 3)? - -when compiling B: - compiler actually generates *two* segments - one holds B's instructions - one holds B's linkage information - initial linkage entry: - name of segment e.g. "A" - name of symbol e.g. 
"E" - valid flag - CALL instruction is indirect through entry i of linkage segment - compiler marks entry i invalid - [storage for strings "A" and "E" really in segment B, not linkage seg] - -when a process is executing B: - two segments in DS: B and a *copy* of B's linkage segment - CPU linkage register always points to current segment's linkage segment - call A$E is really call indirect via linkage[i] - faults because linkage[i] is invalid - o/s fault handler - looks up segment name for i ("A") - search path in file system for segment "A" (cwd, library dirs) - if not already in use by some process (branch active flag and AST knows): - allocate page table and pages - read segment A into memory - if not already in use by *this* process (KST knows): - find free SDW j in process DS, make it refer to A's page table - set up r/w/x based on process's user and file ACL - also set up copy of A's linkage segment - search A's symbol table for "E" - linkage[i] := j / address(E) - restart B - now the CALL works via linkage[i] - and subsequent calls are fast - -how does A get the correct linkage register? - the right value cannot be embedded in A, since shared among processes - so CALL actually goes to instructions in A's linkage segment - load current seg# into linkage register, jump into A - one set of these per procedure in A - -all memory / file references work this way - as if pointers were really symbolic names - segment # is really a transparent optimization - linking is "dynamic" - programs contain symbolic references - resolved only as needed -- if/when executed - code is shared among processes - was program data shared? 
- probably most variables not shared (on stack, in private segments) - maybe a DB would share a data segment, w/ synchronization - file data: - probably one at a time (locks) for read/write - read-only is easy to share - -filesystem / segment implications - programs start slowly due to dynamic linking - creat(), unlink(), &c are outside of this model - store beyond end extends a segment (== appends to a file) - no need for buffer cache! no need to copy into user space! - but no buffer cache => ad-hoc caches e.g. active segment table - when are dirty segments written back to disk? - only in page eviction algorithm, when free pages are low - database careful ordered writes? e.g. log before data blocks? - I don't know, probably separate flush system calls - -how does shell work? - you type a program name - the shell just CALLs that program, as a segment! - dynamic linking finds program segment and any library segments it needs - the program eventually returns, e.g. with RET - all this happened inside the shell process's address space - no fork, no exec - buggy program can crash the shell! e.g. scribble on stack - process creation was too slow to give each program its own process - -how valuable is the sharing provided by segment machinery? - is it critical to users sharing information? - or is it just there to save memory and copying? - -how does the kernel fit into all this? - kernel is a bunch of code modules in segments (in file system) - a process dynamically loads in the kernel segments that it uses - so kernel segments have different numbers in different processes - a little different from separate kernel "program" in JOS or xv6 - kernel shares process's segment# address space - thus easy to interpret seg #s in system call arguments - kernel segment ACLs in file system restrict write - so mapped non-writeable into processes - -how to call the kernel? - very similar to the Intel x86 - 8 rings. users at 4. core kernel at 0. 
- CPU knows current execution level - SDW has max read/write/execute levels - call gate: lowers ring level, but only at designated entry - stack per ring, incoming call switches stacks - inner ring can always read arguments, write results - problem: checking validity of arguments to system calls - don't want user to trick kernel into reading/writing the wrong segment - you have this problem in JOS too - later Multics CPUs had hardware to check argument references - -are Multics rings a general-purpose protected subsystem facility? - example: protected game implementation - protected so that users cannot cheat - put game's code and data in ring 3 - BUT what if I don't trust the author? - or if i've already put some other subsystem in ring 3? - a ring has full power over itself and outer rings: you must trust - today: user/kernel, server processes and IPC - pro: protection among mutually suspicious subsystems - con: no convenient sharing of address spaces - -UNIX vs Multics - UNIX was less ambitious (e.g. no unified mem/FS) - UNIX hardware was small - just a few programmers, all in the same room - evolved rather than pre-planned - quickly self-hosted, so they got experience earlier - -What did UNIX inherit from MULTICS? - a shell at user level (not built into kernel) - a single hierarchical file system, with subdirectories - controlled sharing of files - written in high level language, self-hosted development - -What did UNIX reject from MULTICS? 
- files look like memory - instead, unifying idea is file descriptor and read()/write() - memory is a totally separate resource - dynamic linking - instead, static linking at compile time, every binary had copy of libraries - segments and sharing - instead, single linear address space per process, like xv6 - (but shared libraries brought these back, just for efficiency, in 1980s) - Hierarchical rings of protection - simpler user/kernel - for subsystems, setuid, then client/server and IPC - -The most useful sources I found for late-1960s Multics VM: - 1. Bensoussan, Clingen, Daley, "The Multics Virtual Memory: Concepts - and Design," CACM 1972 (segments, paging, naming segments, dynamic - linking). - 2. Daley and Dennis, "Virtual Memory, Processes, and Sharing in Multics," - SOSP 1967 (more details about dynamic linking and CPU). - 3. Graham, "Protection in an Information Processing Utility," - CACM 1968 (brief account of rings and gates). diff --git a/web/l19.txt b/web/l19.txt deleted file mode 100644 index af9d0bb..0000000 --- a/web/l19.txt +++ /dev/null @@ -1,1412 +0,0 @@ --- front -6.828 Shells Lecture - -Hello. - --- intro -Bourne shell - -Simplest shell: run cmd arg arg ... - fork - exec in child - wait in parent - -More functionality: - file redirection: cmd >file - open file as fd 1 in child before exec - -Still more functionality: - pipes: cmd | cmd | cmd ... - create pipe, - run first cmd with pipe on fd 1, - run second cmd with other end of pipe on fd 0 - -More Bourne arcana: - $* - command args - "$@" - unexpanded command args - environment variables - macro substitution - if, while, for - || - && - "foo $x" - 'foo $x' - `cat foo` - --- rc -Rc Shell - - -No reparsing of input (except explicit eval). - -Variables as explicit lists. - -Explicit concatenation. - -Multiple input pipes <{cmd} - pass /dev/fd/4 as file name. - -Syntax more like C, less like Algol. 
- -diff <{echo hi} <{echo bye} - --- es -Es shell - - -rc++ - -Goal is to override functionality cleanly. - -Rewrite input like cmd | cmd2 as %pipe {cmd} {cmd2}. - -Users can redefine %pipe, etc. - -Need lexical scoping and let to allow new %pipe refer to old %pipe. - -Need garbage collection to collect unreachable code. - -Design principle: - minimal functionality + good defaults - allow users to customize implementations - - emacs, exokernel - --- apps -Applications - -Shell scripts are only as good as the programs they use. - (What good are pipes without cat, grep, sort, wc, etc.?) - -The more the scripts can access, the more powerful they become. - --- acme -Acme, Plan 9 text editor - -Make window system control files available to -everything, including shell. - -Can write shell scripts to script interactions. - -/home/rsc/bin/Slide -/home/rsc/bin/Slide- -/home/rsc/bin/Slide+ - -/usr/local/plan9/bin/adict - -win - --- javascript -JavaScript - -Very powerful - - not because it's a great language - - because it has a great data set - - Google Maps - - Gmail - - Ymail - - etc. - --- greasemonkey -GreaseMonkey - -// ==UserScript== -// @name Google Ring -// @namespace http://swtch.com/greasemonkey/ -// @description Changes Google Logo -// @include http://*.google.*/* -// ==/UserScript== - -(function() { - for(var i=0; i[2=1] | sed 1d | winwrite body - case 2 - dict=$2 - case 3 - dict=$2 - dict -d $dict $3 >[2=1] | winwrite body - } - winctl clean - wineventloop -} - -dict=NONE -if(~ $1 -d){ - shift - dict=$2 - shift -} -if(~ $1 -d*){ - dict=`{echo $1 | sed 's/-d//'} - shift -} -if(~ $1 -*){ - echo 'usage: adict [-d dict] [word...]' >[1=2] - exit usage -} - -switch($#*){ -case 0 - if(~ $dict NONE) - dictwin /adict/ - if not - dictwin /adict/$dict/ $dict -case * - if(~ $dict NONE){ - dict=`{dict -d'?' 
| 9 sed -n 's/^ ([^\[ ]+).*/\1/p' | sed 1q} - if(~ $#dict 0){ - echo 'no dictionaries present on this system' >[1=2] - exit nodict - } - } - for(i) - dictwin /adict/$dict/$i $dict $i -} - --- /usr/local/plan9/lib/acme.rc -fn newwindow { - winctl=`{9p read acme/new/ctl} - winid=$winctl(1) - winctl noscroll -} - -fn winctl { - echo $* | 9p write acme/acme/$winid/ctl -} - -fn winread { - 9p read acme/acme/$winid/$1 -} - -fn winwrite { - 9p write acme/acme/$winid/$1 -} - -fn windump { - if(! ~ $1 - '') - winctl dumpdir $1 - if(! ~ $2 - '') - winctl dump $2 -} - -fn winname { - winctl name $1 -} - -fn winwriteevent { - echo $1$2$3 $4 | winwrite event -} - -fn windel { - if(~ $1 sure) - winctl delete - if not - winctl del -} - -fn wineventloop { - . <{winread event >[2]/dev/null | acmeevent} -} --- /home/rsc/plan9/rc/bin/fedex -#!/bin/rc - -if(! ~ $#* 1) { - echo usage: fedex 123456789012 >[1=2] - exit usage -} - -rfork e - -fn bgrep{ -pattern=`{echo $1 | sed 's;/;\\&;'} -shift - -@{ echo 'X { -$ -a - -. -} -X ,x/(.+\n)+\n/ g/'$pattern'/p' | -sam -d $* >[2]/dev/null -} -} - -fn awk2 { - awk 'NR%2==1 { a=$0; } - NR%2==0 { b=$0; printf("%-30s %s\n", a, b); } - ' $* -} - -fn awk3 { - awk '{line[NR] = $0} - END{ - i = 4; - while(i < NR){ - what=line[i++]; - when=line[i]; - comment=""; - if(!(when ~ /..\/..\/.... ..:../)){ - # out of sync - printf("%s\n", what); - continue; - } - i++; - if(!(line[i+1] ~ /..\/..\/.... ..:../) && - (i+2 > NR || line[i+2] ~ /..\/..\/.... 
..:../)){ - what = what ", " line[i++]; - } - printf("%s %s\n", when, what); - } - }' $* -} - -# hget 'http://www.fedex.com/cgi-bin/track_it?airbill_list='$1'&kurrent_airbill='$1'&language=english&cntry_code=us&state=0' | -hget 'http://www.fedex.com/cgi-bin/tracking?action=track&language=english&cntry_code=us&initial=x&mps=y&tracknumbers='$1 | - htmlfmt >/tmp/fedex.$pid -sed -n '/Tracking number/,/^$/p' /tmp/fedex.$pid | awk2 -echo -sed -n '/Reference number/,/^$/p' /tmp/fedex.$pid | awk2 -echo -sed -n '/Date.time/,/^$/p' /tmp/fedex.$pid | sed 1,4d | fmt -l 4000 | sed 's/ [A-Z][A-Z] /&\n/g' -rm /tmp/fedex.$pid --- /home/rsc/src/webscript/a3 -#!./o.webscript - -load "http://www.ups.com/WebTracking/track?loc=en_US" -find textbox "InquiryNumber1" -input "1z30557w0340175623" -find next checkbox -input "yes" -find prev form -submit -if(find "Delivery Information"){ - find outer table - print -}else if(find "One or more"){ - print -}else{ - print "Unexpected results." - find page - print -} --- /home/rsc/src/webscript/a2 -#load "http://apc-reset/outlets.htm" -load "apc.html" -print -print "\n=============\n" -find "yoshimi" -find outer row -find next select -input "Immediate Reboot" -submit -print --- /usr/local/plan9/acid/port -// portable acid for all architectures - -defn pfl(addr) -{ - print(pcfile(addr), ":", pcline(addr), "\n"); -} - -defn -notestk(addr) -{ - local pc, sp; - complex Ureg addr; - - pc = addr.pc\X; - sp = addr.sp\X; - - print("Note pc:", pc, " sp:", sp, " ", fmt(pc, 'a'), " "); - pfl(pc); - _stk({"PC", pc, "SP", sp, linkreg(addr)}, 1); -} - -defn -notelstk(addr) -{ - local pc, sp; - complex Ureg addr; - - pc = addr.pc\X; - sp = addr.sp\X; - - print("Note pc:", pc, " sp:", sp, " ", fmt(pc, 'a'), " "); - pfl(pc); - _stk({"PC", pc, "SP", sp, linkreg(addr)}, 1); -} - -defn params(param) -{ - while param do { - sym = head param; - print(sym[0], "=", itoa(sym[1], "%#ux")); - param = tail param; - if param then - print (","); - } -} - -stkprefix = ""; 
-stkignore = {}; -stkend = 0; - -defn locals(l) -{ - local sym; - - while l do { - sym = head l; - print(stkprefix, "\t", sym[0], "=", itoa(sym[1], "%#ux"), "\n"); - l = tail l; - } -} - -defn _stkign(frame) -{ - local file; - - file = pcfile(frame[0]); - s = stkignore; - while s do { - if regexp(head s, file) then - return 1; - s = tail s; - } - return 0; -} - -// print a stack trace -// -// in a run of leading frames in files matched by regexps in stkignore, -// only print the last one. -defn _stk(regs, dolocals) -{ - local stk, frame, pc, fn, done, callerpc, paramlist, locallist; - - stk = strace(regs); - if stkignore then { - while stk && tail stk && _stkign(head tail stk) do - stk = tail stk; - } - - callerpc = 0; - done = 0; - while stk && !done do { - frame = head stk; - stk = tail stk; - fn = frame[0]; - pc = frame[1]; - callerpc = frame[2]; - paramlist = frame[3]; - locallist = frame[4]; - - print(stkprefix, fmt(fn, 'a'), "("); - params(paramlist); - print(")"); - if pc != fn then - print("+", itoa(pc-fn, "%#ux")); - print(" "); - pfl(pc); - if dolocals then - locals(locallist); - if fn == var("threadmain") || fn == var("p9main") then - done=1; - if fn == var("threadstart") || fn == var("scheduler") then - done=1; - if callerpc == 0 then - done=1; - } - if callerpc && !done then { - print(stkprefix, fmt(callerpc, 'a'), " "); - pfl(callerpc); - } -} - -defn findsrc(file) -{ - local lst, src; - - if file[0] == '/' then { - src = file(file); - if src != {} then { - srcfiles = append srcfiles, file; - srctext = append srctext, src; - return src; - } - return {}; - } - - lst = srcpath; - while head lst do { - src = file(head lst+file); - if src != {} then { - srcfiles = append srcfiles, file; - srctext = append srctext, src; - return src; - } - lst = tail lst; - } -} - -defn line(addr) -{ - local src, file; - - file = pcfile(addr); - src = match(file, srcfiles); - - if src >= 0 then - src = srctext[src]; - else - src = findsrc(file); - - if src == {} then { - 
print("no source for ", file, "\n"); - return {}; - } - line = pcline(addr)-1; - print(file, ":", src[line], "\n"); -} - -defn addsrcdir(dir) -{ - dir = dir+"/"; - - if match(dir, srcpath) >= 0 then { - print("already in srcpath\n"); - return {}; - } - - srcpath = {dir}+srcpath; -} - -defn source() -{ - local l; - - l = srcpath; - while l do { - print(head l, "\n"); - l = tail l; - } - l = srcfiles; - - while l do { - print("\t", head l, "\n"); - l = tail l; - } -} - -defn Bsrc(addr) -{ - local lst; - - lst = srcpath; - file = pcfile(addr); - if file[0] == '/' && access(file) then { - rc("B "+file+":"+itoa(pcline(addr))); - return {}; - } - while head lst do { - name = head lst+file; - if access(name) then { - rc("B "+name+":"+itoa(pcline(addr))); - return {}; - } - lst = tail lst; - } - print("no source for ", file, "\n"); -} - -defn srcline(addr) -{ - local text, cline, line, file, src; - file = pcfile(addr); - src = match(file,srcfiles); - if (src>=0) then - src = srctext[src]; - else - src = findsrc(file); - if (src=={}) then - { - return "(no source)"; - } - return src[pcline(addr)-1]; -} - -defn src(addr) -{ - local src, file, line, cline, text; - - file = pcfile(addr); - src = match(file, srcfiles); - - if src >= 0 then - src = srctext[src]; - else - src = findsrc(file); - - if src == {} then { - print("no source for ", file, "\n"); - return {}; - } - - cline = pcline(addr)-1; - print(file, ":", cline+1, "\n"); - line = cline-5; - loop 0,10 do { - if line >= 0 then { - if line == cline then - print(">"); - else - print(" "); - text = src[line]; - if text == {} then - return {}; - print(line+1, "\t", text, "\n"); - } - line = line+1; - } -} - -defn step() // single step the process -{ - local lst, lpl, addr, bput; - - bput = 0; - if match(*PC, bplist) >= 0 then { // Sitting on a breakpoint - bput = fmt(*PC, bpfmt); - *bput = @bput; - } - - lst = follow(*PC); - - lpl = lst; - while lpl do { // place break points - *(head lpl) = bpinst; - lpl = tail lpl; - } - 
- startstop(pid); // do the step - - while lst do { // remove the breakpoints - addr = fmt(head lst, bpfmt); - *addr = @addr; - lst = tail lst; - } - if bput != 0 then - *bput = bpinst; -} - -defn bpset(addr) // set a breakpoint -{ - if status(pid) != "Stopped" then { - print("Waiting...\n"); - stop(pid); - } - if match(addr, bplist) >= 0 then - print("breakpoint already set at ", fmt(addr, 'a'), "\n"); - else { - *fmt(addr, bpfmt) = bpinst; - bplist = append bplist, addr; - } -} - -defn bptab() // print a table of breakpoints -{ - local lst, addr; - - lst = bplist; - while lst do { - addr = head lst; - print("\t", fmt(addr, 'X'), " ", fmt(addr, 'a'), " ", fmt(addr, 'i'), "\n"); - lst = tail lst; - } -} - -defn bpdel(addr) // delete a breakpoint -{ - local n, pc, nbplist; - - if addr == 0 then { - while bplist do { - pc = head bplist; - pc = fmt(pc, bpfmt); - *pc = @pc; - bplist = tail bplist; - } - return {}; - } - - n = match(addr, bplist); - if n < 0 then { - print("no breakpoint at ", fmt(addr, 'a'), "\n"); - return {}; - } - - addr = fmt(addr, bpfmt); - *addr = @addr; - - nbplist = {}; // delete from list - while bplist do { - pc = head bplist; - if pc != addr then - nbplist = append nbplist, pc; - bplist = tail bplist; - } - bplist = nbplist; // delete from memory -} - -defn cont() // continue execution -{ - local addr; - - addr = fmt(*PC, bpfmt); - if match(addr, bplist) >= 0 then { // Sitting on a breakpoint - *addr = @addr; - step(); // Step over - *addr = bpinst; - } - startstop(pid); // Run -} - -defn stopped(pid) // called from acid when a process changes state -{ - pfixstop(pid); - pstop(pid); // stub so this is easy to replace -} - -defn procs() // print status of processes -{ - local c, lst, cpid; - - cpid = pid; - lst = proclist; - while lst do { - np = head lst; - setproc(np); - if np == cpid then - c = '>'; - else - c = ' '; - print(fmt(c, 'c'), np, ": ", status(np), " at ", fmt(*PC, 'a'), " setproc(", np, ")\n"); - lst = tail lst; - } - pid = 
cpid; - if pid != 0 then - setproc(pid); -} - -_asmlines = 30; - -defn asm(addr) -{ - local bound; - - bound = fnbound(addr); - - addr = fmt(addr, 'i'); - loop 1,_asmlines do { - print(fmt(addr, 'a'), " ", fmt(addr, 'X')); - print("\t", @addr++, "\n"); - if bound != {} && addr > bound[1] then { - lasmaddr = addr; - return {}; - } - } - lasmaddr = addr; -} - -defn casm() -{ - asm(lasmaddr); -} - -defn xasm(addr) -{ - local bound; - - bound = fnbound(addr); - - addr = fmt(addr, 'i'); - loop 1,_asmlines do { - print(fmt(addr, 'a'), " ", fmt(addr, 'X')); - print("\t", *addr++, "\n"); - if bound != {} && addr > bound[1] then { - lasmaddr = addr; - return {}; - } - } - lasmaddr = addr; -} - -defn xcasm() -{ - xasm(lasmaddr); -} - -defn win() -{ - local npid, estr; - - bplist = {}; - notes = {}; - - estr = "/sys/lib/acid/window '0 0 600 400' "+textfile; - if progargs != "" then - estr = estr+" "+progargs; - - npid = rc(estr); - npid = atoi(npid); - if npid == 0 then - error("win failed to create process"); - - setproc(npid); - stopped(npid); -} - -defn win2() -{ - local npid, estr; - - bplist = {}; - notes = {}; - - estr = "/sys/lib/acid/transcript '0 0 600 400' '100 100 700 500' "+textfile; - if progargs != "" then - estr = estr+" "+progargs; - - npid = rc(estr); - npid = atoi(npid); - if npid == 0 then - error("win failed to create process"); - - setproc(npid); - stopped(npid); -} - -printstopped = 1; -defn new() -{ - local a; - - bplist = {}; - newproc(progargs); - a = var("p9main"); - if a == {} then - a = var("main"); - if a == {} then - return {}; - bpset(a); - while *PC != a do - cont(); - bpdel(a); -} - -defn stmnt() // step one statement -{ - local line; - - line = pcline(*PC); - while 1 do { - step(); - if line != pcline(*PC) then { - src(*PC); - return {}; - } - } -} - -defn func() // step until we leave the current function -{ - local bound, end, start, pc; - - bound = fnbound(*PC); - if bound == {} then { - print("cannot locate text symbol\n"); - return {}; - 
} - - pc = *PC; - start = bound[0]; - end = bound[1]; - while pc >= start && pc < end do { - step(); - pc = *PC; - } -} - -defn next() -{ - local sp, bound, pc; - - sp = *SP; - bound = fnbound(*PC); - if bound == {} then { - print("cannot locate text symbol\n"); - return {}; - } - stmnt(); - pc = *PC; - if pc >= bound[0] && pc < bound[1] then - return {}; - - while (pc < bound[0] || pc > bound[1]) && sp >= *SP do { - step(); - pc = *PC; - } - src(*PC); -} - -defn maps() -{ - local m, mm; - - m = map(); - while m != {} do { - mm = head m; - m = tail m; - print(mm[2]\X, " ", mm[3]\X, " ", mm[4]\X, " ", mm[0], " ", mm[1], "\n"); - } -} - -defn dump(addr, n, fmt) -{ - loop 0, n do { - print(fmt(addr, 'X'), ": "); - addr = mem(addr, fmt); - } -} - -defn mem(addr, fmt) -{ - - local i, c, n; - - i = 0; - while fmt[i] != 0 do { - c = fmt[i]; - n = 0; - while '0' <= fmt[i] && fmt[i] <= '9' do { - n = 10*n + fmt[i]-'0'; - i = i+1; - } - if n <= 0 then n = 1; - addr = fmt(addr, fmt[i]); - while n > 0 do { - print(*addr++, " "); - n = n-1; - } - i = i+1; - } - print("\n"); - return addr; -} - -defn symbols(pattern) -{ - local l, s; - - l = symbols; - while l do { - s = head l; - if regexp(pattern, s[0]) then - print(s[0], "\t", s[1], "\t", s[2], "\t", s[3], "\n"); - l = tail l; - } -} - -defn havesymbol(name) -{ - local l, s; - - l = symbols; - while l do { - s = head l; - l = tail l; - if s[0] == name then - return 1; - } - return 0; -} - -defn spsrch(len) -{ - local addr, a, s, e; - - addr = *SP; - s = origin & 0x7fffffff; - e = etext & 0x7fffffff; - loop 1, len do { - a = *addr++; - c = a & 0x7fffffff; - if c > s && c < e then { - print("src(", a, ")\n"); - pfl(a); - } - } -} - -defn acidtypes() -{ - local syms; - local l; - - l = textfile(); - if l != {} then { - syms = "acidtypes"; - while l != {} do { - syms = syms + " " + ((head l)[0]); - l = tail l; - } - includepipe(syms); - } -} - -defn getregs() -{ - local regs, l; - - regs = {}; - l = registers; - while l != {} do 
{ - regs = append regs, var(l[0]); - l = tail l; - } - return regs; -} - -defn setregs(regs) -{ - local l; - - l = registers; - while l != {} do { - var(l[0]) = regs[0]; - l = tail l; - regs = tail regs; - } - return regs; -} - -defn resetregs() -{ - local l; - - l = registers; - while l != {} do { - var(l[0]) = register(l[0]); - l = tail l; - } -} - -defn clearregs() -{ - local l; - - l = registers; - while l != {} do { - var(l[0]) = refconst(~0); - l = tail l; - } -} - -progargs=""; -print(acidfile); - --- /usr/local/plan9/acid/386 -// 386 support - -defn acidinit() // Called after all the init modules are loaded -{ - bplist = {}; - bpfmt = 'b'; - - srcpath = { - "./", - "/sys/src/libc/port/", - "/sys/src/libc/9sys/", - "/sys/src/libc/386/" - }; - - srcfiles = {}; // list of loaded files - srctext = {}; // the text of the files -} - -defn linkreg(addr) -{ - return {}; -} - -defn stk() // trace -{ - _stk({"PC", *PC, "SP", *SP}, 0); -} - -defn lstk() // trace with locals -{ - _stk({"PC", *PC, "SP", *SP}, 1); -} - -defn gpr() // print general(hah hah!) 
purpose registers
-{
-	print("AX\t", *AX, " BX\t", *BX, " CX\t", *CX, " DX\t", *DX, "\n");
-	print("DI\t", *DI, " SI\t", *SI, " BP\t", *BP, "\n");
-}
-
-defn spr() // print special processor registers
-{
-	local pc;
-	local cause;
-
-	pc = *PC;
-	print("PC\t", pc, " ", fmt(pc, 'a'), " ");
-	pfl(pc);
-	print("SP\t", *SP, " ECODE ", *ECODE, " EFLAG ", *EFLAGS, "\n");
-	print("CS\t", *CS, " DS\t ", *DS, " SS\t", *SS, "\n");
-	print("GS\t", *GS, " FS\t ", *FS, " ES\t", *ES, "\n");
-
-	cause = *TRAP;
-	print("TRAP\t", cause, " ", reason(cause), "\n");
-}
-
-defn regs() // print all registers
-{
-	spr();
-	gpr();
-}
-
-defn mmregs()
-{
-	print("MM0\t", *MM0, " MM1\t", *MM1, "\n");
-	print("MM2\t", *MM2, " MM3\t", *MM3, "\n");
-	print("MM4\t", *MM4, " MM5\t", *MM5, "\n");
-	print("MM6\t", *MM6, " MM7\t", *MM7, "\n");
-}
-
-defn pfixstop(pid)
-{
-	if *fmt(*PC-1, 'b') == 0xCC then {
-		// Linux stops us after the breakpoint, not at it
-		*PC = *PC-1;
-	}
-}
-
-
-defn pstop(pid)
-{
-	local l;
-	local pc;
-	local why;
-
-	pc = *PC;
-
-	// Figure out why we stopped.
- if *fmt(pc, 'b') == 0xCC then { - why = "breakpoint"; - - // fix up instruction for print; will put back later - *pc = @pc; - } else if *(pc-2\x) == 0x80CD then { - pc = pc-2; - why = "system call"; - } else - why = "stopped"; - - if printstopped then { - print(pid,": ", why, "\t"); - print(fmt(pc, 'a'), "\t", *fmt(pc, 'i'), "\n"); - } - - if why == "breakpoint" then - *fmt(pc, bpfmt) = bpinst; - - if printstopped && notes then { - if notes[0] != "sys: breakpoint" then { - print("Notes pending:\n"); - l = notes; - while l do { - print("\t", head l, "\n"); - l = tail l; - } - } - } -} - -aggr Ureg -{ - 'U' 0 di; - 'U' 4 si; - 'U' 8 bp; - 'U' 12 nsp; - 'U' 16 bx; - 'U' 20 dx; - 'U' 24 cx; - 'U' 28 ax; - 'U' 32 gs; - 'U' 36 fs; - 'U' 40 es; - 'U' 44 ds; - 'U' 48 trap; - 'U' 52 ecode; - 'U' 56 pc; - 'U' 60 cs; - 'U' 64 flags; - { - 'U' 68 usp; - 'U' 68 sp; - }; - 'U' 72 ss; -}; - -defn -Ureg(addr) { - complex Ureg addr; - print(" di ", addr.di, "\n"); - print(" si ", addr.si, "\n"); - print(" bp ", addr.bp, "\n"); - print(" nsp ", addr.nsp, "\n"); - print(" bx ", addr.bx, "\n"); - print(" dx ", addr.dx, "\n"); - print(" cx ", addr.cx, "\n"); - print(" ax ", addr.ax, "\n"); - print(" gs ", addr.gs, "\n"); - print(" fs ", addr.fs, "\n"); - print(" es ", addr.es, "\n"); - print(" ds ", addr.ds, "\n"); - print(" trap ", addr.trap, "\n"); - print(" ecode ", addr.ecode, "\n"); - print(" pc ", addr.pc, "\n"); - print(" cs ", addr.cs, "\n"); - print(" flags ", addr.flags, "\n"); - print(" sp ", addr.sp, "\n"); - print(" ss ", addr.ss, "\n"); -}; -sizeofUreg = 76; - -aggr Linkdebug -{ - 'X' 0 version; - 'X' 4 map; -}; - -aggr Linkmap -{ - 'X' 0 addr; - 'X' 4 name; - 'X' 8 dynsect; - 'X' 12 next; - 'X' 16 prev; -}; - -defn -linkdebug() -{ - local a; - - if !havesymbol("_DYNAMIC") then - return 0; - - a = _DYNAMIC; - while *a != 0 do { - if *a == 21 then // 21 == DT_DEBUG - return *(a+4); - a = a+8; - } - return 0; -} - -defn -dynamicmap() -{ - if systype == "linux" || systype 
== "freebsd" then { - local r, m, n; - - r = linkdebug(); - if r then { - complex Linkdebug r; - m = r.map; - n = 0; - while m != 0 && n < 100 do { - complex Linkmap m; - if m.name && *(m.name\b) && access(*(m.name\s)) then - print("textfile({\"", *(m.name\s), "\", ", m.addr\X, "});\n"); - m = m.next; - n = n+1; - } - } - } -} - -defn -acidmap() -{ -// dynamicmap(); - acidtypes(); -} - -print(acidfile); diff --git a/web/l2.html b/web/l2.html deleted file mode 100644 index e183d5a..0000000 --- a/web/l2.html +++ /dev/null @@ -1,494 +0,0 @@ - - -L2 - - - -

    6.828 Lecture Notes: x86 and PC architecture

    - -

    Outline

    -
      -
    • PC architecture -
    • x86 instruction set -
    • gcc calling conventions -
    • PC emulation -
    - -

    PC architecture

    - -
      -
    • A full PC has: -
        -
      • an x86 CPU with registers, execution unit, and memory management -
      • CPU chip pins include address and data signals -
      • memory -
      • disk -
      • keyboard -
      • display -
      • other resources: BIOS ROM, clock, ... -
      - -
    • We will start with the original 16-bit 8086 CPU (1978) -
    • CPU runs instructions: -
      -for(;;){
      -	run next instruction
      -}
      -
      - -
    • Needs work space: registers -
        -
      • four 16-bit data registers: AX, CX, DX, BX -
      • each in two 8-bit halves, e.g. AH and AL -
      • very fast, very few -
      -
    • More work space: memory -
        -
      • CPU sends out address on address lines (wires, one bit per wire) -
      • Data comes back on data lines -
      • or data is written to data lines -
      - -
    • Add address registers: pointers into memory -
        -
      • SP - stack pointer -
      • BP - frame base pointer -
      • SI - source index -
      • DI - destination index -
      - -
    • Instructions are in memory too! -
        -
      • IP - instruction pointer (PC on PDP-11, everything else) -
      • increment after running each instruction -
      • can be modified by CALL, RET, JMP, conditional jumps -
      - -
    • Want conditional jumps -
        -
      • FLAGS - various condition codes -
          -
        • whether last arithmetic operation overflowed -
        • ... was positive/negative -
        • ... was [not] zero -
        • ... carry/borrow on add/subtract -
        • ... overflow -
        • ... etc. -
        • whether interrupts are enabled -
        • direction of data copy instructions -
        -
      • JP, JN, J[N]Z, J[N]C, J[N]O ... -
      - -
    • Still not interesting - need I/O to interact with outside world -
        -
      • Original PC architecture: use dedicated I/O space -
          -
        • Works same as memory accesses but set I/O signal -
        • Only 1024 I/O addresses -
        • Example: write a byte to line printer: -
          -#define DATA_PORT    0x378
          -#define STATUS_PORT  0x379
          -#define   BUSY 0x80
          -#define CONTROL_PORT 0x37A
          -#define   STROBE 0x01
          -void
          -lpt_putc(int c)
          -{
          -  /* wait for printer to consume previous byte */
          -  while((inb(STATUS_PORT) & BUSY) == 0)
          -    ;
          -
          -  /* put the byte on the parallel lines */
          -  outb(DATA_PORT, c);
          -
          -  /* tell the printer to look at the data */
          -  outb(CONTROL_PORT, STROBE);
          -  outb(CONTROL_PORT, 0);
          -}
          -
          -		
        - -
      • Memory-Mapped I/O -
          -
        • Use normal physical memory addresses -
            -
          • Gets around limited size of I/O address space -
          • No need for special instructions -
          • System controller routes to appropriate device -
          -
        • Works like ``magic'' memory: -
            -
          • Addressed and accessed like memory, - but ... -
          • ... does not behave like memory! -
          • Reads and writes can have ``side effects'' -
          • Read results can change due to external events -
          -
        -
      - - -
    • What if we want to use more than 2^16 bytes of memory? -
        -
      • 8086 has 20-bit physical addresses, can have 1 Meg RAM -
      • each segment is a 2^16 byte window into physical memory -
      • virtual to physical translation: pa = va + seg*16 -
      • the segment is usually implicit, from a segment register -
      • CS - code segment (for fetches via IP) -
      • SS - stack segment (for load/store via SP and BP) -
      • DS - data segment (for load/store via other registers) -
      • ES - another data segment (destination for string operations) -
      • tricky: can't use the 16-bit address of a stack variable as a pointer -
      • but a far pointer includes full segment:offset (16 + 16 bits) -
      - -
    • But 8086's 16-bit addresses and data were still painfully small -
        -
      • 80386 added support for 32-bit data and addresses (1985) -
      • boots in 16-bit mode, boot.S switches to 32-bit mode -
      • registers are 32 bits wide, called EAX rather than AX -
      • operands and addresses are also 32 bits, e.g. ADD does 32-bit arithmetic -
      • prefix 0x66 gets you 16-bit mode: MOVW is really 0x66 MOVW -
      • the .code32 in boot.S tells assembler to generate 0x66 for e.g. MOVW -
      • 80386 also changed segments and added paged memory... -
      - -
    - -
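The memory-mapped I/O idea above can be sketched in C. This is an illustrative helper, not code from xv6 or JOS: on a PC, the CGA/VGA text buffer conventionally lives at physical address 0xB8000, with one 16-bit cell (attribute byte in the high half, character in the low half) per screen position, and storing to a cell changes what appears on the screen. The buffer pointer is a parameter here so the pattern can be shown without touching real hardware:

```c
#include <stdint.h>

// Write character c with color attribute attr at (row, col) of an
// 80-column text screen. With vgabuf = (uint16_t*)0xB8000 on real
// hardware, the store has the "side effect" of drawing on the display;
// volatile keeps the compiler from optimizing such stores away.
void vga_putc_at(volatile uint16_t *vgabuf, int row, int col,
                 char c, uint8_t attr)
{
    vgabuf[row * 80 + col] = ((uint16_t)attr << 8) | (uint8_t)c;
}
```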

    x86 Physical Memory Map

    - -
      -
    • The physical address space mostly looks like ordinary RAM -
    • Except some low-memory addresses actually refer to other things -
    • Writes to VGA memory appear on the screen -
    • Reset or power-on jumps to ROM at 0x000ffff0 -
    - -
    -+------------------+  <- 0xFFFFFFFF (4GB)
    -|      32-bit      |
    -|  memory mapped   |
    -|     devices      |
    -|                  |
    -/\/\/\/\/\/\/\/\/\/\
    -
    -/\/\/\/\/\/\/\/\/\/\
    -|                  |
    -|      Unused      |
    -|                  |
    -+------------------+  <- depends on amount of RAM
    -|                  |
    -|                  |
    -| Extended Memory  |
    -|                  |
    -|                  |
    -+------------------+  <- 0x00100000 (1MB)
    -|     BIOS ROM     |
    -+------------------+  <- 0x000F0000 (960KB)
    -|  16-bit devices, |
    -|  expansion ROMs  |
    -+------------------+  <- 0x000C0000 (768KB)
    -|   VGA Display    |
    -+------------------+  <- 0x000A0000 (640KB)
    -|                  |
    -|    Low Memory    |
    -|                  |
    -+------------------+  <- 0x00000000
    -
    - -

    x86 Instruction Set

    - -
      -
    • Two-operand instruction set -
        -
      • Intel syntax: op dst, src -
      • AT&T (gcc/gas) syntax: op src, dst -
          -
        • uses b, w, l suffix on instructions to specify size of operands -
        -
      • Operands are registers, constant, memory via register, memory via constant -
      • Examples: -
-
AT&T syntax "C"-ish equivalent -
movl %eax, %edx edx = eax; register mode -
movl $0x123, %edx edx = 0x123; immediate -
movl 0x123, %edx edx = *(int32_t*)0x123; direct -
movl (%ebx), %edx edx = *(int32_t*)ebx; indirect -
movl 4(%ebx), %edx edx = *(int32_t*)(ebx+4); displaced -
- - -

  • Instruction classes - - -
  • Intel architecture manual Volume 2 is the reference - - - -

    gcc x86 calling conventions

    - - - - -
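The body of this section did not survive extraction, so here is a reconstruction of the standard 32-bit gcc (cdecl) rules as a commented example; the stack offsets in the comments assume the usual %ebp frame:

```c
// gcc's 32-bit calling convention (cdecl), sketched:
//   - the caller pushes arguments right to left, so at entry to sum:
//       (%esp)  = return address
//       4(%esp) = a        8(%esp) = b
//   - the callee typically builds a frame: push %ebp; movl %esp, %ebp
//     after which a is 8(%ebp) and b is 12(%ebp)
//   - the return value goes in %eax; the caller pops the arguments
//   - %eax, %ecx, %edx are caller-saved; %ebx, %esi, %edi, %ebp are
//     callee-saved
int sum(int a, int b)
{
    // compiles to roughly: movl 8(%ebp),%eax; addl 12(%ebp),%eax; ret
    return a + b;
}
```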

    PC emulation

    - - diff --git a/web/l3.html b/web/l3.html deleted file mode 100644 index 7d6ca0d..0000000 --- a/web/l3.html +++ /dev/null @@ -1,334 +0,0 @@ -L3 - - - - - -

    Operating system organizaton

    - -

    Required reading: Exokernel paper. - -

    Intro: virtualizing

    - -

    One way to think about an operating system interface is that it -extends the hardware instructions with a set of "instructions" that -are implemented in software. These instructions are invoked using a -system call instruction (int on the x86). In this view, a task of the -operating system is to provide each application with a virtual -version of the interface; that is, it provides each application with a -virtual computer. - -

    One of the challenges in an operating system is multiplexing the -physical resources between the potentially many virtual computers. -What makes the multiplexing typically complicated is an additional -constraint: isolate the virtual computers well from each other. That -is, -

    - -

    In this lecture, we will explore at a high-level how to build -virtual computer that meet these goals. In the rest of the term we -work out the details. - -

    Virtual processors

    - -

    To give each application its own set of virtual processor, we need -to virtualize the physical processors. One way to do is to multiplex -the physical processor over time: the operating system runs one -application for a while, then runs another application for while, etc. -We can implement this solution as follows: when an application has run -for its share of the processor, unload the state of the phyical -processor, save that state to be able to resume the application later, -load in the state for the next application, and resume it. - -

    What needs to be saved and restored? That depends on the -processor, but for the x86: -

    - -
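The list of state to save was lost above; as a sketch (not xv6's actual trapframe or context struct, whose layout and contents differ), the per-virtual-processor state on the x86 is roughly:

```c
#include <stdint.h>

// Illustrative: the state the kernel must save when unloading one
// virtual processor from the physical processor and restore when
// resuming it later.
struct cpu_state {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp;  // general-purpose registers
    uint32_t esp;            // stack pointer
    uint32_t eip;            // where to resume execution
    uint32_t eflags;         // condition codes, interrupt-enable bit
    uint16_t cs, ds, es, ss; // segment registers (define the memory the
                             // application can reach; see next section)
};
```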

    To enforce that a virtual processor doesn't keep a processor, the -operating system can arrange for a periodic interrupt, and switch the -processor in the interrupt routine. - -

    To separate the memories of the applications, we may also need to save -and restore the registers that define the (virtual) memory of the -application (e.g., segment and MMU registers on the x86), which is -explained next. - - - -

    Separating memories

    - -

    Approach to separating memories: -

    -The approaches can be combined. - -

    Lets assume unlimited physical memory for a little while. We can -enforce separation then as follows: -

    -Why does this work? load/stores/jmps cannot touch/enter other -application's domains. - -

    To allow for controled sharing and separation with an application, -extend domain registers with protectioin bits: read (R), write (W), -execute-only (X). - -

    How to protect the domain registers? Extend the protection bits -with a kernel-only one. When in kernel-mode, processor can change -domain registers. As we will see in lecture 4, x86 stores the U/K -information in CPL (current privilege level) in CS segment -register. - -

    To change from user to kernel, extend the hardware with special -instructions for entering a "supervisor" or "system" call, and -returning from it. On x86, int and reti. The int instruction takes as -argument the system call number. We can then think of the kernel -interface as the set of "instructions" that augment the instructions -implemented in hardware. - -

    Memory management

    - -

    We assumed unlimited physical memory and big addresses. In -practice, operating system must support creating, shrinking, and -growing of domains, while still allowing the addresses of an -application to be contiguous (for programming convenience). What if -we want to grow the domain of application 1 but the memory right below -and above it is in use by application 2? - -

    How? Virtual addresses and spaces. Virtualize addresses and let -the kernel control the mapping from virtual to physical. - -

    Address spaces provide each application with the ideas that it has -a complete memory for itself. All the addresses it issues are its -addresses (e.g., each application has an address 0). - -

  • How do you give each application its own address space? - - -
  • What if two applications want to share real memory? Map the pages -into multiple address spaces and have protection bits per page. - -
  • How do you give an application access to a memory-mapped I/O
-device? Map the physical address for the device into the application's
-address space.
  • How do you get off the ground? - - -

    Operating system organizations

    - -

    A central theme in operating system design is how to organize the -operating system. It is helpful to define a couple of terms: -

    - -

    Example: trace a call to printf made by an application. - -

    There are roughly 4 operating system designs: -

    - -

    Although monolithic operating systems are the dominant operating -system architecture for desktop and server machines, it is worthwhile -to consider alternative architectures, even it is just to understand -operating systems better. This lecture looks at exokernels, because -that is what you will building in the lab. xv6 is organized as a -monolithic system, and we will study in the next lectures. Later in -the term we will read papers about microkernel and virtual machine -operating systems. - -

    Exokernels

    - -

    The exokernel architecture takes an end-to-end approach to -operating system design. In this design, the kernel just securely -multiplexes physical resources; any programmer can decide what the -operating system interface and its implementation are for his -application. One would expect a couple of popular APIs (e.g., UNIX) -that most applications will link against, but a programmer is always -free to replace that API, partially or completely. (Draw picture of -JOS.) - -

    Compare UNIX interface (v6 or OSX) with the JOS exokernel-like interface: -

    -enum
    -{
    -	SYS_cputs = 0,
    -	SYS_cgetc,
    -	SYS_getenvid,
    -	SYS_env_destroy,
    -	SYS_page_alloc,
    -	SYS_page_map,
    -	SYS_page_unmap,
    -	SYS_exofork,
    -	SYS_env_set_status,
    -	SYS_env_set_trapframe,
    -	SYS_env_set_pgfault_upcall,
    -	SYS_yield,
    -	SYS_ipc_try_send,
    -	SYS_ipc_recv,
    -};
    -
    - -

    To illustrate the differences between these interfaces in more -detail consider implementing the following: -

    - -

    How well can each kernel interface implement the above examples? -(Start with UNIX interface and see where you run into problems.) (The -JOS kernel interface is not flexible enough: for example, -ipc_receive is blocking.) - -

    Exokernel paper discussion

    - - -

    The central challenge in an exokernel design it to provide -extensibility, but provide fault isolation. This challenge breaks -down into three problems: - -

    - - - - diff --git a/web/l4.html b/web/l4.html deleted file mode 100644 index 342af32..0000000 --- a/web/l4.html +++ /dev/null @@ -1,518 +0,0 @@ -L4 - - - - - -

    Address translation and sharing using segments

    - -

    This lecture is about virtual memory, focusing on address -spaces. It is the first lecture out of series of lectures that uses -xv6 as a case study. - -

    Address spaces

    - - - -

    Two main approaches to implementing address spaces: using segments - and using page tables. Often when one uses segments, one also uses - page tables. But not the other way around; i.e., paging without - segmentation is common. - -

    Example support for address spaces: x86

    - -

    For an operating system to provide address spaces and address -translation typically requires support from hardware. The translation -and checking of permissions typically must happen on each address used -by a program, and it would be too slow to check that in software (if -even possible). The division of labor is operating system manages -address spaces, and hardware translates addresses and checks -permissions. - -

    PC block diagram without virtual memory support: -

    - -

    The x86 starts out in real mode and translation is as follows: -

    - -
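The real-mode rule itself is simple and can be sketched as follows (illustrative helper, not xv6 code): the 20-bit physical address is the 16-bit segment register shifted left four bits plus the 16-bit offset, wrapping at 1 MB as on a real 8086:

```c
#include <stdint.h>

// Real-mode (8086) translation: pa = seg*16 + offset, truncated to
// 20 address lines. Note that 0xF000:0xFFF0 yields 0xFFFF0, the
// address the CPU jumps to at reset.
uint32_t realmode_pa(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFF;
}
```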

    The operating system can switch the x86 to protected mode, which -allows the operating system to create address spaces. Translation in -protected mode is as follows: -

    - -

    Next lecture covers paging; now we focus on segmentation. - -

    Protected-mode segmentation works as follows: -

    - -
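The protected-mode steps can be sketched in C. The `segdesc` struct and `seg_translate` function are hypothetical simplifications of the real descriptor format: a selector indexes a descriptor table, and the descriptor supplies a 32-bit base, a limit, and protection bits checked on every access:

```c
#include <stdint.h>

// Simplified segment descriptor (the real x86 format packs these
// fields differently and has more flags).
typedef struct {
    uint32_t base;   // added to every virtual address
    uint32_t limit;  // size of the segment
    int dpl;         // descriptor privilege level (0 = kernel, 3 = user)
} segdesc;

// Translate va through the descriptor named by selector.
// Returns the linear address, or (uint32_t)-1 on a limit or privilege
// violation (real hardware raises a general-protection fault).
uint32_t seg_translate(segdesc *gdt, int selector, uint32_t va, int cpl)
{
    segdesc d = gdt[selector >> 3];  // low 3 selector bits are RPL/TI
    if (va > d.limit || cpl > d.dpl) // higher CPL number = less privilege
        return (uint32_t)-1;
    return d.base + va;
}
```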

    Case study (xv6)

    - -

    xv6 is a reimplementation of Unix 6th edition. -

    - -

    Newer Unixs have inherited many of the conceptual ideas even though -they added paging, networking, graphics, improve performance, etc. - -

    You will need to read most of the source code multiple times. Your -goal is to explain every line to yourself. - -

    Overview of address spaces in xv6

    - -

    In today's lecture we see how xv6 creates the kernel address - spaces, first user address spaces, and switches to it. To understand - how this happens, we need to understand in detail the state on the - stack too---this may be surprising, but a thread of control and - address space are tightly bundled in xv6, in a concept - called process. The kernel address space is the only address - space with multiple threads of control. We will study context - switching and process management in detail next weeks; creation of - the first user process (init) will get you a first flavor. - -

    xv6 uses only the segmentation hardware on xv6, but in a limited - way. (In JOS you will use page-table hardware too, which we cover in - next lecture.) The adddress space layouts are as follows: -

    - -

    xv6 makes minimal use of the segmentation hardware available on the -x86. What other plans could you envision? - -

    In xv6, each each program has a user and a kernel stack; when the -user program switches to the kernel, it switches to its kernel stack. -Its kernel stack is stored in process's proc structure. (This is -arranged through the descriptors in the IDT, which is covered later.) - -

    xv6 assumes that there is a lot of physical memory. It assumes that - segments can be stored contiguously in physical memory and has - therefore no need for page tables. - -

    xv6 kernel address space

    - -

    Let's see how xv6 creates the kernel address space by tracing xv6 - from when it boots, focussing on address space management: -

    - -

    xv6 user address spaces

    - - - -

    Managing physical memory

    - -

    To create an address space we must allocate physical memory, which - will be freed when an address space is deleted (e.g., when a user - program terminates). xv6 implements a first-fit memory allocater - (see kalloc.c). - -

    It maintains a list of ranges of free memory. The allocator finds - the first range that is larger than the amount of requested memory. - It splits that range in two: one range of the size requested and one - of the remainder. It returns the first range. When memory is - freed, kfree will merge ranges that are adjacent in memory. - -

    Under what scenarios is a first-fit memory allocator undesirable? - -

    Growing an address space

    - -

    How can a user process grow its address space? growproc. -

    -
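Because this xv6 keeps each segment contiguous in physical memory, growing a process comes down to reallocate-and-copy. A user-space sketch of the idea (hypothetical helper, not xv6's growproc; malloc/free stand in for the kernel's physical-memory allocator):

```c
#include <stdlib.h>
#include <string.h>

// Grow a contiguous segment from oldsz to newsz bytes:
// allocate a bigger range, copy the old contents, zero the new part,
// and free the old range. Returns the new segment, or NULL if out of
// memory (in which case the caller keeps the old segment unchanged).
char *grow_segment(char *old, size_t oldsz, size_t newsz)
{
    char *new = malloc(newsz);
    if (new == NULL)
        return NULL;
    memmove(new, old, oldsz);                // copy existing contents
    memset(new + oldsz, 0, newsz - oldsz);   // zero the newly added part
    free(old);                               // return old range to allocator
    return new;
}
```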

    We could do a lot better if segments didn't have to contiguous in - physical memory. How could we arrange that? Using page tables, which - is our next topic. This is one place where page tables would be - useful, but there are others too (e.g., in fork). - - - diff --git a/web/l5.html b/web/l5.html deleted file mode 100644 index 61b55e4..0000000 --- a/web/l5.html +++ /dev/null @@ -1,210 +0,0 @@ -Lecture 5/title> -<html> -<head> -</head> -<body> - -<h2>Address translation and sharing using page tables</h2> - -<p> Reading: <a href="../readings/i386/toc.htm">80386</a> chapters 5 and 6<br> - -<p> Handout: <b> x86 address translation diagram</b> - -<a href="x86_translation.ps">PS</a> - -<a href="x86_translation.eps">EPS</a> - -<a href="x86_translation.fig">xfig</a> -<br> - -<p>Why do we care about x86 address translation? -<ul> -<li>It can simplify s/w structure by placing data at fixed known addresses. -<li>It can implement tricks like demand paging and copy-on-write. -<li>It can isolate programs to contain bugs. -<li>It can isolate programs to increase security. -<li>JOS uses paging a lot, and segments more than you might think. -</ul> - -<p>Why aren't protected-mode segments enough? -<ul> -<li>Why did the 386 add translation using page tables as well? -<li>Isn't it enough to give each process its own segments? -</ul> - -<p>Translation using page tables on x86: -<ul> -<li>paging hardware maps linear address (la) to physical address (pa) -<li>(we will often interchange "linear" and "virtual") -<li>page size is 4096 bytes, so there are 1,048,576 pages in 2^32 -<li>why not just have a big array with each page #'s translation? 
-
-<ul>
-<li>table[20-bit linear page #] => 20-bit phys page #
-</ul>
-<li>386 uses 2-level mapping structure
-<li>one page directory page, with 1024 page directory entries (PDEs)
-<li>up to 1024 page table pages, each with 1024 page table entries (PTEs)
-<li>so la has 10 bits of directory index, 10 bits table index, 12 bits offset
-<li>What's in a PDE or PTE?
-<ul>
-<li>20-bit phys page number, present, read/write, user/supervisor
-</ul>
-<li>cr3 register holds physical address of current page directory
-<li>puzzle: what do PDE read/write and user/supervisor flags mean?
-<li>puzzle: can supervisor read/write user pages?
-
-<li>Here's how the MMU translates an la to a pa:
-
-	<pre>
-	uint
-	translate (uint la, bool user, bool write)
-	{
-	  uint pde, pte;
-	  pde = read_mem (%CR3 + 4*(la >> 22));
-	  access (pde, user, write);
-	  pte = read_mem ( (pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
-	  access (pte, user, write);
-	  return (pte & 0xfffff000) + (la & 0xfff);
-	}
-
-	// check protection. pxe is a pte or pde.
-	// user is true if CPL==3
-	void
-	access (uint pxe, bool user, bool write)
-	{
-	  if (!(pxe & PG_P))
-	      => page fault -- page not present
-	  if (!(pxe & PG_U) && user)
-	      => page fault -- no access for user
-
-	  if (write && !(pxe & PG_W))
-	    if (user)
-	      => page fault -- not writable
-	    else if (!(pxe & PG_U))
-	      => page fault -- not writable
-	    else if (%CR0 & CR0_WP)
-	      => page fault -- not writable
-	}
-	</pre>
-
-<li>CPU's TLB caches vpn => ppn mappings
-<li>if you change a PDE or PTE, you must flush the TLB!
-<ul>
-	<li>by re-loading cr3
-</ul>
-<li>turn on paging by setting CR0_PG bit of %cr0
-</ul>
-
-Can we use paging to limit what memory an app can read/write?
-<ul>
-<li>user can't modify cr3 (requires privilege)
-<li>is that enough?
-<li>could user modify page tables? after all, they are in memory.
-</ul> - -<p>How we will use paging (and segments) in JOS: -<ul> -<li>use segments only to switch privilege level into/out of kernel -<li>use paging to structure process address space -<li>use paging to limit process memory access to its own address space -<li>below is the JOS virtual memory map -<li>why map both kernel and current process? why not 4GB for each? -<li>why is the kernel at the top? -<li>why map all of phys mem at the top? i.e. why multiple mappings? -<li>why map page table a second time at VPT? -<li>why map page table a third time at UVPT? -<li>how do we switch mappings for a different process? -</ul> - -<pre> - 4 Gig --------> +------------------------------+ - | | RW/-- - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - : . : - : . : - : . : - |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/-- - | | RW/-- - | Remapped Physical Memory | RW/-- - | | RW/-- - KERNBASE -----> +------------------------------+ 0xf0000000 - | Cur. Page Table (Kern. RW) | RW/-- PTSIZE - VPT,KSTACKTOP--> +------------------------------+ 0xefc00000 --+ - | Kernel Stack | RW/-- KSTKSIZE | - | - - - - - - - - - - - - - - -| PTSIZE - | Invalid Memory | --/-- | - ULIM ------> +------------------------------+ 0xef800000 --+ - | Cur. Page Table (User R-) | R-/R- PTSIZE - UVPT ----> +------------------------------+ 0xef400000 - | RO PAGES | R-/R- PTSIZE - UPAGES ----> +------------------------------+ 0xef000000 - | RO ENVS | R-/R- PTSIZE - UTOP,UENVS ------> +------------------------------+ 0xeec00000 - UXSTACKTOP -/ | User Exception Stack | RW/RW PGSIZE - +------------------------------+ 0xeebff000 - | Empty Memory | --/-- PGSIZE - USTACKTOP ---> +------------------------------+ 0xeebfe000 - | Normal User Stack | RW/RW PGSIZE - +------------------------------+ 0xeebfd000 - | | - | | - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - . . - . . - . . 
- |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| - | Program Data & Heap | - UTEXT --------> +------------------------------+ 0x00800000 - PFTEMP -------> | Empty Memory | PTSIZE - | | - UTEMP --------> +------------------------------+ 0x00400000 - | Empty Memory | PTSIZE - 0 ------------> +------------------------------+ -</pre> - -<h3>The VPT </h3> - -<p>Remember how the X86 translates virtual addresses into physical ones: - -<p><img src=pagetables.png> - -<p>CR3 points at the page directory. The PDX part of the address -indexes into the page directory to give you a page table. The -PTX part indexes into the page table to give you a page, and then -you add the low bits in. - -<p>But the processor has no concept of page directories, page tables, -and pages being anything other than plain memory. So there's nothing -that says a particular page in memory can't serve as two or three of -these at once. The processor just follows pointers: - -pd = lcr3(); -pt = *(pd+4*PDX); -page = *(pt+4*PTX); - -<p>Diagramatically, it starts at CR3, follows three arrows, and then stops. - -<p>If we put a pointer into the page directory that points back to itself at -index Z, as in - -<p><img src=vpt.png> - -<p>then when we try to translate a virtual address with PDX and PTX -equal to V, following three arrows leaves us at the page directory. -So that virtual page translates to the page holding the page directory. -In Jos, V is 0x3BD, so the virtual address of the VPD is -(0x3BD<<22)|(0x3BD<<12). - - -<p>Now, if we try to translate a virtual address with PDX = V but an -arbitrary PTX != V, then following three arrows from CR3 ends -one level up from usual (instead of two as in the last case), -which is to say in the page tables. So the set of virtual pages -with PDX=V form a 4MB region whose page contents, as far -as the processor is concerned, are the page tables themselves. -In Jos, V is 0x3BD so the virtual address of the VPT is (0x3BD<<22). 
- -<p>So because of the "no-op" arrow we've cleverly inserted into -the page directory, we've mapped the pages being used as -the page directory and page table (which are normally virtually -invisible) into the virtual address space. - - -</body> diff --git a/web/os-lab-1.pdf b/web/os-lab-1.pdf deleted file mode 100644 index 80fc3c4..0000000 Binary files a/web/os-lab-1.pdf and /dev/null differ diff --git a/web/os-lab-1.ppt b/web/os-lab-1.ppt deleted file mode 100644 index 42e532a..0000000 Binary files a/web/os-lab-1.ppt and /dev/null differ diff --git a/web/os-lab-2.pdf b/web/os-lab-2.pdf deleted file mode 100644 index 35ad709..0000000 Binary files a/web/os-lab-2.pdf and /dev/null differ diff --git a/web/os-lab-2.ppt b/web/os-lab-2.ppt deleted file mode 100644 index fb03327..0000000 Binary files a/web/os-lab-2.ppt and /dev/null differ diff --git a/web/os-lab-3.pdf b/web/os-lab-3.pdf deleted file mode 100644 index 33e6997..0000000 Binary files a/web/os-lab-3.pdf and /dev/null differ diff --git a/web/os-lab-3.ppt b/web/os-lab-3.ppt deleted file mode 100644 index 3d45ee2..0000000 Binary files a/web/os-lab-3.ppt and /dev/null differ diff --git a/web/x86-intr.html b/web/x86-intr.html deleted file mode 100644 index 0369e25..0000000 --- a/web/x86-intr.html +++ /dev/null @@ -1,53 +0,0 @@ -<title>Homework: xv6 and Interrupts and Exceptions - - - - - -

    Homework: xv6 and Interrupts and Exceptions

    - -

    -Read: xv6's trapasm.S, trap.c, syscall.c, vectors.S, and usys.S. Skim -lapic.c, ioapic.c, and picirq.c - -

    -Hand-In Procedure -

    -You are to turn in this homework during lecture. Please -write up your answers to the exercises below and hand them in to a -6.828 staff member at the beginning of the lecture. -

    - -Introduction - -

    Try to understand -xv6's trapasm.S, trap.c, syscall.c, vectors.S, and usys.S. Skim - You will need to consult: - -

Chapter 5 of IA-32 Intel
Architecture Software Developer's Manual, Volume 3: System Programming
Guide; you can skip sections 5.7.1, 5.8.2, and 5.12.2.  Be aware
that terms such as exceptions, traps, interrupts, faults, and aborts
have no standard meaning.

    Chapter 9 of the 1987 i386 -Programmer's Reference Manual also covers exception and interrupt -handling in IA32 processors. - -

    Assignment: - -In xv6, set a breakpoint at the beginning of syscall() to -catch the very first system call. What values are on the stack at -this point? Turn in the output of print-stack 35 at that -breakpoint with each value labeled as to what it is (e.g., -saved %ebp for trap, -trapframe.eip, etc.). -

    -This completes the homework. - - - - - - - diff --git a/web/x86-intro.html b/web/x86-intro.html deleted file mode 100644 index 323d92e..0000000 --- a/web/x86-intro.html +++ /dev/null @@ -1,18 +0,0 @@ -Homework: Intro to x86 and PC - - - - - -

    Homework: Intro to x86 and PC

    - -

Today's lecture is an introduction to the x86 and the PC, the
platform for which you will write an operating system.  The assigned
book is a reference for the x86 assembly programming you will do.

    Assignment Make sure to do exercise 1 of lab 1 before -coming to lecture. - - - diff --git a/web/x86-mmu.html b/web/x86-mmu.html deleted file mode 100644 index a83ff26..0000000 --- a/web/x86-mmu.html +++ /dev/null @@ -1,33 +0,0 @@ -Homework: x86 MMU - - - - - -

    Homework: x86 MMU

    - -

    Read chapters 5 and 6 of -Intel 80386 Reference Manual. -These chapters explain -the x86 Memory Management Unit (MMU), -which we will cover in lecture today and which you need -to understand in order to do lab 2. - -

    -Read: bootasm.S and setupsegs() in proc.c - -

    -Hand-In Procedure -

    -You are to turn in this homework during lecture. Please -write up your answers to the exercises below and hand them in to a -6.828 staff member by the beginning of lecture. -

    - -

Assignment: Try to understand setupsegs() in proc.c.
What values are written into gdt[SEG_UCODE]
and gdt[SEG_UDATA] for init, the first user-space
process?
(You can use Bochs to answer this question.)

diff --git a/web/x86-mmu1.pdf b/web/x86-mmu1.pdf deleted file mode 100644 index e7103e7..0000000 Binary files a/web/x86-mmu1.pdf and /dev/null differ diff --git a/web/x86-mmu2.pdf b/web/x86-mmu2.pdf deleted file mode 100644 index e548148..0000000 --- a/web/x86-mmu2.pdf +++ /dev/null @@ -1,55 +0,0 @@ -%PDF-1.4
%%EOF
\ No newline at end of file
diff --git a/web/xv6-disk.html b/web/xv6-disk.html deleted file mode 100644 index 65bcf8f..0000000 --- a/web/xv6-disk.html +++ /dev/null @@ -1,63 +0,0 @@ - - -Homework: Files and Disk I/O - - -

    Homework: Files and Disk I/O

    - -

    -Read: bio.c, fd.c, fs.c, and ide.c - -

    -This homework should be turned in at the beginning of lecture. - -

    -File and Disk I/O - -

    Insert a print statement in bwrite so that you get a -print every time a block is written to disk: - -

    -  cprintf("bwrite sector %d\n", sector);
    -
    - -

    Build and boot a new kernel and run these three commands at the shell: -

    -  echo >a
    -  echo >a
    -  rm a
    -  mkdir d
    -
    - -(You can try rm d if you are curious, but it should look -almost identical to rm a.) - -

    You should see a sequence of bwrite prints after running each command. -Record the list and annotate it with the calling function and -what block is being written. -For example, this is the second echo >a: - -

    -$ echo >a
    -bwrite sector 121  # writei  (data block)
    -bwrite sector 3    # iupdate (inode block)
    -$ 
    -
    - -

    Hint: the easiest way to get the name of the -calling function is to add a string argument to bwrite, -edit all the calls to bwrite to pass the name of the -calling function, and just print it. -You should be able to reason about what kind of -block is being written just from the calling function. - -

    You need not write the following up, but try to -understand why each write is happening. This will -help your understanding of the file system layout -and the code. - -

    -This completes the homework. - - diff --git a/web/xv6-intro.html b/web/xv6-intro.html deleted file mode 100644 index 3669866..0000000 --- a/web/xv6-intro.html +++ /dev/null @@ -1,163 +0,0 @@ -Homework: intro to xv6 - - - - - -

    Homework: intro to xv6

    - -

    This lecture is the introduction to xv6, our re-implementation of - Unix v6. Read the source code in the assigned files. You won't have - to understand the details yet; we will focus on how the first - user-level process comes into existence after the computer is turned - on. -

    - -Hand-In Procedure -

    -You are to turn in this homework during lecture. Please -write up your answers to the exercises below and hand them in to a -6.828 staff member at the beginning of lecture. -

    - -

    Assignment: -
    -Fetch and un-tar the xv6 source: - -

    -sh-3.00$ wget http://pdos.csail.mit.edu/6.828/2007/src/xv6-rev1.tar.gz 
    -sh-3.00$ tar xzvf xv6-rev1.tar.gz
    -xv6/
    -xv6/asm.h
    -xv6/bio.c
    -xv6/bootasm.S
    -xv6/bootmain.c
    -...
    -$
    -
    - -Build xv6: -
    -$ cd xv6
    -$ make
    -gcc -O -nostdinc -I. -c bootmain.c
    -gcc -nostdinc -I. -c bootasm.S
    -ld -N -e start -Ttext 0x7C00 -o bootblock.o bootasm.o bootmain.o
    -objdump -S bootblock.o > bootblock.asm
    -objcopy -S -O binary bootblock.o bootblock
    -...
    -$ 
    -
    - -Find the address of the main function by -looking in kernel.asm: -
    -% grep main kernel.asm
    -...
    -00102454 <mpmain>:
    -mpmain(void)
    -001024d0 <main>:
    -  10250d:       79 f1                   jns    102500 <main+0x30>
    -  1025f3:       76 6f                   jbe    102664 <main+0x194>
    -  102611:       74 2f                   je     102642 <main+0x172>
    -
    -In this case, the address is 001024d0. -

    - -Run the kernel inside Bochs, setting a breakpoint -at the beginning of main (i.e., the address -you just found). -

    -$ make bochs
    -if [ ! -e .bochsrc ]; then ln -s dot-bochsrc .bochsrc; fi
    -bochs -q
    -========================================================================
    -                       Bochs x86 Emulator 2.2.6
    -                    (6.828 distribution release 1)
    -========================================================================
    -00000000000i[     ] reading configuration from .bochsrc
    -00000000000i[     ] installing x module as the Bochs GUI
    -00000000000i[     ] Warning: no rc file specified.
    -00000000000i[     ] using log file bochsout.txt
    -Next at t=0
    -(0) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b         ; ea5be000f0
    -(1) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b         ; ea5be000f0
    -<bochs> 
    -
    - -Look at the registers and the stack contents: - -
    -<bochs> info reg
    -...
    -<bochs> print-stack
    -...
    -<bochs>
    -
    - -Which part of the stack printout is actually the stack? -(Hint: not all of it.) Identify all the non-zero values -on the stack.

    - -Turn in: the output of print-stack with -the valid part of the stack marked. Write a short (3-5 word) -comment next to each non-zero value explaining what it is. -

    - -Now look at kernel.asm for the instructions in main that read: -

    -  10251e:       8b 15 00 78 10 00       mov    0x107800,%edx
    -  102524:       8d 04 92                lea    (%edx,%edx,4),%eax
    -  102527:       8d 04 42                lea    (%edx,%eax,2),%eax
    -  10252a:       c1 e0 04                shl    $0x4,%eax
    -  10252d:       01 d0                   add    %edx,%eax
    -  10252f:       8d 04 85 1c ad 10 00    lea    0x10ad1c(,%eax,4),%eax
    -  102536:       89 c4                   mov    %eax,%esp
    -
    -(The addresses and constants might be different on your system, -and the compiler might use imul instead of the lea,lea,shl,add,lea sequence. -Look for the move into %esp). -

    - -Which lines in main.c do these instructions correspond to? -

    - -Set a breakpoint at the first of those instructions -and let the program run until the breakpoint: -

    -<bochs> vb 0x8:0x10251e
    -<bochs> s
    -...
    -<bochs> c
    -(0) Breakpoint 2, 0x0010251e (0x0008:0x0010251e)
    -Next at t=1157430
    -(0) [0x0010251e] 0008:0x0010251e (unk. ctxt): mov edx, dword ptr ds:0x107800 ; 8b1500781000
    -(1) [0xfffffff0] f000:fff0 (unk. ctxt): jmp far f000:e05b         ; ea5be000f0
    -<bochs> 
    -
    -(The first s command is necessary -to single-step past the breakpoint at main, otherwise c -will not make any progress.) -

    - -Inspect the registers and stack again -(info reg and print-stack). -Then step past those seven instructions -(s 7) -and inspect them again. -Convince yourself that the stack has changed correctly. -

- -Turn in: answers to the following questions.
-Look at the assembly for the call to
-lapic_init that occurs after the
stack switch.  Where does the
-bcpu argument come from?
-What would have happened if main
-stored bcpu
-on the stack before those four assembly instructions?
-Would the code still work?  Why or why not?
-

    - - - diff --git a/web/xv6-lock.html b/web/xv6-lock.html deleted file mode 100644 index 887022a..0000000 --- a/web/xv6-lock.html +++ /dev/null @@ -1,100 +0,0 @@ -Homework: Locking - - - - - -

    Homework: Locking

    - - -

    -Read: spinlock.c - -

    -Hand-In Procedure -

    -You are to turn in this homework at the beginning of lecture. Please -write up your answers to the exercises below and hand them in to a -6.828 staff member at the beginning of lecture. -

    - -Assignment: -In this assignment we will explore some of the interaction -between interrupts and locking. -

    - -Make sure you understand what would happen if the kernel executed -the following code snippet: -

    -  struct spinlock lk;
    -  initlock(&lk, "test lock");
    -  acquire(&lk);
    -  acquire(&lk);
    -
    -(Feel free to use Bochs to find out. acquire is in spinlock.c.) -

    - -An acquire ensures interrupts are off -on the local processor using cli, -and interrupts remain off until the release -of the last lock held by that processor -(at which point they are enabled using sti). -

    - -Let's see what happens if we turn on interrupts while -holding the ide lock. -In ide_rw in ide.c, add a call -to sti() after the acquire(). -Rebuild the kernel and boot it in Bochs. -Chances are the kernel will panic soon after boot; try booting Bochs a few times -if it doesn't. -

    - -Turn in: explain in a few sentences why the kernel panicked. -You may find it useful to look up the stack trace -(the sequence of %eip values printed by panic) -in the kernel.asm listing. -

    - -Remove the sti() you added, -rebuild the kernel, and make sure it works again. -

    - -Now let's see what happens if we turn on interrupts -while holding the kalloc_lock. -In kalloc() in kalloc.c, add -a call to sti() after the call to acquire(). -You will also need to add -#include "x86.h" at the top of the file after -the other #include lines. -Rebuild the kernel and boot it in Bochs. -It will not panic. -

    - -Turn in: explain in a few sentences why the kernel didn't panic. -What is different about kalloc_lock -as compared to ide_lock? -

    -You do not need to understand anything about the details of the IDE hardware -to answer this question, but you may find it helpful to look -at which functions acquire each lock, and then at when those -functions get called. -

- -(There is a very small but non-zero chance that the kernel will panic
-with the extra sti() in kalloc.
-If the kernel does panic, make doubly sure that
-you removed the sti() call from
-ide_rw.  If it continues to panic and the
-only extra sti() is in kalloc.c,
-then mail 6.828-staff@pdos.csail.mit.edu
-and think about buying a lottery ticket.)

    - -Turn in: Why does release() clear -lock->pcs[0] and lock->cpu -before clearing lock->locked? -Why not wait until after? - - - diff --git a/web/xv6-names.html b/web/xv6-names.html deleted file mode 100644 index 926be3a..0000000 --- a/web/xv6-names.html +++ /dev/null @@ -1,78 +0,0 @@ - - -Homework: Naming - - - -

    Homework: Naming

    - -

    -Read: namei in fs.c, fd.c, sysfile.c - -

    -This homework should be turned in at the beginning of lecture. - -

    -Symbolic Links - -

    -As you read namei and explore its varied uses throughout xv6, -think about what steps would be required to add symbolic links -to xv6. -A symbolic link is simply a file with a special type (e.g., T_SYMLINK -instead of T_FILE or T_DIR) whose contents contain the path being -linked to. - -

    -Turn in a short writeup of how you would change xv6 to support -symlinks. List the functions that would have to be added or changed, -with short descriptions of the new functionality or changes. - -

    -This completes the homework. - -

    -The following is not required. If you want to try implementing -symbolic links in xv6, here are the files that the course staff -had to change to implement them: - -

    -fs.c: 20 lines added, 4 modified
    -syscall.c: 2 lines added
    -syscall.h: 1 line added
    -sysfile.c: 15 lines added
    -user.h: 1 line added
    -usys.S: 1 line added
    -
    - -Also, here is an ln program: - -
    -#include "types.h"
    -#include "user.h"
    -
    -int
    -main(int argc, char *argv[])
    -{
    -  int (*ln)(char*, char*);
    -  
    -  ln = link;
    -  if(argc > 1 && strcmp(argv[1], "-s") == 0){
    -    ln = symlink;
    -    argc--;
    -    argv++;
    -  }
    -  
    -  if(argc != 3){
    -    printf(2, "usage: ln [-s] old new (%d)\n", argc);
    -    exit();
    -  }
    -  if(ln(argv[1], argv[2]) < 0){
    -    printf(2, "%s failed\n", ln == symlink ? "symlink" : "link");
    -    exit();
    -  }
    -  exit();
    -}
    -
    - - diff --git a/web/xv6-sched.html b/web/xv6-sched.html deleted file mode 100644 index f8b8b31..0000000 --- a/web/xv6-sched.html +++ /dev/null @@ -1,96 +0,0 @@ -Homework: Threads and Context Switching - - - - - -

    Homework: Threads and Context Switching

    - -

    -Read: swtch.S and proc.c (focus on the code that switches -between processes, specifically scheduler and sched). - -

    -Hand-In Procedure -

    -You are to turn in this homework during lecture. Please -write up your answers to the exercises below and hand them in to a -6.828 staff member at the beginning of lecture. -

    -Introduction - -

    -In this homework you will investigate how the kernel switches between -two processes. - -

    -Assignment: -

    - -Suppose a process that is running in the kernel -calls sched(), which ends up jumping -into scheduler(). - -

    -Turn in: -Where is the stack that sched() executes on? - -

    -Turn in: -Where is the stack that scheduler() executes on? - -

    -Turn in: -When sched() calls swtch(), -does that call to swtch() ever return? If so, when? - -

    -Turn in: -Why does swtch() copy %eip from the stack into the -context structure, only to copy it from the context -structure to the same place on the stack -when the process is re-activated? -What would go wrong if swtch() just left the -%eip on the stack and didn't store it in the context structure? - -

-Surround the call to swtch() in scheduler() with calls
-to cons_putc() like this:

    -      cons_putc('a');
    -      swtch(&cpus[cpu()].context, &p->context);
    -      cons_putc('b');
    -
    -

    -Similarly, -surround the call to swtch() in sched() with calls -to cons_putc() like this: - -

    -  cons_putc('c');
    -  swtch(&cp->context, &cpus[cpu()].context);
    -  cons_putc('d');
    -
    -

    -Rebuild your kernel and boot it on bochs. -With a few exceptions -you should see a regular four-character pattern repeated over and over. -

    -Turn in: What is the four-character pattern? -

    -Turn in: The very first characters are ac. Why does -this happen? -

    -Turn in: Near the start of the last line you should see -bc. How could this happen? - -

    -This completes the homework. - - - - - - - - diff --git a/web/xv6-sleep.html b/web/xv6-sleep.html deleted file mode 100644 index e712a40..0000000 --- a/web/xv6-sleep.html +++ /dev/null @@ -1,100 +0,0 @@ -Homework: sleep and wakeup - - - - - -

    Homework: sleep and wakeup

    - -

    -Read: pipe.c - -

    -Hand-In Procedure -

    -You are to turn in this homework at the beginning of lecture. Please -write up your answers to the questions below and hand them in to a -6.828 staff member at the beginning of lecture. -

    -Introduction -

    - -Remember in lecture 7 we discussed locking a linked list implementation. -The insert code was: - -

    -        struct list *l;
    -        l = list_alloc();
    -        l->next = list_head;
    -        list_head = l;
    -
    - -and if we run the insert on multiple processors simultaneously with no locking, -this ordering of instructions can cause one of the inserts to be lost: - -
    -        CPU1                           CPU2
    -       
    -        struct list *l;
    -        l = list_alloc();
    -        l->next = list_head;
    -                                       struct list *l;
    -                                       l = list_alloc();
    -                                       l->next = list_head;
    -                                       list_head = l;
    -        list_head = l;
    -
    - -(Even though the instructions can happen simultaneously, we -write out orderings where only one CPU is "executing" at a time, -to avoid complicating things more than necessary.) -

    - -In this case, the list element allocated by CPU2 is lost from -the list by CPU1's update of list_head. -Adding a lock that protects the final two instructions makes -the read and write of list_head atomic, so that this -ordering is impossible. -

    - -The reading for this lecture is the implementation of sleep and wakeup, -which are used for coordination between different processes executing -in the kernel, perhaps simultaneously. -

    - -If there were no locking at all in sleep and wakeup, it would be -possible for a sleep and its corresponding wakeup, if executing -simultaneously on different processors, to miss each other, -so that the wakeup didn't find any process to wake up, and yet the -process calling sleep does go to sleep, never to awake. Obviously this is something -we'd like to avoid. -

    - -Read the code with this in mind. - -

    -

    -Questions -

    -(Answer and hand in.) -

    - -1. How does the proc_table_lock help avoid this problem? Give an -ordering of instructions (like the above example for linked list -insertion) -that could result in a wakeup being missed if the proc_table_lock were not used. -You need only include the relevant lines of code. -

-2. sleep is also protected by a second lock, its second argument,
-which need not be the proc_table_lock.  Look at the example in ide.c,
-which uses the ide_lock.  Give an ordering of instructions that could
-result in a wakeup being missed if the ide_lock were not being used.
-(Hint: this should not be the same as your answer to question 1.  The
-two locks serve different purposes.)

    - -

    -This completes the homework. - - -