minix

Author	SHA1	Message	Date
Thomas Veerman	aa521228a5	VFS: Coverity appeasements	2013-01-11 09:42:01 +00:00
Thomas Veerman	ea8ff9284a	Add stack trace dumps for VFS over serial	2013-01-11 09:18:36 +00:00
Thomas Veerman	625f4ae4a3	VFS: add documentation about internal working	2013-01-11 09:18:36 +00:00
Thomas Veerman	23c5f56e32	VFS: change locking to ease concurrent FSes This patch uses stricter locking for REQ_LINK, REQ_MKDIR, REQ_MKNOD, REQ_RENAME, REQ_RMDIR, REQ_SLINK and REQ_UNLINK. For all requests, VFS locks the directory in which we add or remove an inode with VNODE_WRITE. I.e., the operations have exclusive access to that directory. Furthermore, REQ_CHOWN, REQ_CHMOD, and REQ_FTRUNC now lock the vmnt VMNT_READ; VMNT_WRITE was unnecessary.	2013-01-11 09:18:35 +00:00
Thomas Veerman	3de8d1cf6e	VFS/PFS: remove notion of position in pipes Because pipes have no file position. VFS maintained (file) offsets into a buffer internal to PFS and stored them in vnodes for simplicity, mixing the responsibilities of filp and vnode objects. With this patch PFS ignores the position field in REQ_READ and REQ_WRITE requests making VFS' job a lot simpler.	2013-01-11 09:18:35 +00:00
Thomas Veerman	7c8b3ddfed	VFS: fix locking bugs .sync and fsync used unnecessarily restrictive locking type .fsync violated locking order by obtaining a vmnt lock after a filp lock .fsync contained a TOCTOU bug .new_node violated locking rules (didn't upgrade lock upon file creation) .do_pipe used unnecessarily restrictive locking type .always lock pipes exclusively; even a read operation might need to write to a vnode object (update pipe size) .when opening a file with O_TRUNC, upgrade vnode lock when truncating .utime used unnecessarily restrictive locking type .path parsing: .always acquire VMNT_WRITE or VMNT_EXCL on vmnt and downgrade to VMNT_READ if that was what was actually requested. This prevents the following deadlock scenario: thread A: lock_vmnt(vmp, TLL_READSER); lock_vnode(vp, TLL_READSER); upgrade_vmnt_lock(vmp, TLL_WRITE); thread B: lock_vmnt(vmp, TLL_READ); lock_vnode(vp, TLL_READSER); thread A will be stuck in upgrade_vmnt_lock and thread B is stuck in lock_vnode. This happens when, for example, thread A tries to create a new node (open.c:new_node) and thread B tries to do eat_path to change dir (stadir.c:do_chdir). When the path is being resolved, a vnode is always locked with VNODE_OPCL (TLL_READSER) and then downgraded to VNODE_READ if read-only is actually requested. Thread A locks the vmnt with VMNT_WRITE (TLL_READSER) which still allows VMNT_READ locks. Thread B can't acquire a lock on the vnode because thread A has it; Thread A can't upgrade its vmnt lock to VMNT_WRITE (TLL_WRITE) because thread B has a VMNT_READ lock on it. By serializing vmnt locks during path parsing, thread B can only acquire a lock on vmp when thread A has completely finished its operation.	2013-01-11 09:18:35 +00:00
Kees Jongenburger	c0c581a635	vfs:fix for variable 'rfp' set but not used. mount.c: In function 'mount_pfs': mount.c:395:17: error: variable 'rfp' set but not used [-Werror=unused-but-set-variable] Change-Id: I2f22590ab4e3a4a1678e9096626ebca53d2660e6	2013-01-07 09:12:27 +01:00
Ben Gras	8aeac26999	vfs: fix clobbering fd_nr dumpcore: fd_nr can be in use as blocking fd but will then be clobbered by common_open, causing disaster for exiting unpause().	2012-12-11 12:00:57 +01:00
David van Moolenbroek	766047123a	VFS: fix off-by-one in get_name()	2012-11-30 12:24:47 +00:00
Thomas Veerman	179261a9b6	mtab: support moving mount points Also fix canonical_path function; it fails to parse some paths	2012-11-29 10:50:51 +00:00
Thomas Veerman	d9f4f71916	Implement dynamic mtab support With this patch /etc/mtab becomes obsolete.	2012-11-26 15:20:18 +00:00
Thomas Veerman	de83b2a9d9	VFS: change 'last_dir' to match locking assumption new_node makes the assumption that when it does last_dir on a path, a successive advance would not yield a lock on a vmnt, because last_dir already locked the vmnt. This is true except when last_dir resolves to a directory on the parent vmnt of the file that was the result of advance. For example, # cd / # echo foo > home where home is on a different (sub) partition than / is (default install). last_dir would resolve to / and advance would resolve to /home. With this change, last_dir resolves to the root node on the /home partition, making the assumption valid again.	2012-11-26 15:20:18 +00:00
David van Moolenbroek	7dd286e6b8	VFS: do not save device node for new regular files The VFS/FS protocol does not require the file server to supply a special device node number in response to a REQ_CREATE request, as this call creates only regular files. Therefore, VFS should not erroneously save this piece of information from the REQ_CREATE reply either.	2012-11-15 14:29:59 +00:00
Thomas Veerman	14e470be81	VFS: fix TOCTOU bug in sync	2012-11-14 13:24:53 +00:00
Thomas Veerman	ed23a7a7d2	VFS: fix reboot panic with mounted FUSE FS Upon reboot VFS semi-exits all processes and unmounts the file system. However, upon unmount, exiting FUSE file systems might need service from the file system (due to libc). As the FUSE process is halfway through the exit procedure, it doesn't have a valid root directory and working directory. Trying to do system calls then triggers a sanity check in VFS. This fix first exits normal processes which should then allow for unmounting FUSE file systems. Then VFS exits all processes including File Servers and unmounts the rest of the file system.	2012-11-14 13:18:16 +00:00
Thomas Veerman	badec36b33	VFS: fix deadlock when out of worker threads There is a deadlock vulnerability when there are no worker threads available and all of them blocked on a worker thread that's waiting for a reply from a driver or a reply from an FS that needs to make a back call. In these cases the deadlock resolver thread should kick in, but didn't in all cases. Moreover, POSIX calls from File Servers weren't handled properly anymore, which also could lead to deadlocks.	2012-11-14 13:12:37 +00:00
Arne Welzel	e35c4f78d2	VFS: fix check_bsf() locking The check_bsf() macro uses assert(mutex_trylock(&bsf_lock)) and assumes bsf_lock is locked afterwards. This breaks when compiling with NOASSERTS="yes". Also: macro to function transition.	2012-09-28 14:57:34 +02:00
Arne Welzel	7e1074732b	VFS: resolve unused parameter if NOASSERTS="yes" If VFS is compiled with NOASSERTS="yes", ctty_opcl() does not use the op parameter. Change to "non-assert()" sanity check.	2012-09-28 14:57:32 +02:00
Ben Gras	60014efb3e	vfs: pm_dumpcore: always clean up process . whenever this function is called, pm will expect the process to be cleaned up . so don't abort the process entirely on error . fixes a later 'forking on top of in-use child' vfs panic	2012-09-19 17:13:17 +02:00
Thomas Veerman	c087a60ed2	VFS: fix GCC compilation error	2012-09-17 15:29:38 +00:00
Thomas Veerman	3881e732a9	VFS: panic when unmount_all fails	2012-09-17 11:01:46 +00:00
Thomas Veerman	992799b91f	VFS: make all IPC asynchronous By decoupling synchronous drivers from VFS, we are a big step closer to supporting driver crashes under all circumstances. That is, VFS can't become stuck on IPC with a synchronous driver (e.g., INET) and can recover from crashing block drivers during open/close/ioctl or during communication with an FS. In order to maintain serialized communication with a synchronous driver, the communication is wrapped by a mutex on a per driver basis (not major numbers as there can be multiple majors with identical endpoints). Majors that share a driver endpoint point to a single mutex object. In order to support crashes from block drivers, the file reopen tactic had to be changed; first reopen files associated with the crashed driver, then send the new driver endpoint to FSes. This solves a deadlock between the FS and the block driver; - VFS would send REQ_NEW_DRIVER to an FS, but the FS only receives it after retrying the current request to the newly started driver. - The block driver would refuse the retried request until all files had been reopened. - VFS would reopen files only after getting a reply from the initial REQ_NEW_DRIVER. When a character special driver crashes, all associated files have to be marked invalid and closed (or reopened if flagged as such). However, they can only be closed if a thread holds exclusive access to it. To obtain exclusive access, the worker thread (which handles the new driver endpoint event from DS) schedules a new job to garbage collect invalid files. This way, we can signal the worker thread that was talking to the crashed driver and will release exclusive access to a file associated with the crashed driver and prevent the garbage collecting worker thread from deadlocking on that file. Also, when a character special driver crashes, RS will unmap the driver and remap it upon restart. 
During unmapping, associated files are marked invalid instead of waiting for an endpoint up event from DS, as that event might come later than new read/write/select requests and thus cause confusion in the freshly started driver. When locking a filp, the usage counters are no longer checked. The usage counter can legally go down to zero during filp invalidation while there are locks pending. DS events are handled by a separate worker thread instead of the main thread as reopening files could lead to another crash and a stuck thread. An additional worker thread is then necessary to unlock it. Finally, with everything asynchronous a race condition in do_select surfaced. A select entry was only marked in use after successfully sending initial select requests to drivers and having to wait. When multiple select() calls were handled there was opportunity that these entries were overwritten. This had as effect that some select results were ignored (and select() remained blocking instead of returning) or do_select tried to access filps that were not present (because thrown away by secondary select()). This bug manifested itself with sendrecs, but was very hard to reproduce. However, it became awfully easy to trigger with asynsends only.	2012-09-17 11:01:45 +00:00
Ben Gras	e4ac80eb60	various warning/errorwarning fixes for gcc47 . warnings (sometimes promoted to errors) in servers/ and kernel/ . -Os for ext2 boot module to make it small enough	2012-08-27 16:19:18 +02:00
Ben Gras	31d8526346	libexec: add load_offset feature, used for ld.so . ld.so is linked at 0 but it can relocate itself; we wish to load ld.so higher though to trap NULL dereferences. if we know we have to execute ld.so, vfs tells libexec to put it higher.	2012-08-12 23:22:54 +02:00
Thomas Veerman	66dbf73049	VFS: fix locking bug in clone_opcl When VFS runs out of vnodes after closing a vnode in opcl, common_open will try to unlock a vnode through unlock_filp that has already been unlocked in clone_opcl. By first obtaining and locking a new vnode this situation is prevented; if there are no free vnodes, common_open will unlock a still locked vnode.	2012-07-30 10:01:16 +00:00
Thomas Veerman	f6b0d662b5	VFS: check path components for NAME_MAX length	2012-07-30 09:44:58 +00:00
David van Moolenbroek	0b4c154160	VFS: call req_inhibread again	2012-07-19 14:36:51 +00:00
David van Moolenbroek	e0742978f1	VFS: do not resolve symlinks in rename(2)	2012-07-18 14:59:45 +00:00
Thomas Veerman	0d3ccd8908	VFS: fix coverity defects	2012-07-17 10:29:22 +00:00
Thomas Veerman	fd60f03129	VFS: remove support for sync FS communication	2012-07-17 10:12:53 +00:00
Thomas Veerman	06f49fe167	VFS: prevent buffer overflow If an FS returns faulty struct dirent data, VFS could overflow a buffer that holds this data.	2012-07-17 08:49:41 +00:00
Ben Gras	cbcdb838f1	various coverity-inspired fixes . some strncpy/strcpy to strlcpy conversions . new <minix/param.h> to avoid including other minix headers that have colliding definitions with library and commands code, causing parse warnings . removed some dead code / assignments	2012-07-16 14:00:56 +02:00
Thomas Veerman	77dbd766c1	VFS: Use safe string copy functions	2012-07-16 10:57:43 +00:00
Ben Gras	50e2064049	No more intel/minix segments. This commit removes all traces of Minix segments (the text/data/stack memory map abstraction in the kernel) and significance of Intel segments (hardware segments like CS, DS that add offsets to all addressing before page table translation). This ultimately simplifies the memory layout and addressing and makes the same layout possible on non-Intel architectures. There are only two types of addresses in the world now: virtual and physical; even the kernel and processes have the same virtual address space. Kernel and user processes can be distinguished at a glance as processes won't use 0xF0000000 and above. No static pre-allocated memory sizes exist any more. Changes to booting: . The pre_init.c leaves the kernel and modules exactly as they were left by the bootloader in physical memory . The kernel starts running using physical addressing, loaded at a fixed location given in its linker script by the bootloader. All code and data in this phase are linked to this fixed low location. . It makes a bootstrap pagetable to map itself to a fixed high location (also in linker script) and jumps to the high address. All code and data then use this high addressing. . All code/data symbols linked at the low addresses is prefixed by an objcopy step with __k_unpaged_, so that that code cannot reference highly-linked symbols (which aren't valid yet) or vice versa (symbols that aren't valid any more). . The two addressing modes are separated in the linker script by collecting the unpaged_.o objects and linking them with low addresses, and linking the rest high. Some objects are linked twice, once low and once high. . The bootstrap phase passes a lot of information (e.g. free memory list, physical location of the modules, etc.) using the kinfo struct. . After this bootstrap the low-linked part is freed. . The kernel maps in VM into the bootstrap page table so that VM can begin executing. 
Its first job is to make page tables for all other boot processes. So VM runs before RS, and RS gets a fully dynamic, VM-managed address space. VM gets its privilege info from RS as usual but that happens after RS starts running. . Both the kernel loading VM and VM organizing boot processes happen using the libexec logic. This removes the last reason for VM to still know much about exec() and vm/exec.c is gone. Further Implementation: . All segments are based at 0 and have a 4 GB limit. . The kernel is mapped in at the top of the virtual address space so as not to constrain the user processes. . Processes do not use segments from the LDT at all; there are no segments in the LDT any more, so no LLDT is needed. . The Minix segments T/D/S are gone and so none of the user-space or in-kernel copy functions use them. The copy functions use a process endpoint of NONE to realize it's a physical address, virtual otherwise. . The umap call only makes sense to translate a virtual address to a physical address now. . Segments-related calls like newmap and alloc_segments are gone. . All segments-related translation in VM is gone (vir2map etc). . Initialization in VM is simpler as no moving around is necessary. . VM and all other boot processes can be linked wherever they wish and will be mapped in at the right location by the kernel and VM respectively. Other changes: . The multiboot code is less special: it does not use mb_print for its diagnostics any more but uses printf() as normal, saving the output into the diagnostics buffer, only printing to the screen using the direct print functions if a panic() occurs. . The multiboot code uses the flexible 'free memory map list' style to receive the list of free memory if available. . The kernel determines the memory layout of the processes to a degree: it tells VM where the kernel starts and ends and where the kernel wants the top of the process to be. VM then uses this entire range, i.e. 
the stack is right at the top, and mmap()ped bits of memory are placed below that downwards, and the break grows upwards. Other Consequences: . Every process gets its own page table as address spaces can't be separated any more by segments. . As all segments are 0-based, there is no distinction between virtual and linear addresses, nor between userspace and kernel addresses. . Less work is done when context switching, leading to a net performance increase. (8% faster on my machine for 'make servers'.) . The layout and configuration of the GDT makes sysenter and syscall possible.	2012-07-15 22:30:15 +02:00
Ben Gras	0fb2f83da9	drop from segments physcopy/vircopy invocations . sys_vircopy always uses D for both src and dst . sys_physcopy uses PHYS_SEG if and only if corresponding endpoint is NONE, so we can derive the mode (PHYS_SEG or D) from the endpoint arg in the kernel, dropping the seg args . fields in msg still filled in for backwards compatibility, using same NONE-logic in the library	2012-06-18 12:28:40 +00:00
Ben Gras	2bfeeed885	drop segment from safecopy invocations . all invocations were S or D, so can safely be dropped to prepare for the segmentless world . still assign D to the SCP_SEG field in the message to make previous kernels usable	2012-06-16 16:22:51 +00:00
Ben Gras	85ff5a947e	dumpcore: use ptrace function to trigger a coredump . dumpcore currently relies on minix segments . also ptrace dumpcore fix	2012-06-15 12:13:50 +02:00
Ben Gras	769af57274	further libexec generalization . new mode for sys_memset: include process so memset can be done in physical or virtual address space. . add a mode to mmap() that lets a process allocate uninitialized memory. . this allows an exec()er (RS, VFS, etc.) to request uninitialized memory from VM and selectively clear the ranges that don't come from a file, leaving no uninitialized memory left for the process to see. . use callbacks for clearing the process, clearing memory in the process, and copying into the process; so that the libexec code can be used from rs, vfs, and in the future, kernel (to load vm) and vm (to load boot-time processes)	2012-06-07 15:15:02 +02:00
Ben Gras	040362e379	exec() cleanup, generalization, improvement . make exec() callers (i.e. vfs and rs) determine the memory layout by explicitly reserving regions using mmap() calls on behalf of the exec()ing process, i.e. handling all of the exec logic, thereby eliminating all special exec() knowledge from VM. . the new procedure is: clear the exec()ing process first, then call third-party mmap()s to reserve memory, then copy the executable file section contents in, all using callbacks tailored to the caller's way of starting an executable . i.e. no more explicit EXEC_NEWMEM-style calls in PM or VM as with rigid 2-section arguments . this naturally allows generalizing exec() by simply loading all ELF sections . drop/merge of lots of duplicate exec() code into libexec . not copying the code sections to vfs and into the executable again is a measurable performance improvement (about 3.3% faster for 'make' in src/servers/)	2012-06-07 15:15:01 +02:00
Ben Gras	41b869d4d6	drop aout support justification: soon we won't be able to execute sep I&D aouts at all (because of the vanishing segments), which was the default mode to generate them so most binaries will be sep I&D. this makes the vfs/rs exec() unification work simpler. after unification, common I&D aout could be added back quite simply.	2012-06-07 12:43:16 +02:00
David van Moolenbroek	1817f7fc07	VFS: fix "process already free" panic on reboot Reported by Claudiu Dan Gheorghe, debugged by Thomas and myself	2012-05-02 17:42:50 +02:00
Thomas Veerman	068d443d12	VFS: unlock vmnt when out of vnodes	2012-04-27 08:51:13 +00:00
Thomas Veerman	b6ff38065f	VFS: release what can be released Only attempt to release blocked processes that are blocked. There is no use in trying to find more blocked processes than we know that are blocked (on a pipe).	2012-04-27 08:51:02 +00:00
Thomas Veerman	7b81254069	VFS: simplify stat for pipes According to POSIX the st_size field of struct stat is undefined for fifos and anonymous pipes. Thus we can do anything we want. We save a copy by not being accurate on pipe sizes.	2012-04-27 08:50:49 +00:00
Thomas Veerman	db8198d99d	VFS: use S_IS* macros	2012-04-27 08:49:38 +00:00
Thomas Veerman	96bbc5da3e	VFS: I_PIPE is redundant Also, use S_IS* macros instead of manual comparison.	2012-04-27 08:49:38 +00:00
Ben Gras	755102d67f	AT_SUN_EXECNAME support . vfs: pass execname in aux vectors . ld.elf_so: use this to expand $ORIGIN . this requires the executable to reserve more space at exec() calling time	2012-04-26 13:32:39 +02:00
David van Moolenbroek	26f817243b	VFS: reimplement truncate mtime/ctime fix POSIX mandates that a file's modification and change time be left untouched upon truncate/ftruncate iff the file size does not change. However, an open(O_TRUNC) call must always update the modification and change time of the file, even if it was already zero-sized. VFS uses the file systems' truncate call to implement O_TRUNC. This patch replaces git-255ae85, which did not take into account the open case. The size check is now moved into VFS, so that individual file systems need not check for this case anymore.	2012-04-20 11:35:59 +02:00
Ben Gras	3945cfbfd3	block ioctls: pass request number	2012-04-18 11:01:15 +02:00
Ben Gras	53002f6f6c	recognize and execute dynamically linked executables . generalize libexec slightly to get some more necessary information from ELF files, e.g. the interpreter . execute dynamically linked executables when exec()ed by VFS . switch to netbsd variant of elf32.h exclusively, solves some conflicting headers	2012-04-16 00:41:42 +00:00
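
Commit e35c4f78d2 above describes a lock that is only taken inside assert(), which breaks once assertions are compiled out. The following is a minimal sketch of that pitfall and of one way around it, not the actual VFS code. The names toy_lock, toy_trylock, CHECK_LOCKED_FRAGILE and check_locked are hypothetical, and the sketch assumes that NOASSERTS="yes" has the same effect as the standard NDEBUG macro, which removes the argument of assert() entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy lock standing in for the real mutex type. */
struct toy_lock {
    bool held;
};

/* Returns 0 if the lock was taken, -1 if it was already held. */
static int toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return -1;
    l->held = true;
    return 0;
}

/*
 * Fragile pattern: when assert() is compiled out (NDEBUG), its argument
 * is not evaluated, so toy_trylock() is never called and the lock is
 * not taken.
 */
#define CHECK_LOCKED_FRAGILE(l) assert(toy_trylock(l) == 0)

/*
 * Robust pattern: always perform the side effect, then check the result
 * with an explicit test that does not depend on assert().
 */
static void check_locked(struct toy_lock *l)
{
    if (toy_trylock(l) != 0) {
        fprintf(stderr, "lock unexpectedly held\n");
        abort();
    }
}
```

Turning the macro into a function, as the commit does, also gives the check a single place where the side effect and the sanity test are clearly separated.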
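
The deadlock in commit 7c8b3ddfed above follows from which lock modes may coexist. The sketch below models that compatibility rule as inferred from the commit message only: TLL_READSER still admits TLL_READ, a second TLL_READSER has to wait, and TLL_WRITE excludes everything. That TLL_READ is compatible with TLL_READ is an assumption. The function names tll_compatible and scenario_deadlocks are hypothetical and are not taken from the VFS sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock modes named in the commit message. */
enum tll_mode { TLL_READ, TLL_READSER, TLL_WRITE };

/*
 * Can a lock in mode `wanted` be granted while another thread holds
 * the same object in mode `held`? (Rule inferred from the commit text.)
 */
static bool tll_compatible(enum tll_mode held, enum tll_mode wanted)
{
    assert(held >= TLL_READ && held <= TLL_WRITE);
    assert(wanted >= TLL_READ && wanted <= TLL_WRITE);

    if (held == TLL_WRITE || wanted == TLL_WRITE)
        return false;               /* write access is exclusive */
    if (held == TLL_READSER && wanted == TLL_READSER)
        return false;               /* only one serialized reader */
    return true;                    /* READ with READ or READSER */
}

/*
 * Replays the scenario: thread A holds vmnt and vnode as TLL_READSER,
 * thread B holds the vmnt as TLL_READ.
 */
static bool scenario_deadlocks(void)
{
    /* B wants the vnode as TLL_READSER while A holds it as TLL_READSER. */
    bool b_blocked = !tll_compatible(TLL_READSER, TLL_READSER);
    /* A wants to upgrade the vmnt to TLL_WRITE while B holds TLL_READ. */
    bool a_blocked = !tll_compatible(TLL_READ, TLL_WRITE);
    return a_blocked && b_blocked;
}
```

Under this model, taking the stronger vmnt lock up front and downgrading afterwards, as the commit describes, keeps thread B from getting its TLL_READ lock on the vmnt, so the cycle cannot form.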
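
The timestamp rule stated in commit 26f817243b above can be written as a small decision function. This is a sketch of the rule as the commit message phrases it, not the VFS implementation. The name truncate_updates_times and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Should mtime/ctime be updated?
 *  - open(O_TRUNC): always, even if the file was already empty.
 *  - truncate/ftruncate: only if the file size actually changes.
 */
static bool truncate_updates_times(off_t old_size, off_t new_size,
                                   bool via_open_trunc)
{
    assert(old_size >= 0 && new_size >= 0);

    if (via_open_trunc)
        return true;
    return old_size != new_size;
}
```

Keeping this check in one place matches the commit's stated goal that individual file systems no longer need to test for the unchanged-size case themselves.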

281 commits