summaryrefslogtreecommitdiffstats
path: root/Documentation/ia64/fsys.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/ia64/fsys.txt')
-rw-r--r--Documentation/ia64/fsys.txt286
1 files changed, 0 insertions, 286 deletions
diff --git a/Documentation/ia64/fsys.txt b/Documentation/ia64/fsys.txt
deleted file mode 100644
index 59dd689..0000000
--- a/Documentation/ia64/fsys.txt
+++ /dev/null
@@ -1,286 +0,0 @@
--*-Mode: outline-*-
-
- Light-weight System Calls for IA-64
- -----------------------------------
-
- Started: 13-Jan-2003
- Last update: 27-Sep-2003
-
- David Mosberger-Tang
- <davidm@hpl.hp.com>
-
-Using the "epc" instruction effectively introduces a new mode of
-execution to the ia64 linux kernel. We call this mode the
-"fsys-mode". To recap, the normal states of execution are:
-
- - kernel mode:
- Both the register stack and the memory stack have been
- switched over to kernel memory. The user-level state is saved
- in a pt-regs structure at the top of the kernel memory stack.
-
- - user mode:
- Both the register stack and the kernel stack are in
- user memory. The user-level state is contained in the
- CPU registers.
-
- - bank 0 interruption-handling mode:
- This is the non-interruptible state which all
- interruption-handlers start execution in. The user-level
- state remains in the CPU registers and some kernel state may
- be stored in bank 0 of registers r16-r31.
-
-In contrast, fsys-mode has the following special properties:
-
- - execution is at privilege level 0 (most-privileged)
-
- - CPU registers may contain a mixture of user-level and kernel-level
- state (it is the responsibility of the kernel to ensure that no
- security-sensitive kernel-level state is leaked back to
- user-level)
-
- - execution is interruptible and preemptible (an fsys-mode handler
- can disable interrupts and avoid all other interruption-sources
- to avoid preemption)
-
- - neither the memory-stack nor the register-stack can be trusted while
- in fsys-mode (they point to the user-level stacks, which may
- be invalid, or completely bogus addresses)
-
-In summary, fsys-mode is much more similar to running in user-mode
-than it is to running in kernel-mode. Of course, given that the
-privilege level is at level 0, this means that fsys-mode requires some
-care (see below).
-
-
-* How to tell fsys-mode
-
-Linux operates in fsys-mode when (a) the privilege level is 0 (most
-privileged) and (b) the stacks have NOT been switched to kernel memory
-yet. For convenience, the header file <asm-ia64/ptrace.h> provides
-three macros:
-
- user_mode(regs)
- user_stack(task,regs)
- fsys_mode(task,regs)
-
-The "regs" argument is a pointer to a pt_regs structure. The "task"
-argument is a pointer to the task structure to which the "regs"
-pointer belongs to. user_mode() returns TRUE if the CPU state pointed
-to by "regs" was executing in user mode (privilege level 3).
-user_stack() returns TRUE if the state pointed to by "regs" was
-executing on the user-level stack(s). Finally, fsys_mode() returns
-TRUE if the CPU state pointed to by "regs" was executing in fsys-mode.
-The fsys_mode() macro is equivalent to the expression:
-
- !user_mode(regs) && user_stack(task,regs)
-
-* How to write an fsyscall handler
-
-The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
-(fsyscall_table). This table contains one entry for each system call.
-By default, a system call is handled by fsys_fallback_syscall(). This
-routine takes care of entering (full) kernel mode and calling the
-normal Linux system call handler. For performance-critical system
-calls, it is possible to write a hand-tuned fsyscall_handler. For
-example, fsys.S contains fsys_getpid(), which is a hand-tuned version
-of the getpid() system call.
-
-The entry and exit-state of an fsyscall handler is as follows:
-
-** Machine state on entry to fsyscall handler:
-
- - r10 = 0
- - r11 = saved ar.pfs (a user-level value)
- - r15 = system call number
- - r16 = "current" task pointer (in normal kernel-mode, this is in r13)
- - r32-r39 = system call arguments
- - b6 = return address (a user-level value)
- - ar.pfs = previous frame-state (a user-level value)
- - PSR.be = cleared to zero (i.e., little-endian byte order is in effect)
- - all other registers may contain values passed in from user-mode
-
-** Required machine state on exit to fsyscall handler:
-
- - r11 = saved ar.pfs (as passed into the fsyscall handler)
- - r15 = system call number (as passed into the fsyscall handler)
- - r32-r39 = system call arguments (as passed into the fsyscall handler)
- - b6 = return address (as passed into the fsyscall handler)
- - ar.pfs = previous frame-state (as passed into the fsyscall handler)
-
-Fsyscall handlers can execute with very little overhead, but with that
-speed comes a set of restrictions:
-
- o Fsyscall-handlers MUST check for any pending work in the flags
- member of the thread-info structure and if any of the
- TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
- doing a full system call (by calling fsys_fallback_syscall).
-
- o Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
- r15, b6, and ar.pfs) because they will be needed in case of a
- system call restart. Of course, all "preserved" registers also
- must be preserved, in accordance to the normal calling conventions.
-
- o Fsyscall-handlers MUST check argument registers for containing a
- NaT value before using them in any way that could trigger a
- NaT-consumption fault. If a system call argument is found to
- contain a NaT value, an fsyscall-handler may return immediately
- with r8=EINVAL, r10=-1.
-
- o Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
- any other operation that would trigger mandatory RSE
- (register-stack engine) traffic.
-
- o Fsyscall-handlers MUST NOT write to any stacked registers because
- it is not safe to assume that user-level called a handler with the
- proper number of arguments.
-
- o Fsyscall-handlers need to be careful when accessing per-CPU variables:
- unless proper safe-guards are taken (e.g., interruptions are avoided),
- execution may be pre-empted and resumed on another CPU at any given
- time.
-
- o Fsyscall-handlers must be careful not to leak sensitive kernel'
- information back to user-level. In particular, before returning to
- user-level, care needs to be taken to clear any scratch registers
- that could contain sensitive information (note that the current
- task pointer is not considered sensitive: it's already exposed
- through ar.k6).
-
- o Fsyscall-handlers MUST NOT access user-memory without first
- validating access-permission (this can be done typically via
- probe.r.fault and/or probe.w.fault) and without guarding against
- memory access exceptions (this can be done with the EX() macros
- defined by asmmacro.h).
-
-The above restrictions may seem draconian, but remember that it's
-possible to trade off some of the restrictions by paying a slightly
-higher overhead. For example, if an fsyscall-handler could benefit
-from the shadow register bank, it could temporarily disable PSR.i and
-PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
-needed. In other words, following the above rules yields extremely
-fast system call execution (while fully preserving system call
-semantics), but there is also a lot of flexibility in handling more
-complicated cases.
-
-* Signal handling
-
-The delivery of (asynchronous) signals must be delayed until fsys-mode
-is exited. This is accomplished with the help of the lower-privilege
-transfer trap: arch/ia64/kernel/process.c:do_notify_resume_user()
-checks whether the interrupted task was in fsys-mode and, if so, sets
-PSR.lp and returns immediately. When fsys-mode is exited via the
-"br.ret" instruction that lowers the privilege level, a trap will
-occur. The trap handler clears PSR.lp again and returns immediately.
-The kernel exit path then checks for and delivers any pending signals.
-
-* PSR Handling
-
-The "epc" instruction doesn't change the contents of PSR at all. This
-is in contrast to a regular interruption, which clears almost all
-bits. Because of that, some care needs to be taken to ensure things
-work as expected. The following discussion describes how each PSR bit
-is handled.
-
-PSR.be Cleared when entering fsys-mode. A srlz.d instruction is used
- to ensure the CPU is in little-endian mode before the first
- load/store instruction is executed. PSR.be is normally NOT
- restored upon return from an fsys-mode handler. In other
- words, user-level code must not rely on PSR.be being preserved
- across a system call.
-PSR.up Unchanged.
-PSR.ac Unchanged.
-PSR.mfl Unchanged. Note: fsys-mode handlers must not write-registers!
-PSR.mfh Unchanged. Note: fsys-mode handlers must not write-registers!
-PSR.ic Unchanged. Note: fsys-mode handlers can clear the bit, if needed.
-PSR.i Unchanged. Note: fsys-mode handlers can clear the bit, if needed.
-PSR.pk Unchanged.
-PSR.dt Unchanged.
-PSR.dfl Unchanged. Note: fsys-mode handlers must not write-registers!
-PSR.dfh Unchanged. Note: fsys-mode handlers must not write-registers!
-PSR.sp Unchanged.
-PSR.pp Unchanged.
-PSR.di Unchanged.
-PSR.si Unchanged.
-PSR.db Unchanged. The kernel prevents user-level from setting a hardware
- breakpoint that triggers at any privilege level other than 3 (user-mode).
-PSR.lp Unchanged.
-PSR.tb Lazy redirect. If a taken-branch trap occurs while in
- fsys-mode, the trap-handler modifies the saved machine state
- such that execution resumes in the gate page at
- syscall_via_break(), with privilege level 3. Note: the
- taken branch would occur on the branch invoking the
- fsyscall-handler, at which point, by definition, a syscall
- restart is still safe. If the system call number is invalid,
- the fsys-mode handler will return directly to user-level. This
- return will trigger a taken-branch trap, but since the trap is
- taken _after_ restoring the privilege level, the CPU has already
- left fsys-mode, so no special treatment is needed.
-PSR.rt Unchanged.
-PSR.cpl Cleared to 0.
-PSR.is Unchanged (guaranteed to be 0 on entry to the gate page).
-PSR.mc Unchanged.
-PSR.it Unchanged (guaranteed to be 1).
-PSR.id Unchanged. Note: the ia64 linux kernel never sets this bit.
-PSR.da Unchanged. Note: the ia64 linux kernel never sets this bit.
-PSR.dd Unchanged. Note: the ia64 linux kernel never sets this bit.
-PSR.ss Lazy redirect. If set, "epc" will cause a Single Step Trap to
- be taken. The trap handler then modifies the saved machine
- state such that execution resumes in the gate page at
- syscall_via_break(), with privilege level 3.
-PSR.ri Unchanged.
-PSR.ed Unchanged. Note: This bit could only have an effect if an fsys-mode
- handler performed a speculative load that gets NaTted. If so, this
- would be the normal & expected behavior, so no special treatment is
- needed.
-PSR.bn Unchanged. Note: fsys-mode handlers may clear the bit, if needed.
- Doing so requires clearing PSR.i and PSR.ic as well.
-PSR.ia Unchanged. Note: the ia64 linux kernel never sets this bit.
-
-* Using fast system calls
-
-To use fast system calls, userspace applications need simply call
-__kernel_syscall_via_epc(). For example
-
--- example fgettimeofday() call --
--- fgettimeofday.S --
-
-#include <asm/asmmacro.h>
-
-GLOBAL_ENTRY(fgettimeofday)
-.prologue
-.save ar.pfs, r11
-mov r11 = ar.pfs
-.body
-
-mov r2 = 0xa000000000020660;; // gate address
- // found by inspection of System.map for the
- // __kernel_syscall_via_epc() function. See
- // below for how to do this for real.
-
-mov b7 = r2
-mov r15 = 1087 // gettimeofday syscall
-;;
-br.call.sptk.many b6 = b7
-;;
-
-.restore sp
-
-mov ar.pfs = r11
-br.ret.sptk.many rp;; // return to caller
-END(fgettimeofday)
-
--- end fgettimeofday.S --
-
-In reality, getting the gate address is accomplished by two extra
-values passed via the ELF auxiliary vector (include/asm-ia64/elf.h)
-
- o AT_SYSINFO : is the address of __kernel_syscall_via_epc()
- o AT_SYSINFO_EHDR : is the address of the kernel gate ELF DSO
-
-The ELF DSO is a pre-linked library that is mapped in by the kernel at
-the gate page. It is a proper ELF shared object so, with a dynamic
-loader that recognises the library, you should be able to make calls to
-the exported functions within it as with any other shared library.
-AT_SYSINFO points into the kernel DSO at the
-__kernel_syscall_via_epc() function for historical reasons (it was
-used before the kernel DSO) and as a convenience.