Processes are a core concept in Linux. This section is about what processes are and how they are implemented in Linux.
- The lifecycle of a process
- The task structure
- Process creation
- Process termination
A process is an executing instance of a program, which is object code stored on some media. There can be multiple processes of the same program running at the same time [1, P. 23].
A process includes resources like open files, processor state, a memory address space, one or more threads of execution, and a data section containing global variables [1, P. 23].
Threads (threads of execution) are the parts of a process that contain the program counter, process stack, and processor registers. The kernel schedules threads rather than processes. “Linux does not differentiate between threads and processes. To Linux a thread is just a special kind of process” [1, P. 23].
Processes use a virtual processor and virtual memory. The virtual processor gives the illusion that a process is the only process running on a processor, despite potentially thousands of other processes sharing the processor. Virtual memory is another illusion. As far as a process is aware, it has exclusive access to memory, but in reality the kernel is managing memory access. Threads share memory with each other, but each one has their own virtual processor [1, P. 23].
A process is created using the
fork() system call. The new process that’s created, the child process, inherits from the parent process that was active when
fork() was called. The parent process resumes execution after the call to
fork(), and the child starts execution at the same position. Internally
fork() uses the
clone() system call [1, Pp. 23-4].
exec() system call family “creates a new address space and loads a new program into it” [1, P. 24].
exit system call terminates a process and frees its resources. An exited process becomes a zombie process representing a terminated process until the parent process calls
waitpid() [1, P. 24].
Internally, a process is referred to as a task. A task is represented as a
task_struct in the kernel. Each
task_struct is stored in a circular doubly linked list called the task list [1, P. 24].
task_struct is large, at around 1.7kb on a 32 bit machine (in version 2.6).
task_struct structs are allocated by the slab allocator for efficient object reuse [1, P. 24].
Until the 2.6 kernel the
task_struct was stored at the end of the kernel stack of a process. This made it possible to access the task struct using the stack pointer. Now that the slab allocator manages the
task_struct there is a
thread_info struct which performs the same function, either stored at the bottom of the stack for stacks that grow down, or the top of the stack for stacks that grow up [1, P. 24].
Each process is given a unique process identifier, or PID. A PID is a numerical value of type
pid_t (usually an
int). The default maximum value is 32,768 for backwards compatibility with older Linux versions, but it can be set up to 4,194,304 by editing the value in /proc/sys/kernel/pid_max. The maximum value is important because it limits the number of processes that can exist at once [1, P. 26].
Most kernel code deals with the
task_struct directly, so it’s important to be able quickly access a process. The currently executing process can be accessed using the
current macro. The
current macro is architecture specific, because different architectures store the
thread_info structure at different locations [1, P. 27].
Some architectures store a pointer to the current
thread_info structure in a register, so the
current macro on those architectures just returns the value from the register. Register-constrained architectures like x86 store the process on the stack. Accessing it on an x86 is done by masking out the 13 least significant bits of the stack pointer, assuming the stack size is 8KB:
movl $-8192, %eax andl %esp,%eax
current then deferences the
task member of
thread_info to return a pointer to the current
state field describes the current condition of the process. Each process will be in one of five possible states:
TASK_RUNNING means the process is either currently running or on a run queue waiting to run.
TASK_INTERRUPTIBLE means the process is waiting for some condition to exist. When the condition exists, the kernel sets the state to
TASK_RUNNING. The process will awake prematurely if it receives a signal.
TASK_UNINTERRUPTABLE is the same as
TASK_INTERUPTIBLE except the process will not awake prematurely if it receives a signal.
__TASK_TRACED means the process is being traced by a program like ptrace.
__TASK_STOPPED means the process is no longer being executed. “This occurs when the task receives the
SIGTTOU signal or if it receives any signal while it is being debugged” [1, Pp. 27-28].
The kernel can change a process’s state with the
The kernel can change the current process state with the
Normally a process runs in user-space, but when the process executes a system call the processor enters kernel-space, where the kernel executes on behalf of the process. At this point the kernel is running in process context (as opposed to interrupt context). In process context, the
current macro is valid [1, P. 29].
“System calls and exception handlers are well-defined interfaces into the kernel. A process can begin executing in kernel-space only through one of these interfaces—all access to the kernel is through these interfaces” [1, P. 29].
All processes are descendants of the
init process. The
init process is started by the kernel as the last step in the boot process [1, P. 29].
Every process has one parent, and every process has zero or more children. Processes that are children of the same parent are called sibling processes. Each
task_struct has a
parent field with a reference to the parent, and a
children field which is a linked list that contains the children [1, P. 30].
task_struct is statically allocated as
init_task [1, P. 30].
When a task is created it’s added to the task list, you can access the previous and next item on the
list_entry(task->tasks.next, struct task_struct, tasks); list_entry(task->tasks.prev, struct task_struct, tasks);
You can iterate over the entire task list using the
for_each_process() macro, although it’s an expensive O(n) operation [1, P. 30].
You create a task using the
fork() and the
exec() family calls.
fork() clones the current task, updating the PID and some other values.
exec() loads a new executable into the address space and begins executing it [1, P. 31].
A naive approach to
fork() would be to copy all resources for a new process. This is expensive and often unnecessary because many resources are never rewritten and can be shared as read-only. Linux follows copy-on-write where copying of the address space only happens if the data is rewritten. Linux marks pages so that when a page is written to, it’s copied and each process receives a unique page containing the data. This makes process creation fast because the kernel only has to change a few values, rather than copying over a potentially large address space [1, P. 31].
As mentioned previously,
fork() is implemented using the
clone() system call.
clone() accepts flags that tell it what resources should be shared between the parent and child processes. Clone then calls
copy_process() which then does most of the work.
dup_task_struct()to create a new stack,
task_structfor the new process.
- Ensures child process doesn’t exceed the limit on the number of processes for the current user.
task_structfields that are unique to process are either cleared or reset. Most values in
- The new process’s
statefield is set to
TASK_UNINTERRUPTABLEto ensure it doesn’t run yet.
copy_flagsis called to update the
flagsmember of the new
- Assigns a new pid with
- Depending on flags passed to
copy_process()either duplicates or shares resources like open file descriptors.
copy_process()returns a pointer to the new
do_fork() the child process is then run before the parent process. In the common case the child calls
exec() immediately, so running the child first eliminates any overhead from copy-on-write that might occur if the parent ran first [1, Pp. 31-2].
Threads are an abstraction that provide multiple execution threads in the same shared memory address space. Threads enable true parallelism on multiple processor machines.
In the kernel a thread is simply a
task_struct that shares certain resources, like an address space, with other processes [1, P. 36].
Threads are created in the same way as normal tasks, except the
clone() system call is called with flags for the shared resources:
clone(CLONE_VM, CLONE_FS, CLONE_FILES, CLONE_SIGHAND, 0);
This creates a cloned process that shares address space, file system resources, file descriptors, and signal handlers [1, P. 36].
Kernel threads are processes that exist only in kernel-space. “The significant difference between kernel threads and normal processes is that kernel threads do not have an address space” [1, P. 36]. Kernel threads are scheduled and preemptable like normal processes [1, P. 36].
Kernel threads can only be created by other kernel threads. They are used for system tasks like the flush task. All new kernel threads are forked from the
kthreadd kernel process. You can create a kernel thread with the
kthread_create() function. The thread will be created in an unrunnable state. You can create and start a kernel thread with
kthread_run(). Once started a kernel thread continues to exist until either it calls
do_exit() or another thread calls
kthread_stop() [1, P. 36].
When a process dies, the kernel frees the resources owned by the thread and notifies the process’s parent that the process has died. A process can terminate itself by calling
exit() either explicitly, or implicitly by returning from a
main() function. A process can also terminate in response to a signal or exception that it can’t handle or ignore [1, P. 36].
When a process terminates, most of the work is done by
- Sets the
flagsmember of exiting task struct.
del_timer_sync()to remove kernel timers.
- If BSD process accounting is enabled then calls
acct_update_integrals()to write out accounting information
mm_structheld by the process.
mm_structis destroyed if no other process is using it.
exit_sem(). If process is queued waiting for an IPC semaphore, it’s dequeued.
exit_fs()to decrement the usage count of objects related to file descriptors and filesystem data. If either usage reaches zero the object is no longer in use and is destroyed.
- Sets the tasks exit code, stored in the
exit_notify()to send signals to parent process. Reparents any of the tasks children, and sets the tasks exit state to
schedule()to switch to a new process. Since the process is now unschedulable, it never runs again
task_structexists in a zombie state. After the parent has obtained information about the child the
wait() family of system calls are implemented by a system call
wait() calls stops the execution of a task until one of its children terminates, and the function returns with the PID of the terminated child that exited [1, P. 37].
task_struct is ready to be released,
release_task() is called.
- Detaches the pid and removes the task from the task list
- Releases any remaining resources that were used by the process and finalizes statistics
- If task was the last member of a thread group and the leader is a zombie,
release_tasknotifies the zombie leader’s parent
- Frees the pages containing the process stack and
thread_infoand deallocates the
If a child process’s parent dies before the child, then the child process needs to be reassigned to another process. This task is called reparenting. The new parent is either another task in the thread group, or the
init process. The
init process regularly calls
wait() on its children to clean up any zombies processes that were assigned to it [1, P. 40].
The process is a core operating system abstraction.
Linux represents processes with the
task_struct struct, and the
thread_info struct. In Linux, threads are simply a task that share resources with other tasks.
-  L. R., Linux Kernel Development (Developer’s Library), 3rd ed. Addison-Wesley Professional, 2010.