Process States - W10N

1 Process states

Under Linux (this article uses Linux 4.8.4), a process can be in roughly 7 states.

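| Process state | Description |
|---|---|
| TASK_RUNNING | Runnable. Not necessarily using the CPU; it may be waiting to be scheduled |
| TASK_INTERRUPTIBLE | Interruptible sleep. Waiting for some condition to be satisfied |
| TASK_UNINTERRUPTIBLE | Uninterruptible sleep. Cannot be interrupted by signals |
| __TASK_STOPPED | Stopped. Execution was halted after receiving certain signals |
| __TASK_TRACED | Traced. The process is stopped and is being traced by another process |
| EXIT_ZOMBIE | Zombie. The process has exited but has not yet been reaped by its parent or by init |
| EXIT_DEAD | Truly dead |

In include/linux/sched.h, however, more states than these are defined: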

    /*
     * Task state bitmask. NOTE! These bits are also
     * encoded in fs/proc/array.c: get_task_state().
     *
     * We have two separate sets of flags: task->state
     * is about runnability, while task->exit_state are
     * about the task exiting. Confusing, but this way
     * modifying one set can't modify the other one by
     * mistake.
     */
    #define TASK_RUNNING            0
    #define TASK_INTERRUPTIBLE      1
    #define TASK_UNINTERRUPTIBLE    2
    #define __TASK_STOPPED          4
    #define __TASK_TRACED           8
    /* in tsk->exit_state */
    #define EXIT_DEAD               16
    #define EXIT_ZOMBIE             32
    #define EXIT_TRACE              (EXIT_ZOMBIE | EXIT_DEAD)
    /* in tsk->state again */
    #define TASK_DEAD               64
    #define TASK_WAKEKILL           128
    #define TASK_WAKING             256
    #define TASK_PARKED             512
    #define TASK_NOLOAD             1024
    #define TASK_NEW                2048
    #define TASK_STATE_MAX          4096

    #define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPNn"

    extern char ___assert_task_state[1 - 2*!!(
            sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];

    /* Convenience macros for the sake of set_task_state */
    #define TASK_KILLABLE           (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
    #define TASK_STOPPED            (TASK_WAKEKILL | __TASK_STOPPED)
    #define TASK_TRACED             (TASK_WAKEKILL | __TASK_TRACED)

    #define TASK_IDLE               (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

    /* Convenience macros for the sake of wake_up */
    #define TASK_NORMAL             (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)
    #define TASK_ALL                (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)

    /* get_task_state() */
    #define TASK_REPORT             (TASK_RUNNING | TASK_INTERRUPTIBLE | \
                                     TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
                                     __TASK_TRACED | EXIT_ZOMBIE | EXIT_DEAD)

2 TASK_RUNNING

TASK_RUNNING combines two states from the textbooks. One is RUNNING, in which the process is currently consuming CPU time. The other is READY, which a RUNNING process enters when its time slice runs out, when it voluntarily yields the CPU, or when it is preempted by a higher-priority process. A process in TASK_RUNNING is therefore either executing on a CPU or ready to run at any moment; CPU resources are limited, and the scheduler simply has not picked it yet.

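Processes in TASK_RUNNING are the objects the scheduler works on. In Linux, each CPU has its own set of run queues. A real-time process sits on the queue that matches its priority; a normal process is placed in a red-black tree at the position determined by its virtual runtime.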

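Linux provides the time command to measure the CPU time a process consumes in user mode and in kernel mode. It reports three times: elapsed (real) time, user CPU time, and system CPU time. The output below shows that real ≠ user + sys: real also counts the time the process spends sleeping or waiting, while on a multi-core processor a multi-threaded process can accumulate CPU time on several cores at once, so either side may be the larger one.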

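    [root@localhost ~]# time ntpdate pool.ntp.org
    xxx xxxxxx outputs of ntpdate xxx xxxxxx

    real    0m8.710s
    user    0m0.002s
    sys     0m0.013s

To obtain a program's CPU time before the process has finished, use the information in procfs. In /proc//stat, field 14 is the user-mode CPU time (utime) and field 15 is the kernel-mode CPU time (stime), numbering the fields from 1 as proc(5) does (these are indices 13 and 14 when counting from 0). Both are measured in clock ticks. The values exported through /proc are scaled to the rate reported by sysconf(_SC_CLK_TCK), typically 100 per second, which need not equal the kernel's internal tick rate. That internal rate is chosen when the kernel is configured, from four options: 100 Hz, 250 Hz, 300 Hz and 1000 Hz. It can be found with the following command: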

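    [root@localhost ~]# grep CONFIG_HZ /boot/config-*

The pidstat command can also report the CPU usage of individual processes. To find out how long a process has been running in wall-clock terms, use the ps command: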

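    $ ps -p 20590 -o etime,cmd,pid
        ELAPSED CMD                           PID
       01:21:57 emacs taskstatus.org        20590

3 TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE

When a process deals with a slow device, or has to wait for a condition to be satisfied, the length of the wait cannot be predicted. In that case the kernel removes the process from the CPU's run queue and the process goes to sleep.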

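A Linux process has two sleep states, TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE. They differ in whether the process responds to signals it receives. A process in TASK_INTERRUPTIBLE returns to TASK_RUNNING in either of two cases: the condition it is waiting for is satisfied, or it receives a signal that is not blocked. In the second case the interrupted call returns EINTR, and the caller has to check the return value to handle it correctly.

A process in TASK_UNINTERRUPTIBLE can return to the running state only when the condition it is waiting for is satisfied; no signal can interrupt it. If a process gets stuck in this state it cannot be killed, and the only remedy is a reboot.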
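The EINTR behaviour described above can be observed from user space. The following is a minimal sketch, assuming a Linux system and a handler installed without SA_RESTART; the function names `errno_after_interrupted_read` and `read_retry` are made up for this illustration and are not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* The handler does nothing; it only has to exist so SIGALRM is caught. */
static void on_alarm(int sig) { (void)sig; }

/*
 * Sleeps in read() on an empty pipe (an interruptible sleep) and lets a
 * SIGALRM arrive about 50 ms later. Returns the errno that read() reported,
 * 0 if read() unexpectedly succeeded, or -1 if the setup failed.
 */
int errno_after_interrupted_read(void)
{
    int fds[2], seen;
    struct sigaction sa, old;
    struct itimerval timer;
    char byte;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = 50000;     /* one-shot timer, 50 ms */
    setitimer(ITIMER_REAL, &timer, NULL);

    n = read(fds[0], &byte, 1);         /* nothing is ever written */
    seen = (n == -1) ? errno : 0;

    sigaction(SIGALRM, &old, NULL);     /* restore the previous handler */
    close(fds[0]);
    close(fds[1]);
    return seen;
}

/* Typical caller-side handling: retry when the call was merely interrupted. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

On Linux the first function is expected to return EINTR. The second shows one common way of "checking the return value": loop until the call either succeeds or fails for a reason other than interruption.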

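TASK_UNINTERRUPTIBLE exists because some processing in the kernel must not be interrupted. For example, while a read system call is operating on the disk, it is protected with TASK_UNINTERRUPTIBLE so that it is not disturbed and left in an uncontrollable state.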

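The khungtaskd kernel thread (source in kernel/hung_task.c) wakes up periodically (every 120 seconds) and checks all TASK_UNINTERRUPTIBLE processes. If a process has not been scheduled for more than 120 seconds, the kernel prints its stack trace. The khungtaskd period can be checked with the following command: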

    [root@localhost ~]# sysctl kernel.hung_task_timeout_secs
    kernel.hung_task_timeout_secs = 120

The state a process is in can be found through /proc//wchan (short for "what channel"), /proc//stack, or /proc//status.

Sleeping processes are kept on wait queues. The queue is defined in include/linux/wait.h.

    typedef struct __wait_queue wait_queue_t;
    typedef int (*wait_queue_func_t)(wait_queue_t *wait, unsigned mode, int flags, void *key);
    int default_wake_function(wait_queue_t *wait, unsigned mode, int flags, void *key);

    /* __wait_queue::flags */
    #define WQ_FLAG_EXCLUSIVE   0x01
    #define WQ_FLAG_WOKEN       0x02

    struct __wait_queue {
        unsigned int        flags;
        void                *private;
        wait_queue_func_t   func;       /* wakeup callback */
        struct list_head    task_list;
    };

    struct wait_bit_key {
        void                *flags;
        int                 bit_nr;
    #define WAIT_ATOMIC_T_BIT_NR    -1
        unsigned long       timeout;
    };

    struct wait_bit_queue {
        struct wait_bit_key key;
        wait_queue_t        wait;
    };

    struct __wait_queue_head {
        spinlock_t          lock;
        struct list_head    task_list;
    };
    typedef struct __wait_queue_head wait_queue_head_t;

In __WAITQUEUE_INITIALIZER, the private member of a wait queue entry is pointed at the process descriptor, task_struct, which is what allows a process to be added to a queue. An entry is added to a queue with add_wait_queue or add_wait_queue_exclusive. The two functions differ as follows:

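add_wait_queue_exclusive sets the WQ_FLAG_EXCLUSIVE flag on the entry and puts it at the tail of the queue; add_wait_queue leaves the flag clear and puts the entry at the head. The reason is that when the awaited condition is satisfied, sometimes every process on the queue can be woken, while sometimes the wakeup is exclusive (EXCLUSIVE) and only one process may be woken.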

The kernel uses the wait_event family of macros and functions to wait for a condition to be satisfied.

    #define ___wait_is_interruptible(state) \
        (!__builtin_constant_p(state) || \
            state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \

    /*
     * The below macro ___wait_event() has an explicit shadow of the __ret
     * variable when used from the wait_event_*() macros.
     *
     * This is so that both can use the ___wait_cond_timeout() construct
     * to wrap the condition.
     *
     * The type inconsistency of the wait_event_*() __ret variable is also
     * on purpose; we use long where we can return timeout values and int
     * otherwise.
     */

    #define ___wait_event(wq, condition, state, exclusive, ret, cmd) \
    ({ \
        __label__ __out; \
        wait_queue_t __wait; \
        long __ret = ret;   /* explicit shadow */ \
     \
        INIT_LIST_HEAD(&__wait.task_list); \
        if (exclusive) \
            __wait.flags = WQ_FLAG_EXCLUSIVE; \
        else \
            __wait.flags = 0; \
     \
        for (;;) { \
            long __int = prepare_to_wait_event(&wq, &__wait, state); \
     \
            if (condition) \
                break; \
     \
            if (___wait_is_interruptible(state) && __int) { \
                __ret = __int; \
                if (exclusive) { \
                    abort_exclusive_wait(&wq, &__wait, \
                                         state, NULL); \
                    goto __out; \
                } \
                break; \
            } \
     \
            cmd; \
        } \
        finish_wait(&wq, &__wait); \
    __out:  __ret; \
    })

    #define __wait_event(wq, condition) \
        (void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
                            schedule())

    /**
     * wait_event - sleep until a condition gets true
     * @wq: the waitqueue to wait on
     * @condition: a C expression for the event to wait for
     *
     * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
     * @condition evaluates to true. The @condition is checked each time
     * the waitqueue @wq is woken up.
     *
     * wake_up() has to be called after changing any variable that could
     * change the result of the wait condition.
     */
    #define wait_event(wq, condition) \
    do { \
        might_sleep(); \
        if (condition) \
            break; \
        __wait_event(wq, condition); \
    } while (0)

The prepare_to_wait function (the macro above calls its prepare_to_wait_event variant) adds the entry to the corresponding wait queue and sets the process state, which for wait_event is TASK_UNINTERRUPTIBLE. Once it has returned, the condition is checked; if it is not satisfied, schedule() is called to give up the CPU voluntarily. prepare_to_wait is in kernel/sched/wait.c.

The kernel performs wakeups through the wake_up family of macros. These macros end up calling the __wake_up function, which is in kernel/sched/wait.c. wake_up eventually calls try_to_wake_up.

    /*
     * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
     * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
     * number) then we wake all the non-exclusive tasks and one exclusive task.
     *
     * There are circumstances in which we can try to wake a task which has already
     * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
     * zero in this (rare) case, and we handle it by continuing to scan the queue.
     */
    static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                                 int nr_exclusive, int wake_flags, void *key)
    {
        wait_queue_t *curr, *next;

        list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
            unsigned flags = curr->flags;

            if (curr->func(curr, mode, wake_flags, key) &&
                    (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                break;
        }
    }
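The scan in __wake_up_common can be modelled in user space to see why exclusive waiters are queued at the tail. This is only a sketch under simplifying assumptions: `struct waiter` and `model_wake_up` are hypothetical names, an array stands in for the kernel's linked list, and every wake attempt is assumed to succeed (in the kernel, curr->func may return zero).

```c
#include <assert.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01      /* same value as in include/linux/wait.h */

struct waiter {                     /* hypothetical stand-in for wait_queue_t */
    unsigned int flags;
    int woken;
};

/*
 * Walks the queue from head to tail and wakes each entry. The walk stops
 * once nr_exclusive exclusive waiters have been woken; with
 * nr_exclusive == 0 the counter never reaches zero after the decrement,
 * so the whole queue is woken. Returns the number of waiters woken.
 */
int model_wake_up(struct waiter *q, size_t n, int nr_exclusive)
{
    int count = 0;
    size_t i;

    assert(q != NULL || n == 0);
    for (i = 0; i < n; i++) {
        q[i].woken = 1;             /* models curr->func() returning nonzero */
        count++;
        if ((q[i].flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    return count;
}
```

With two non-exclusive entries at the head and two exclusive entries at the tail, `model_wake_up(q, 4, 1)` wakes the first three entries and leaves the last one asleep, while `model_wake_up(q, 4, 0)` wakes all four. Because non-exclusive entries sit at the head, they are all reached before the scan stops at the first exclusive one.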

```c
/**
 * __wake_up - wake up threads blocked on a waitqueue.
 * @q: the waitqueue
 * @mode: which threads
 * @nr_exclusive: how many wake-one or wake-many threads to wake up
 * @key: is directly passed to the wakeup function
 *
 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.
 */
void __wake_up(wait_queue_head_t *q, unsigned int mode,
               int nr_exclusive, void *key)
{
    unsigned long flags;

    spin_lock_irqsave(&q->lock, flags);
    __wake_up_common(q, mode, nr_exclusive, 0, key);
    spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(__wake_up);
```

4 TASK_KILLABLE

Some people believe that after a call to vfork the parent stays in TASK_UNINTERRUPTIBLE until the child calls exec or exits. That is not the case, because the parent can easily be killed with the kill command, even though ps does show it in state D+ at that point. Starting with 2.6.25 the kernel introduced TASK_KILLABLE, which sits between TASK_UNINTERRUPTIBLE and TASK_INTERRUPTIBLE: the process is woken when it receives the fatal signal SIGKILL.

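5 __TASK_STOPPED and __TASK_TRACED

Signals such as SIGSTOP, SIGTSTP, SIGTTIN and SIGTTOU temporarily stop a process and put it into the __TASK_STOPPED state. Of these four signals, SIGSTOP cannot be ignored or blocked and cannot have a handler installed; the other three can be caught or ignored. After receiving SIGCONT the process resumes execution.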
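The stop and continue transitions can be watched from a parent process with waitpid. The sketch below assumes Linux (WCONTINUED needs 2.6.10 or later); `stop_and_continue_child` is a hypothetical name chosen for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that only sleeps, stops it with SIGSTOP, resumes it with
 * SIGCONT, then kills and reaps it. Returns 1 if both the stop and the
 * continue were reported by waitpid(), 0 if not, -1 on failure.
 */
int stop_and_continue_child(void)
{
    int status, stopped, continued;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                /* child: sleep until a signal arrives */
    }

    kill(pid, SIGSTOP);             /* child enters the stopped state */
    if (waitpid(pid, &status, WUNTRACED) != pid)
        return -1;
    stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;

    kill(pid, SIGCONT);             /* child resumes */
    if (waitpid(pid, &status, WCONTINUED) != pid)
        return -1;
    continued = WIFCONTINUED(status);

    kill(pid, SIGKILL);             /* clean up: terminate and reap */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFSIGNALED(status));    /* the child ended because of SIGKILL */

    return stopped && continued;
}
```

Between the first waitpid and the SIGCONT, the child would show up as T in ps or in /proc. The function is expected to return 1 on a Linux system.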

Tracing a process with gdb puts it into the __TASK_TRACED state. The debugging process can make it run again by issuing requests such as PTRACE_CONT or PTRACE_DETACH.

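6 EXIT_ZOMBIE and EXIT_DEAD

In both of these states the process is already dead. A process in EXIT_ZOMBIE simply has not been reaped yet. It stays a zombie when its parent has not waited for it and has neither set the SIGCHLD handler to SIG_IGN nor set the SA_NOCLDWAIT flag for SIGCHLD.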
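A short-lived zombie can be produced on purpose to see the difference between "exited" and "reaped". The sketch below is an illustration only, assuming Linux; `zombie_then_reaped` is a hypothetical name. It uses waitid with WNOWAIT to learn that the child has exited without reaping it, and kill with signal 0 to probe whether the PID still exists.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the exited child still existed before it was reaped and
 * was gone afterwards, 0 if not, -1 on failure.
 */
int zombie_then_reaped(void)
{
    struct sigaction dfl, old;
    siginfo_t info;
    int status, was_zombie, gone;
    pid_t pid;

    /* Default SIGCHLD handling, so the child is not reaped automatically. */
    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    if (sigaction(SIGCHLD, &dfl, &old) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(7);                           /* child exits immediately */

    /* Wait for the exit but leave the child waitable (a zombie). */
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    was_zombie = (kill(pid, 0) == 0);       /* the PID still exists */

    if (waitpid(pid, &status, 0) != pid) {  /* reap the zombie */
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    /* Assumes the PID has not been reused in the meantime. */
    gone = (kill(pid, 0) == -1 && errno == ESRCH);

    sigaction(SIGCHLD, &old, NULL);
    assert(WIFEXITED(status));
    return was_zombie && gone && WEXITSTATUS(status) == 7;
}
```

Between the waitid and the waitpid call the child would be listed as Z. The exit status (7 here) remains available to the parent until the final waitpid collects it.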

The state of a process can be seen in /proc//status. The mapping is as follows.

| procfs | Process state |
|---|---|
| R (running) | TASK_RUNNING |
| S (sleeping) | TASK_INTERRUPTIBLE |
| D (disk sleep) | TASK_UNINTERRUPTIBLE |
| T (stopped) | __TASK_STOPPED |
| t (tracing stop) | __TASK_TRACED |
| Z (zombie) | EXIT_ZOMBIE |
| X (dead) | EXIT_DEAD |
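The state letter from the table, together with the user and kernel CPU times discussed in section 2, can also be read programmatically. The following is a minimal sketch, assuming the stat line layout documented in proc(5) (state in field 3, utime in field 14, stime in field 15). The type `struct stat_sample` and all function names are invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct stat_sample {                /* hypothetical helper type */
    char state;                     /* field 3: state letter */
    unsigned long utime;            /* field 14: user CPU time, in ticks */
    unsigned long stime;            /* field 15: kernel CPU time, in ticks */
};

/* Maps a procfs state letter to the kernel state name from the table. */
const char *state_name(char c)
{
    switch (c) {
    case 'R': return "TASK_RUNNING";
    case 'S': return "TASK_INTERRUPTIBLE";
    case 'D': return "TASK_UNINTERRUPTIBLE";
    case 'T': return "__TASK_STOPPED";
    case 't': return "__TASK_TRACED";
    case 'Z': return "EXIT_ZOMBIE";
    case 'X': return "EXIT_DEAD";
    default:  return "unknown";
    }
}

/*
 * Parses one stat line. The command name (field 2) may itself contain
 * spaces and parentheses, so parsing starts after the last ')'.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_stat_line(const char *line, struct stat_sample *out)
{
    const char *p;

    assert(line != NULL && out != NULL);
    p = strrchr(line, ')');
    if (p == NULL)
        return -1;
    /* field 3, skip fields 4-13, then fields 14 and 15 */
    if (sscanf(p + 1,
               " %c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &out->state, &out->utime, &out->stime) != 3)
        return -1;
    return 0;
}

/* Reads the calling process's own entry. */
int read_self_stat(struct stat_sample *out)
{
    char buf[1024];
    FILE *f = fopen("/proc/self/stat", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_stat_line(buf, out);
}

/* Converts ticks to seconds using the rate user space is told about. */
double ticks_to_seconds(unsigned long ticks)
{
    return (double)ticks / (double)sysconf(_SC_CLK_TCK);
}
```

A process that reads its own entry is running at that moment, so `read_self_stat` is expected to report the letter R; other letters appear when inspecting a different process.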

https://quant67.com/post/linux/taskstatus.html
