“进程状态”

进程的状态 linux (本文使用linux4.8.4) 下,进程状态大致有7种。

进程状态 说明 TASK_RUNNING 可运行状态。未必正在使用CPU,也许是在等待调度 TASK_INTERRUPTIBLE 可中断的睡眠状态。正在等待某个条件满足 TASK_UNINTERRUPTIBLE 不可中断的睡眠状态。不会被信号中断 __TASK_STOPPED 暂停状态。收到某种信号,运行被停止 __TASK_TRACED 被跟踪状态。进程停止,被另一个进程跟踪 EXIT_ZOMBIE 僵尸状态。进程已经退出,但尚未被父进程或者init进程收尸 EXIT_DEAD 真正的死亡状态 在include/linux/sched.h中,进程状态的定义并没有那么少:

/*

  • Task state bitmask. NOTE! These bits are also
  • encoded in fs/proc/array.c: get_task_state().
  • We have two separate sets of flags: task->state
  • is about runnability, while task->exit_state are
  • about the task exiting. Confusing, but this way
  • modifying one set can’t modify the other one by
  • mistake. / #define TASK_RUNNING 0 #define TASK_INTERRUPTIBLE 1 #define TASK_UNINTERRUPTIBLE 2 #define __TASK_STOPPED 4 #define __TASK_TRACED 8 / in tsk->exit_state / #define EXIT_DEAD 16 #define EXIT_ZOMBIE 32 #define EXIT_TRACE (EXIT_ZOMBIE | EXIT_DEAD) / in tsk->state again */ #define TASK_DEAD 64 #define TASK_WAKEKILL 128 #define TASK_WAKING 256 #define TASK_PARKED 512 #define TASK_NOLOAD 1024 #define TASK_NEW 2048 #define TASK_STATE_MAX 4096

#define TASK_STATE_TO_CHAR_STR “RSDTtXZxKWPNn”

extern char ___assert_task_state[1 - 2*!!( sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)];

/* Convenience macros for the sake of set_task_state */ #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE) #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED) #define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED)

#define TASK_IDLE (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Convenience macros for the sake of wake_up */ #define TASK_NORMAL (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE) #define TASK_ALL (TASK_NORMAL | __TASK_STOPPED | __TASK_TRACED)

/* get_task_state() */ #define TASK_REPORT (TASK_RUNNING | TASK_INTERRUPTIBLE |
TASK_UNINTERRUPTIBLE | __TASK_STOPPED |
__TASK_TRACED | EXIT_ZOMBIE | EXIT_DEAD) 2 TASK_RUNNING TASK_RUNNING是教科书中两种状态的结合,一种是正在占用CPU事件的RUNNING状态, 一种是RUNNING状态的进程时间片耗尽或者主动让出CPU,或者被更高优先级进程抢占 后,进入的READY状态。处于TASK_RUNNING状态的进程要么正在CPU上运行,要么 随时都可以投入运行,只不过CPU资源有限,调度器暂时没有选中他们。

处于TASK_RUNNING状态的进程是调度器的调度对象。在linux中,每个CPU都有自己的 运行队列集合。如果是实时进程,则根据优先级的情况落在相应的优先级的队列上; 如果是普通进程,则根据虚拟运行时间,落在红黑树相应位置上。

Linux提供了time命令可以统计进程在用户态和内核态消耗的CPU时间。time命令提供了 三种事件: 实际时间,用户CPU时间和内核CPU时间。下面的输出可以看出 r e a l ≠ u s e r + s y s 。 在多核处理器上,两边的大小是不确定的。

[root@localhost ~]# time ntpdate pool.ntp.org xxx xxxxxx outputs of ntpdate xxx xxxxxx

real 0m8.710s user 0m0.002s sys 0m0.013s 如果想在进程尚未结束时获得程序的执行时间,可以空过procfs中的信息,/proc//stat中 字段13是用户态CPU时间,14是内核态CPU时间,两者单位是始终嘀嗒。在配置内核的时候, 有100HZ,250HZ,300HZ和1000HZ这4个选项。一个始终嘀嗒的事件可以通过下面的命令获得:

[root@localhost ~]# grep CONFIG_HZ /boot/config-* pidstat命令也可以获取各个进程的CPU使用情况。如果想获取进程的实际运行时间,可以 使用ps命令:

[] ~ ps -p 20590 -o etime,cmd,pid ELAPSED CMD PID 01:21:57 emacs taskstatus.org 20590 3 TASK_INTERRUPTIBLE 和 TASK_UNINTERRUPTIBLE 当进程和慢速设备打交道,或者需要等待条件满足时,这种等待时间是不可预估的, 这种情况下,内核会将该进程从CPU的运行队列中移除,从而进程进入睡眠状态。

Linux的进程有两种睡眠状态: TASK_INTERRUPTIBLE和TASK_UNINTERRUPTIBLE,这两种 状态的区别是能否响应收到的信号。处于TASK_INTERRUPTIBLE状态的进程遇到下面两种 情况会返回到TASK_RUNNING状态:

等待条件满足; 收到未被屏蔽的信号。 收到信号时,会返回EINTR,需要检测返回值以作出正确处理。 对于TASK_UNINTERRUPTIBLE,只有等待条件满足才有可能返回运行状态,任何信号 都无法打断它。如果这种状态的进程出错,无法杀死,只能重启。

TASK_UNINTERRUPTIBLE的存在是因为内核中某些处理是不能被打断的,比如read系统 调用正在操作磁盘,就要用TASK_UNINTERRUPTIBLE将其保护起来以免受到打扰而陷入不 可控的状态。

khungtaskd内核线程(源码在kernel/hung_task.c)会定期唤醒(120秒)检查所有 TASK_UNINTERRUPTIBLE进程,如果有进程超过120秒没有被调度,那么内核就会打印进 程的堆栈信息。通过下面的命令可以查看kungtaskd周期:

[root@localhost ~]# sysctl kernel.hung_task_timeout_secs kernel.hung_task_timeout_secs = 120 通过/proc//wchan (what channel的缩写) 或者 proc//stack,或者 /proc//status 可以知道进程处于什么状态。

睡眠状态的进程都保存在等待队列中。队列在include/linux/wait.h中定义。

typedef struct __wait_queue wait_queue_t; typedef int (*wait_queue_func_t)(wait_queue_t *wait, unsigned mode, int flags, void *key); int default_wake_function(wait_queue_t *wait, unsigned mode, int flags, void *key);

/* __wait_queue::flags */ #define WQ_FLAG_EXCLUSIVE 0x01 #define WQ_FLAG_WOKEN 0x02

struct __wait_queue { unsigned int flags; void *private; wait_queue_func_t func; // 唤醒回调函数 struct list_head task_list; };

struct wait_bit_key { void *flags; int bit_nr; #define WAIT_ATOMIC_T_BIT_NR -1 unsigned long timeout; };

struct wait_bit_queue { struct wait_bit_key key; wait_queue_t wait; };

struct __wait_queue_head { spinlock_t lock; struct list_head task_list; }; typedef struct __wait_queue_head wait_queue_head_t; 等待队列元素private在__WAITQUEUE_INITIALIZER中指向了进程描述符task_struct, 这就可以将进程加入到对应的队列上了。使用add_wait_queue或者 add_wait_queue_exclusive将队列元素加到相应队列。这两个函数的区别在于:

一个将队列元素设置WQ_FLAG_EXCLUSIVE标志位,另一个没有; 一个将元素放到队列尾部,另一个放到队列头部。 这是因为有时候当等待条件满足,有时可以将队列中的所有进程唤醒,有时唤醒操作 是排他的(EXCLUSIVE)则只能唤醒一个。

内核使用wait_event系列宏和函数等待条件是否满足。

#define ___wait_is_interruptible(state)
(!__builtin_constant_p(state) ||
state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \

/*

  • The below macro ___wait_event() has an explicit shadow of the __ret
  • variable when used from the wait_event_*() macros.
  • This is so that both can use the ___wait_cond_timeout() construct
  • to wrap the condition.
  • The type inconsistency of the wait_event_*() __ret variable is also
  • on purpose; we use long where we can return timeout values and int
  • otherwise. */

#define ___wait_event(wq, condition, state, exclusive, ret, cmd)
({
label __out;
wait_queue_t __wait;
long __ret = ret; /* explicit shadow */

INIT_LIST_HEAD(&__wait.task_list);
if (exclusive)
__wait.flags = WQ_FLAG_EXCLUSIVE;
else
__wait.flags = 0;

for (;;) {
long __int = prepare_to_wait_event(&wq, &__wait, state);

if (condition)
break;

if (___wait_is_interruptible(state) && __int) {
__ret = __int;
if (exclusive) {
abort_exclusive_wait(&wq, &__wait,
state, NULL);
goto __out;
}
break;
}

cmd;
}
finish_wait(&wq, &__wait);
__out: __ret;
})

#define __wait_event(wq, condition)
(void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0,
schedule())

/**

  • wait_event - sleep until a condition gets true
  • @wq: the waitqueue to wait on
  • @condition: a C expression for the event to wait for
  • The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
  • @condition evaluates to true. The @condition is checked each time
  • the waitqueue @wq is woken up.
  • wake_up() has to be called after changing any variable that could
  • change the result of the wait condition. */ #define wait_event(wq, condition)
    do {
    might_sleep();
    if (condition)
    break;
    __wait_event(wq, condition);
    } while (0) prepare_to_wait函数将队列元素添加到对应的等待队列,同时将进程状态设置成 TASK_UNINTERRUPTIBLE,完成prepare_to_wait后,检查条件是否满足,如果不满足则 调用schedule()主动让出CPU使用权。prepare_to_wait在/kernel/sched/wait.c中。

内核是通过wake_up系列宏实现唤醒操作的。这些宏最终调用__wake_up函数。这个函 数在kernel/sched/wait.c中wait_up最终调用try_to_wake_up。

/*

  • The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just

  • wake everything up. If it’s an exclusive wakeup (nr_exclusive == small +ve

  • number) then we wake all the non-exclusive tasks and one exclusive task.

  • There are circumstances in which we can try to wake a task which has already

  • started to run but is not in state TASK_RUNNING. try_to_wake_up() returns

  • zero in this (rare) case, and we handle it by continuing to scan the queue. */ static void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int wake_flags, void *key) { wait_queue_t *curr, *next;

     list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
             unsigned flags = curr->flags;
    
             if (curr->func(curr, mode, wake_flags, key) &&
                             (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                     break;
     }
    

}

/**

  • __wake_up - wake up threads blocked on a waitqueue.

  • @q: the waitqueue

  • @mode: which threads

  • @nr_exclusive: how many wake-one or wake-many threads to wake up

  • @key: is directly passed to the wakeup function

  • It may be assumed that this function implies a write memory barrier before

  • changing the task state if and only if any tasks are woken up. */ void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, void *key) { unsigned long flags;

     spin_lock_irqsave(&q->lock, flags);
     __wake_up_common(q, mode, nr_exclusive, 0, key);
     spin_unlock_irqrestore(&q->lock, flags);
    

} EXPORT_SYMBOL(__wake_up); 4 TASK_KILLABLE 有人认为使用vfork函数子进程在调用exec或者退出之前,父进程处于 TASK_UNINTERRUPTIBLE 状态,事实并非如此,因为进程可以轻易被Kill命令杀死。 但是此时ps命令显示这个进程确实是D+状态。内核自2.6.25开始,引入了TASK_KILLABLE, 处于TASK_UNINTERRUPTIBLE和TASK_INTERRUPTIBLE之间,进程收到致命信号SIGKILL时 会被唤醒。

5 __TASK_STOPPED和__TASK_TRACED SIGSTOP、SIGTSTP、SIGTTIN、SIGTTOUT等信号会将进程暂时停止,进入__TASK_STOPPED 状态。这4种状态不可被忽略,不可被屏蔽,不能安装新的处理函数。在收到SIGCONT 后进程可以恢复执行。

使用gdb跟踪进程可以进入__TASK_TRACED状态。调试进程下达PTRACE_COUT或者 PTRACE_DETACH等可将其重新执行。

6 EXIT_ZOMBIE 和 EXIT_DEAD 这两种状态下面,进程已经死掉了,只是TASK_ZOMBIE状态中的进程没有被收尸,或者 父进程没有设置SIGCHLD处理函数为SIG_IGN,或者为SIGCHLD设置SA_NOCLDWAIT标志位。

进程的状态可以在/proc//status中看到。对应关系如下。

procfs 进程状态 R(runnng) TASK_RUNNING S(sleeping) TASK_INTERRUPTIBLE D(disk sleeping) TASK_UNINTERRUPTIBLE T(stopped) __TASK_STOPPED t(tracing stop) __TASK_TRACED Z(zombie) EXIT_ZOMBIE X(dead) EXIT_DEAD


https://quant67.com/post/linux/taskstatus.html