The Magic SysRq Key in Linux

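I ran into SysRq entirely by accident. I have a habit of poking at things, so every now and then I manage to freeze Ubuntu out of nowhere. In the past I would probably have just pressed the power button to reboot. The harm in doing that is obvious, though: at best you lose data, and at worst the system breaks outright and comes back to nothing but a black screen. So I have warned myself over and over: when the machine hangs, do not force a reboot.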

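Three steps for dealing with a Linux hang

After a fair amount of reading and hands-on trial, I have roughly worked out the following procedure for dealing with a system hang:

  1. If the hang is caused by X Window or another windowing program, for example a pseudo-freeze caused by GNOME, press alt+F2 to bring up the GNOME run dialog, then type r and press Enter to restart GNOME. (This restart works in an X11 session; under Wayland, GNOME Shell cannot be restarted this way.)
  2. If the first step does not help, X Window itself may have died, and you can switch to a tty console instead. Pressing ctrl+alt+F3 takes you to tty3 (tty4, tty5 and so on work the same way). This is a text console where you log in to a shell. There you can run top to find the process hogging the CPU, then kill the stuck process with pkill <process name> or kill -9 <pid>. Finally, press ctrl+alt+F2 to return to the X Window session (depending on the distribution and release, the graphical session may sit on a different tty, such as tty1 or tty7).
  3. If none of the above works, the problem most likely lies in the lower layers of the system, and it is time to bring out the magic SysRq key.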
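
Step 2 can be sketched in shell. This is only a minimal sketch: busiest is a hypothetical helper name, not a standard command, and it assumes input lines of the form "PID %CPU COMMAND", which is what ps -eo pid=,pcpu=,comm= prints.

```shell
# busiest: hypothetical helper (not a standard tool).
# Reads "PID %CPU COMMAND" lines on stdin and prints the N rows with the
# highest CPU share (default 5), highest first.
busiest() {
    sort -k2,2 -nr | head -n "${1:-5}"
}

# Typical use on the tty, followed by a manual kill of the offender:
#   ps -eo pid=,pcpu=,comm= | busiest 3
#   kill -9 <pid>
```

Keeping the kill as a separate manual step is deliberate: it is worth reading the list before sending SIGKILL to anything.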

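What is SysRq

SysRq is often called the Magic System Request key, and it is defined as a set of key combinations. When the system has for some reason stopped responding to most normal services, it can often still respond to keyboard interrupt requests. In that situation the SysRq key combinations work their magic.

With it you can not only reboot a hung server while keeping the data on disk safe, avoiding data loss and a lengthy file system check after the reboot, but also collect runtime information such as memory usage, CPU task handling and process states. You may even be able to rescue an unresponsive server without rebooting at all.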

Enabling SysRq

First check whether SysRq is enabled:

cat /proc/sys/kernel/sysrq

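If the output is 0, SysRq is disabled. (A value of 1 enables every function, and a larger value is a bitmask of the allowed functions, as listed in the appendix.) You can enable it with the sysctl command: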

sudo sysctl -w kernel.sysrq=1

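This setting only lasts until the next reboot. To have SysRq enabled automatically at every boot, configure it as follows:

Edit /etc/sysctl.conf and add the following line (or uncomment it if it is already there):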

kernel.sysrq = 1
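
The value in /proc/sys/kernel/sysrq is not always plain 0 or 1; some distributions ship a bitmask such as 176. The sketch below decodes such a value using the bit table from the appendix. decode_sysrq is a hypothetical helper name, and the labels are my own shorthand for the entries in that table.

```shell
# decode_sysrq: hypothetical helper that explains a kernel.sysrq value.
# The bit meanings are taken from the SysRq.txt table in the appendix.
decode_sysrq() {
    value=$1
    case $value in
        0) echo "SysRq disabled"; return 0 ;;
        1) echo "all SysRq functions enabled"; return 0 ;;
    esac
    for entry in \
        "2:control of console logging level" \
        "4:control of keyboard (SAK, unraw)" \
        "8:debugging dumps of processes" \
        "16:sync command" \
        "32:remount read-only" \
        "64:signalling of processes (term, kill, oom-kill)" \
        "128:reboot/poweroff" \
        "256:nicing of all RT tasks"
    do
        bit=${entry%%:*}      # number before the first colon
        label=${entry#*:}     # text after the first colon
        if [ $(( value & bit )) -ne 0 ]; then
            echo "$label"
        fi
    done
}

# Typical use:
#   decode_sysrq "$(cat /proc/sys/kernel/sysrq)"
```

For example, 176 is 128 + 32 + 16, so only sync, remount read-only and reboot/poweroff are allowed from the keyboard, which means E and I in the sequences below would be ignored until the value is changed.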

Common SysRq key combinations

R-E-I-S-U-B – the all-purpose safe reboot

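R - Switch the keyboard from raw mode to XLATE mode (so that the keys that follow are received)
SysRq: Keyboard mode set to XLATE

E - Send SIGTERM to all processes except init (lets processes exit cleanly on their own)
SysRq: Terminate All Tasks

I - Send SIGKILL to all processes except init (forces processes to end)
SysRq: Kill All Tasks

S - Sync the disk buffers
SysRq : Emergency Sync

U - Remount all file systems read-only
SysRq : Emergency Remount R/O

B - Reboot the system immediately
SysRq: Resetting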

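Because the system environment and the number of background processes vary, there is no way to know how long each step takes to complete. To be safe, the usual practice is to hold Alt+SysRq and press R, wait 1 second, E, wait 30 seconds, I, wait 10 seconds, S, wait 5 seconds, U, wait 5 seconds, then B, rather than hitting all six keys in one go.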
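
When no keyboard is attached (over SSH, say), the same keys can be written to /proc/sysrq-trigger, as the appendix notes. Below is a minimal sketch of replaying a timed sequence that way. sysrq_replay is a hypothetical helper name. One caveat, as far as I can tell: E and I sent this way would also kill the shell running the script, so the sketch is only suggested for the S-U-B tail.

```shell
# sysrq_replay: hypothetical helper.
# Usage: sysrq_replay TRIGGER_FILE KEY DELAY [KEY DELAY ...]
# Writes each command key to the trigger file, then sleeps DELAY seconds.
sysrq_replay() {
    trigger=$1
    shift
    while [ "$#" -ge 2 ]; do
        printf 'SysRq %s, then wait %ss\n' "$1" "$2"
        printf '%s' "$1" > "$trigger" || return 1
        sleep "$2"
        shift 2
    done
}

# Real use, as root. This WILL reboot the machine:
#   sysrq_replay /proc/sysrq-trigger s 5 u 5 b 0
```

Taking the trigger path as an argument makes it possible to try the function against an ordinary temporary file first.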

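E-I-K – a handy tool for pseudo-freezes

Sometimes the system hangs only because an individual process is consuming too much CPU, memory or other system resources, and there is no need to reboot to fix it. All we need to do is find the culprit and end that process.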

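E - Send SIGTERM to all processes except init (lets processes exit cleanly on their own)
SysRq: Terminate All Tasks

I - Send SIGKILL to all processes except init (forces processes to end)
SysRq: Kill All Tasks

K - Kill all processes associated with the current console
SysRq : SAK

F - Manually trigger the OOM Killer (optional; avoid this key unless you are sure the problem is memory usage)
SysRq : Manual OOM execution
(The OOM Killer picks the most suitable "culprit" process based on each process's memory usage and sends it SIGKILL to stop it.)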
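
If a shell is still usable, it can help to see which process the OOM Killer would probably pick before pressing F. The kernel exposes a per-process score in /proc/<pid>/oom_score, where a higher score means a more likely victim. The sketch below lists the top candidates. oom_candidates is a hypothetical helper name, and its optional directory argument is my own addition so that the function can be tried against a fake tree.

```shell
# oom_candidates: hypothetical helper.
# Prints "SCORE PID COMMAND" for the five processes with the highest
# oom_score under the given proc root (default /proc).
oom_candidates() {
    procroot=${1:-/proc}
    for dir in "$procroot"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # A process may exit between the check and the read; skip it then.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || name='?'
        printf '%s %s %s\n' "$score" "${dir##*/}" "$name"
    done | sort -k1,1 -nr | head -n 5
}

# Typical use:
#   oom_candidates
```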

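M-P-T-W – evidence collector for system hangs

SysRq provides the M-P-T-W sequence, which is recommended before you try to recover a hung system. It records the current memory usage, the register state of the current CPU, the state of running processes, and the state of all CPUs and their registers. With this information you can do a rough analysis of why the system hung.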

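M - Print memory usage information
SysRq : Show Memory

P - Print the current CPU's registers
SysRq : Show Regs

T - Print the task list
SysRq : Show State

W - Print CPU information (on current mainline kernels W instead dumps blocked tasks, and L prints a backtrace for all active CPUs)
SysRq : Show CPUs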
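
If a shell is still reachable, the same dumps can be requested through /proc/sysrq-trigger and saved from the kernel log. This is a rough sketch under that assumption. collect_sysrq_dumps is a hypothetical helper name, and it relies on dmesg being readable by the current user, which usually means root.

```shell
# collect_sysrq_dumps: hypothetical helper.
# Usage: collect_sysrq_dumps TRIGGER_FILE OUTPUT_FILE
# Requests the m, p, t and w dumps, then copies the kernel log to OUTPUT_FILE.
collect_sysrq_dumps() {
    trigger=$1
    outfile=$2
    for key in m p t w; do
        printf '%s' "$key" > "$trigger" || return 1
    done
    # If dmesg is missing or not permitted, say so instead of failing hard.
    dmesg > "$outfile" 2>/dev/null ||
        echo "dmesg unavailable; check the console or syslog instead" >&2
}

# Real use, as root:
#   collect_sysrq_dumps /proc/sysrq-trigger /root/hang-report.txt
```

Writing the report to a file right away matters because a later E or I would also take the logging daemon down.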

Other function keys

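H - Help
Displays all SysRq combinations supported by the running system; the command keys are shown as capital letters.

C - Trigger a crashdump
For more detailed diagnosis and data collection on a hung system.

N - Lower the priority of real-time tasks
This has an immediate effect on hangs caused by real-time tasks consuming the CPU.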

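Viewing SysRq output

  1. Output to the local console

    By default SysRq output goes to the local console according to console_loglevel. As long as console_loglevel is greater than default_message_loglevel, SysRq messages are printed on the local console.

  2. Output to syslog

    With the default syslog configuration, SysRq output is recorded in /var/log/messages (on Debian and Ubuntu the kernel log usually lands in /var/log/syslog or /var/log/kern.log instead). What is recorded there does not depend on console_loglevel and is essentially complete. However, syslogd, which does the logging, is itself a user process and is also terminated by SysRq-E and SysRq-I described above, so in some situations the syslog record will be incomplete.

  3. Output via netconsole

  4. Output to a serial console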
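
The console rule from item 1 can be checked quickly. The first two numbers in /proc/sys/kernel/printk are console_loglevel and default_message_loglevel, and the sketch below simply compares them. sysrq_console_visible is a hypothetical helper name, and the output wording is my own.

```shell
# sysrq_console_visible: hypothetical helper.
# Usage: sysrq_console_visible CONSOLE_LOGLEVEL DEFAULT_MESSAGE_LOGLEVEL
# Applies the rule above: messages reach the console only when
# console_loglevel is greater than default_message_loglevel.
sysrq_console_visible() {
    if [ "$1" -gt "$2" ]; then
        echo "visible on the local console"
    else
        echo "kernel log only"
    fi
}

# Typical use (the first two fields of the file are the two levels):
#   read -r console default rest < /proc/sys/kernel/printk
#   sysrq_console_visible "$console" "$default"
```

If the answer is "kernel log only", the 0-9 command keys described in the appendix can raise the console log level.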

Appendix – SysRq.txt

Linux Magic System Request Key Hacks
Documentation for sysrq.c version 1.15
Last update: $Date: 2001/01/28 10:15:59 $
* What is the magic SysRq key?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is a 'magical' key combo you can hit which the kernel will respond to
regardless of whatever else it is doing, unless it is completely locked up.
* How do I enable the magic SysRq key?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You need to say "yes" to 'Magic SysRq key (CONFIG_MAGIC_SYSRQ)' when
configuring the kernel. When running a kernel with SysRq compiled in,
/proc/sys/kernel/sysrq controls the functions allowed to be invoked via
the SysRq key. By default the file contains 1 which means that every
possible SysRq request is allowed (in older versions SysRq was disabled
by default, and you were required to specifically enable it at run-time
but this is not the case any more). Here is the list of possible values
in /proc/sys/kernel/sysrq:
0 - disable sysrq completely
1 - enable all functions of sysrq
>1 - bitmask of allowed sysrq functions (see below for detailed function
description):
2 - enable control of console logging level
4 - enable control of keyboard (SAK, unraw)
8 - enable debugging dumps of processes etc.
16 - enable sync command
32 - enable remount read-only
64 - enable signalling of processes (term, kill, oom-kill)
128 - allow reboot/poweroff
256 - allow nicing of all RT tasks
You can set the value in the file by the following command:
echo "number" >/proc/sys/kernel/sysrq
Note that the value of /proc/sys/kernel/sysrq influences only the invocation
via a keyboard. Invocation of any operation via /proc/sysrq-trigger is always
allowed.
* How do I use the magic SysRq key?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On x86 - You press the key combo 'ALT-SysRq-<command key>'. Note - Some
keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is
also known as the 'Print Screen' key. Also some keyboards cannot
handle so many keys being pressed at the same time, so you might
have better luck with "press Alt", "press SysRq", "release Alt",
"press <command key>", release everything.
On SPARC - You press 'ALT-STOP-<command key>', I believe.
On the serial console (PC style standard serial ports only) -
You send a BREAK, then within 5 seconds a command key. Sending
BREAK twice is interpreted as a normal BREAK.
On PowerPC - Press 'ALT - Print Screen (or F13) - <command key>,
Print Screen (or F13) - <command key> may suffice.
On other - If you know of the key combos for other architectures, please
let me know so I can add them to this section.
On all - write a character to /proc/sysrq-trigger. eg:
echo t > /proc/sysrq-trigger
* What are the 'command' keys?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'r' - Turns off keyboard raw mode and sets it to XLATE.
'k' - Secure Access Key (SAK) Kills all programs on the current virtual
console. NOTE: See important comments below in SAK section.
'b' - Will immediately reboot the system without syncing or unmounting
your disks.
'c' - Will perform a kexec reboot in order to take a crashdump.
'o' - Will shut your system off (if configured and supported).
's' - Will attempt to sync all mounted filesystems.
'u' - Will attempt to remount all mounted filesystems read-only.
'p' - Will dump the current registers and flags to your console.
't' - Will dump a list of current tasks and their information to your
console.
'm' - Will dump current memory info to your console.
'v' - Dumps Voyager SMP processor info to your console.
'0'-'9' - Sets the console log level, controlling which kernel messages
will be printed to your console. ('0', for example would make
it so that only emergency messages like PANICs or OOPSes would
make it to your console.)
'f' - Will call oom_kill to kill a memory hog process
'e' - Send a SIGTERM to all processes, except for init.
'i' - Send a SIGKILL to all processes, except for init.
'l' - Send a SIGKILL to all processes, INCLUDING init. (Your system
will be non-functional after this.)
'h' - Will display help ( actually any other key than those listed
above will display help. but 'h' is easy to remember :-)
* Okay, so what can I use them for?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Well, un'R'aw is very handy when your X server or a svgalib program crashes.
sa'K' (Secure Access Key) is useful when you want to be sure there are no
trojan program is running at console and which could grab your password
when you would try to login. It will kill all programs on given console
and thus letting you make sure that the login prompt you see is actually
the one from init, not some trojan program.
IMPORTANT:In its true form it is not a true SAK like the one in :IMPORTANT
IMPORTANT:c2 compliant systems, and it should be mistook as such. :IMPORTANT
It seems other find it useful as (System Attention Key) which is
useful when you want to exit a program that will not let you switch consoles.
(For example, X or a svgalib program.)
re'B'oot is good when you're unable to shut down. But you should also 'S'ync
and 'U'mount first.
'C'rashdump can be used to manually trigger a crashdump when the system is hung.
The kernel needs to have been built with CONFIG_KEXEC enabled.
'S'ync is great when your system is locked up, it allows you to sync your
disks and will certainly lessen the chance of data loss and fscking. Note
that the sync hasn't taken place until you see the "OK" and "Done" appear
on the screen. (If the kernel is really in strife, you may not ever get the
OK or Done message...)
'U'mount is basically useful in the same ways as 'S'ync. I generally 'S'ync,
'U'mount, then re'B'oot when my system locks. It's saved me many a fsck.
Again, the unmount (remount read-only) hasn't taken place until you see the
"OK" and "Done" message appear on the screen.
The loglevel'0'-'9' is useful when your console is being flooded with
kernel messages you do not want to see. Setting '0' will prevent all but
the most urgent kernel messages from reaching your console. (They will
still be logged if syslogd/klogd are alive, though.)
t'E'rm and k'I'll are useful if you have some sort of runaway process you
are unable to kill any other way, especially if it's spawning other
processes.
* Sometimes SysRq seems to get 'stuck' after using it, what can I do?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
That happens to me, also. I've found that tapping shift, alt, and control
on both sides of the keyboard, and hitting an invalid sysrq sequence again
will fix the problem. (ie, something like alt-sysrq-z). Switching to another
virtual console (ALT+Fn) and then back again should also help.
* I hit SysRq, but nothing seems to happen, what's wrong?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are some keyboards that send different scancodes for SysRq than the
pre-defined 0x54. So if SysRq doesn't work out of the box for a certain
keyboard, run 'showkey -s' to find out the proper scancode sequence. Then
use 'setkeycodes <sequence> 84' to define this sequence to the usual SysRq
code (84 is decimal for 0x54). It's probably best to put this command in a
boot script. Oh, and by the way, you exit 'showkey' by not typing anything
for ten seconds.
* I want to add SysRQ key events to a module, how does it work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to register a basic function with the table, you must first include
the header 'include/linux/sysrq.h', this will define everything else you need.
Next, you must create a sysrq_key_op struct, and populate it with A) the key
handler function you will use, B) a help_msg string, that will print when SysRQ
prints help, and C) an action_msg string, that will print right before your
handler is called. Your handler must conform to the protoype in 'sysrq.h'.
After the sysrq_key_op is created, you can call the macro
register_sysrq_key(int key, struct sysrq_key_op *op_p) that is defined in
sysrq.h, this will register the operation pointed to by 'op_p' at table
key 'key', if that slot in the table is blank. At module unload time, you must
call the macro unregister_sysrq_key(int key, struct sysrq_key_op *op_p), which
will remove the key op pointed to by 'op_p' from the key 'key', if and only if
it is currently registered in that slot. This is in case the slot has been
overwritten since you registered it.
The Magic SysRQ system works by registering key operations against a key op
lookup table, which is defined in 'drivers/char/sysrq.c'. This key table has
a number of operations registered into it at compile time, but is mutable,
and 4 functions are exported for interface to it: __sysrq_lock_table,
__sysrq_unlock_table, __sysrq_get_key_op, and __sysrq_put_key_op. The
functions __sysrq_swap_key_ops and __sysrq_swap_key_ops_nolock are defined
in the header itself, and the REGISTER and UNREGISTER macros are built from
these. More complex (and dangerous!) manipulations of the table are possible
using these functions, but you must be careful to always lock the table before
you read or write from it, and to unlock it again when you are done. (And of
course, to never ever leave an invalid pointer in the table). Null pointers in
the table are always safe :)
If for some reason you feel the need to call the handle_sysrq function from
within a function called by handle_sysrq, you must be aware that you are in
a lock (you are also in an interrupt handler, which means don't sleep!), so
you must call __handle_sysrq_nolock instead.
* I have more questions, who can I ask?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You may feel free to send email to myrdraal@deathsdoor.com, and I will
respond as soon as possible.
-Myrdraal
And I'll answer any questions about the registration system you got, also
responding as soon as possible.
-Crutcher
* Credits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Written by Mydraal <myrdraal@deathsdoor.com>
Updated by Adam Sulmicki <adam@cfar.umd.edu>
Updated by Jeremy M. Dolan <jmd@turbogeek.org> 2001/01/28 10:15:59
Added to by Crutcher Dunnavant <crutcher+kernel@datastacks.com>