哨兵模式

主从切换技术的方法是:当主服务器宕机后,需要手动把一台从服务器切换为主服务器,这就需要人工干预,费事费力,还会造成一段时间内服务不可用。这不是一种推荐的方式,更多时候,我们优先考虑哨兵模式

哨兵模式概述

哨兵模式是一种特殊的模式,首先Redis提供了哨兵的命令,哨兵是一个独立的进程,作为进程,它会独立运行。其原理是哨兵通过发送命令等待Redis服务器响应从而监控运行的多个Redis实例

这里的哨兵有两个作用

  • 通过发送命令,让Redis服务器返回监控其运行状态,包括主服务器和从服务器。
  • 当哨兵监测到master宕机,会自动将slave切换成master,然后通过发布订阅模式通知其他的从服务器,修改配置文件,让它们切换主机。

然而一个哨兵进程对Redis服务器进行监控,可能会出现问题,为此,我们可以使用多个哨兵进行监控。各个哨兵之间还会进行监控,这样就形成了多哨兵模式。

用文字描述一下故障切换(failover)的过程。假设主服务器宕机,哨兵1先检测到这个结果,系统并不会马上进行failover过程,仅仅是哨兵1主观的认为主服务器不可用,这个现象成为主观下线。当后面的哨兵也检测到主服务器不可用,并且数量达到一定值时,那么哨兵之间就会进行一次投票,投票的结果由一个哨兵发起,进行failover操作。切换成功后,就会通过发布订阅模式,让各个哨兵把自己监控的从服务器实现切换主机,这个过程称为客观下线。这样对于客户端而言,一切都是透明的。

Redis配置哨兵模式

配置1个哨兵和1主2从的Redis服务器来演示这个过程。

服务器类型 是否是主服务器 IP地址 端口
Redis 127.0.0.1 6379
Redis 127.0.0.1 6380
Redis 127.0.0.1 6381

测试

测试环境:一主二从、一个哨兵

  • 哨兵配置文件 sentinel.config
1
2
# sentinel monitor myredis 被监控主机名称 host 1        数字 1 代表主机挂了, slave 投票看谁接替成为主机, 就会成为主机
sentinel monitor myredis 127.0.0.1 6379 1
  • 启动哨兵
1
redis-sentinel myconfig/sentinel.conf
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
[root@ecs-t6-medium-2-linux-20190910000110 redis-5.0.7]# redis-sentinel myconfig/sentinel.conf
10933:X 12 Jan 2021 15:55:07.961 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
10933:X 12 Jan 2021 15:55:07.961 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=10933, just started
10933:X 12 Jan 2021 15:55:07.961 # Configuration loaded
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 5.0.7 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26379
| `-._ `._ / _.-' | PID: 10933
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

10933:X 12 Jan 2021 15:55:07.962 # Sentinel ID is af64af6020684a2fe79f1cc4e1727c39778263bc
10933:X 12 Jan 2021 15:55:07.962 # +monitor master myredis 127.0.0.1 6379 quorum 1
10933:X 12 Jan 2021 16:04:10.045 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ myredis 127.0.0.1 6379
10933:X 12 Jan 2021 16:04:30.090 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ myredis 127.0.0.1 6379

当6379宕机或者关闭后

哨兵日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
11726:X 12 Jan 2021 16:05:35.830 # Configuration loaded
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 5.0.7 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in sentinel mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 26379
| `-._ `._ / _.-' | PID: 11726
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'

11726:X 12 Jan 2021 16:05:35.831 # Sentinel ID is af64af6020684a2fe79f1cc4e1727c39778263bc
11726:X 12 Jan 2021 16:05:35.831 # +monitor master myredis 127.0.0.1 6379 quorum 1
11726:X 12 Jan 2021 16:07:34.811 # +sdown master myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:34.811 # +odown master myredis 127.0.0.1 6379 #quorum 1/1
11726:X 12 Jan 2021 16:07:34.811 # +new-epoch 1
11726:X 12 Jan 2021 16:07:34.811 # +try-failover master myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:34.814 # +vote-for-leader af64af6020684a2fe79f1cc4e1727c39778263bc 1
11726:X 12 Jan 2021 16:07:34.814 # +elected-leader master myredis 127.0.0.1 6379
# failove 故障转移
11726:X 12 Jan 2021 16:07:34.814 # +failover-state-select-slave master myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:34.866 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:34.866 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:34.925 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:35.087 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:35.087 # +failover-state-reconf-slaves master myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:35.147 * +slave-reconf-sent slave 127.0.0.1:6380 127.0.0.1 6380 @ myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:36.094 * +slave-reconf-inprog slave 127.0.0.1:6380 127.0.0.1 6380 @ myredis 127.0.0.1 6379
11726:X 12 Jan 2021 16:07:36.094 * +slave-reconf-done slave 127.0.0.1:6380 127.0.0.1 6380 @ myredis 127.0.0.1 6379

11726:X 12 Jan 2021 16:07:36.171 # +failover-end master myredis 127.0.0.1 6379
# 6381
11726:X 12 Jan 2021 16:07:36.171 # +switch-master myredis 127.0.0.1 6379 127.0.0.1 6381
11726:X 12 Jan 2021 16:07:36.171 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ myredis 127.0.0.1 6381
11726:X 12 Jan 2021 16:07:36.171 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ myredis 127.0.0.1 6381

此时6381为老大

1
2
3
4
5
6
7
8
9
10
11
12
13
14
127.0.0.1:6381> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=17768,lag=0
master_replid:25c55d651308dd9fcd2f9c84867ed1a58f2f4286
master_replid2:9d786450a338f30155c623f99d1bd58129712aab
master_repl_offset:17768
second_repl_offset:11402
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:17768
127.0.0.1:6381>

如果主机此时回来了, 只能归并到新的主机下, 当作从机, 这就是哨兵模式的规则

1
2
3
4
5
6
7
11726:X 12 Jan 2021 16:07:36.171 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ myredis 127.0.0.1 6381
11726:X 12 Jan 2021 16:08:06.192 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ myredis 127.0.0.1 6381
11726:X 12 Jan 2021 16:22:59.961 # -sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ myredis 127.0.0.1 6381
11726:X 12 Jan 2021 16:23:34.353 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ myredis 127.0.0.1 6381
11726:X 12 Jan 2021 16:23:37.565 * +reboot slave 127.0.0.1:6379 127.0.0.1 6379 @ myredis 127.0.0.1 6381
11726:X 12 Jan 2021 16:23:37.648 # -sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ myredis 127.0.0.1 6381
11726:X 12 Jan 2021 16:23:47.568 * +convert-to-slave slave 127.0.0.1:6379 127.0.0.1 6379 @ myredis 127.0.0.1 6381

哨兵模式

优点

  • 哨兵集群, 基于主从复制模式, 所有主从配置优点, 它全有
  • 主从可以切换, 故障可以转移, 系统可用性就会很好
  • 哨兵模式就是主从模式的升级, 手动到自动, 更加健壮

缺点

  • redis 不好在线扩容, 集群容量一旦到达上限, 在线扩容就十分的麻烦
  • 实现哨兵模式的配置其实是很麻烦的, 里面有很多选择

哨兵模式全部配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
# Example sentinel.conf

# *** IMPORTANT ***
#
# By default Sentinel will not be reachable from interfaces different than
# localhost, either use the 'bind' directive to bind to a list of network
# interfaces, or disable protected mode with "protected-mode no" by
# adding it to this configuration file.
#
# Before doing that MAKE SURE the instance is protected from the outside
# world via firewalling or other means.
#
# For example you may use one of the following:
#
# bind 127.0.0.1 192.168.1.1
#
# protected-mode no

# port <sentinel-port>
# The port that this sentinel instance will run on
port 26379 # 如果有哨兵集群, 我们还需要配置每个哨兵端口

# By default Redis Sentinel does not run as a daemon. Use 'yes' if you need it.
# Note that Redis will write a pid file in /var/run/redis-sentinel.pid when
# daemonized.
daemonize no

# When running daemonized, Redis Sentinel writes a pid file in
# /var/run/redis-sentinel.pid by default. You can specify a custom pid file
# location here.
pidfile /var/run/redis-sentinel.pid

# Specify the log file name. Also the empty string can be used to force
# Sentinel to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile ""

# sentinel announce-ip <ip>
# sentinel announce-port <port>
#
# The above two configuration directives are useful in environments where,
# because of NAT, Sentinel is reachable from outside via a non-local address.
#
# When announce-ip is provided, the Sentinel will claim the specified IP address
# in HELLO messages used to gossip its presence, instead of auto-detecting the
# local address as it usually does.
#
# Similarly when announce-port is provided and is valid and non-zero, Sentinel
# will announce the specified TCP port.
#
# The two options don't need to be used together, if only announce-ip is
# provided, the Sentinel will announce the specified IP and the server port
# as specified by the "port" option. If only announce-port is provided, the
# Sentinel will announce the auto-detected local IP and the specified port.
#
# Example:
#
# sentinel announce-ip 1.2.3.4

# dir <working-directory>
# Every long running process should have a well-defined working directory.
# For Redis Sentinel to chdir to /tmp at startup is the simplest thing
# for the process to don't interfere with administrative tasks such as
# unmounting filesystems.
# 哨兵的 sentinel的工作目录
dir /tmp

# sentinel monitor <master-name> <ip> <redis-port> <quorum>
#
# Tells Sentinel to monitor this master, and to consider it in O_DOWN
# (Objectively Down) state only if at least <quorum> sentinels agree.
#
# Note that whatever is the ODOWN quorum, a Sentinel will require to
# be elected by the majority of the known Sentinels in order to
# start a failover, so no failover can be performed in minority.
#
# Replicas are auto-discovered, so you don't need to specify replicas in
# any way. Sentinel itself will rewrite this configuration file adding
# the replicas using additional configuration options.
# Also note that the configuration file is rewritten when a
# replica is promoted to master.
#
# Note: master name should not include special characters or spaces.
# The valid charset is A-z 0-9 and the three characters ".-_".
# 哨兵 sentinel 监控的 redis 主节点 ip port
# master-name 可以自己命名的主节点名字 只能由字母 A-Z 数字0-9 这三个字符".-_" 组成
# quorum 配置多少个 sentinel 哨兵统一认为 master主节点失联, 那么这时候客观上认为主节点失联了
# sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor mymaster 127.0.0.1 6379 2

# sentinel auth-pass <master-name> <password>
#
# Set the password to use to authenticate with the master and replicas.
# Useful if there is a password set in the Redis instances to monitor.
#
# Note that the master password is also used for replicas, so it is not
# possible to set a different password in masters and replicas instances
# if you want to be able to monitor these instances with Sentinel.
#
# However you can have Redis instances without the authentication enabled
# mixed with Redis instances requiring the authentication (as long as the
# password set is the same for all the instances requiring the password) as
# the AUTH command will have no effect in Redis instances with authentication
# switched off.
#
# Example:
#
# 当Redia实例中开启了 requirepass foobared 授权密码 这样所有的连接Redis 实例的客户端都要提供密码
# 设置哨兵 sentinel 连接主从密码, 注意必须为主从设置一样的验证密码
# sentinel auth-pass mymaster MySUPER--secret-0123passw0rd

# sentinel down-after-milliseconds <master-name> <milliseconds>
#
# Number of milliseconds the master (or any attached replica or sentinel) should
# be unreachable (as in, not acceptable reply to PING, continuously, for the
# specified period) in order to consider it in S_DOWN state (Subjectively
# Down).
#
# Default is 30 seconds.
# 指定多少毫秒之后, 主节点没有应答哨兵 sentinel 此时 哨兵主管上认为节点下线 默认 30s
# sentinel down-after-milliseconds <master-name> <milliseconds>
sentinel down-after-milliseconds mymaster 30000

# sentinel parallel-syncs <master-name> <numreplicas>
#
# How many replicas we can reconfigure to point to the new replica simultaneously
# during the failover. Use a low number if you use the replicas to serve query
# to avoid that all the replicas will be unreachable at about the same
# time while performing the synchronization with the master.
# 这个配置项指定了发生 failover 主备切换时最多有多少个slave同时对新的master 进行同步
# 这个数字越小 完成 failover 所需时间就越长
# 但是如果这个数字越大, 就意味着越多的 slave 因为 replication而不可用
# 可以通过这个值设置为1 来保证每次只有一个 slave 处于不能处理命令的请求状态
# sentinel parallel-syncs <master-name> <numreplicas>
sentinel parallel-syncs mymaster 1

# sentinel failover-timeout <master-name> <milliseconds>
#
# Specifies the failover timeout in milliseconds. It is used in many ways:
#
# - The time needed to re-start a failover after a previous failover was
# already tried against the same master by a given Sentinel, is two
# times the failover timeout.
#
# - The time needed for a replica replicating to a wrong master according
# to a Sentinel current configuration, to be forced to replicate
# with the right master, is exactly the failover timeout (counting since
# the moment a Sentinel detected the misconfiguration).
#
# - The time needed to cancel a failover that is already in progress but
# did not produced any configuration change (SLAVEOF NO ONE yet not
# acknowledged by the promoted replica).
#
# - The maximum time a failover in progress waits for all the replicas to be
# reconfigured as replicas of the new master. However even after this time
# the replicas will be reconfigured by the Sentinels anyway, but not with
# the exact parallel-syncs progression as specified.
#
# Default is 3 minutes.
#
# 故障转移超时时间 failover-timeout 可以用在以下这些方面
# 1 同一个 sentinel 对同一个master两次 failover 之间的间隔时间
# 2 一个 slave 从一个错误的 master 那里同步数据开始计算时间, 直到 slave 被纠正为正确的 master 那里同步数据时
# 3 当想要取消一个正在进行的 failover 所需要的时间
# 4 当进行 failover 时, 配置所有 slaves 指定新的 master 所需要的最大时间. 不过, 即使过了这个超时, slave 依然会被正确的配置为指向 master 但是就不按 paralle1- syncs 所配置规来了
sentinel failover-timeout mymaster 180000

# SCRIPTS EXECUTION
#
# sentinel notification-script and sentinel reconfig-script are used in order
# to configure scripts that are called to notify the system administrator
# or to reconfigure clients after a failover. The scripts are executed
# with the following rules for error handling:
#
# If script exits with "1" the execution is retried later (up to a maximum
# number of times currently set to 10).
#
# If script exits with "2" (or an higher value) the script execution is
# not retried.
#
# If script terminates because it receives a signal the behavior is the same
# as exit code 1.
#
# A script has a maximum running time of 60 seconds. After this limit is
# reached the script is terminated with a SIGKILL and the execution retried.

# 配置当某一件事情发生时, 所需要执行的脚本, 可以通过脚本通知来通知管理员, 例如当系统不能正常发邮件通知相关人员
# 对于脚本的运行结果有以下规则
# 若脚本执行后返回1, 那么该脚本稍后将会被再次执行, 重复次数默认为 10
# 若脚本执行后返回2, 或者比2更高的一个返回值, 脚本将不会重复执行
# 如果脚本在执行过程中由于收到系统中断信号被终止了, 则同返回值1时的行为相同
# 一个脚本的最大执行时间为60s, 如果超过了这个时间, 脚本将会被一个SIGKILL信号终止, 之后重新执行

# NOTIFICATION SCRIPT
#
# sentinel notification-script <master-name> <script-path>
#
# Call the specified notification script for any sentinel event that is
# generated in the WARNING level (for instance -sdown, -odown, and so forth).
# This script should notify the system administrator via email, SMS, or any
# other messaging system, that there is something wrong with the monitored
# Redis systems.
#
# The script is called with just two arguments: the first is the event type
# and the second the event description.
#
# The script must exist and be executable in order for sentinel to start if
# this option is provided.
#
# Example:
#
# sentinel notification-script mymaster /var/redis/notify.sh

# CLIENTS RECONFIGURATION SCRIPT
#
# sentinel client-reconfig-script <master-name> <script-path>
#
# When the master changed because of a failover a script can be called in
# order to perform application-specific tasks to notify the clients that the
# configuration has changed and the master is at a different address.
#
# The following arguments are passed to the script:
#
# <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
#
# <state> is currently always "failover"
# <role> is either "leader" or "observer"
#
# The arguments from-ip, from-port, to-ip, to-port are used to communicate
# the old address of the master and the new address of the elected replica
# (now a master).
#
# This script should be resistant to multiple invocations.
#
# Example:
#
# sentinel client-reconfig-script mymaster /var/redis/reconfig.sh

# SECURITY
#
# By default SENTINEL SET will not be able to change the notification-script
# and client-reconfig-script at runtime. This avoids a trivial security issue
# where clients can set the script to anything and trigger a failover in order
# to get the program executed.

sentinel deny-scripts-reconfig yes

# REDIS COMMANDS RENAMING
#
# Sometimes the Redis server has certain commands, that are needed for Sentinel
# to work correctly, renamed to unguessable strings. This is often the case
# of CONFIG and SLAVEOF in the context of providers that provide Redis as
# a service, and don't want the customers to reconfigure the instances outside
# of the administration console.
#
# In such case it is possible to tell Sentinel to use different command names
# instead of the normal ones. For example if the master "mymaster", and the
# associated replicas, have "CONFIG" all renamed to "GUESSME", I could use:
#
# SENTINEL rename-command mymaster CONFIG GUESSME
#
# After such configuration is set, every time Sentinel would use CONFIG it will
# use GUESSME instead. Note that there is no actual need to respect the command
# case, so writing "config guessme" is the same in the example above.
#
# SENTINEL SET can also be used in order to perform this configuration at runtime.
#
# In order to set a command back to its original name (undo the renaming), it
# is possible to just rename a command to itsef:
#
# SENTINEL rename-command mymaster CONFIG CONFIG

文章已上传gitee https://gitee.com/codingce/hexo-blog
项目地址: https://github.com/xzMhehe/codingce-java

评论