Setting Up a Highly Available Redis Cluster on Linux

Preface

Software Environment

Software   Version
CentOS     7.9
Redis      6.0.6

Cluster Node Planning

A Redis cluster needs at least 6 nodes in total: 3 Master nodes and 3 Slave nodes, with each Master paired with 1 Slave. The node count scales with the number of Slaves per Master:

  • 1 Master -> 1 Slave: the cluster needs 6 nodes (see figure)
  • 1 Master -> 2 Slaves: the cluster needs 9 nodes, and so on (see figure)

Role     IP              Port
Master   192.168.1.109   7001
Master   192.168.1.109   7002
Master   192.168.1.109   7003
Slave    192.168.1.109   7004
Slave    192.168.1.109   7005
Slave    192.168.1.109   7006
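
The node counts in the two layouts above are simple arithmetic: each Master plus its Slaves. A minimal sketch in shell, where required_nodes is a hypothetical helper name used only for illustration:

```bash
#!/usr/bin/env bash
# required_nodes MASTERS SLAVES_PER_MASTER
# Hypothetical helper: total nodes = masters * (1 + slaves per master).
required_nodes() {
  local masters=$1 slaves=$2
  echo $(( masters * (1 + slaves) ))
}

required_nodes 3 1   # 3 Masters with 1 Slave each  -> prints 6
required_nodes 3 2   # 3 Masters with 2 Slaves each -> prints 9
```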

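Redis Cluster Characteristics

Advantages of Redis Cluster

  • Decentralized architecture: the service is provided by distributed nodes with no central coordinator.
  • Data is stored by slot and spread across multiple Redis instances.
  • Slaves act as standby replicas for failover, so the cluster recovers quickly.
  • Failover is automatic: nodes exchange state over a gossip protocol, and a vote promotes a Slave to Master.
  • Nodes can be added or removed online, which lowers hardware and operations costs and improves scalability and availability.

Disadvantages of Redis Cluster

  • Clients are complex to implement: the driver must be a Smart Client that caches the slot mapping and keeps it up to date. At the time of writing only JedisCluster is relatively mature, and its error handling is still incomplete. Immature clients hurt application stability and make development harder.
  • A node that blocks for longer than cluster-node-timeout is judged to be offline, which triggers a failover that was not actually needed. Sentinel mode has the same kind of spurious switch.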

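Building the Redis Cluster

System Initialization

In the listings below, a leading "#" on a command line is the root prompt and "$" is the prompt of the redis user; the other "#" lines are comments.

# Setting 1: kernel parameters
# echo "net.core.somaxconn = 1024" >> /etc/sysctl.conf
# echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
# sysctl -p

# Setting 2: disable transparent huge pages, now and at every boot
# (on CentOS 7, rc.local only runs at boot if it is executable)
# echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local
# chmod +x /etc/rc.d/rc.local
# source /etc/rc.local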
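
Because the commands above append with >>, running them a second time leaves duplicate entries in the files. One possible way to make the step repeatable is sketched below; ensure_line is a hypothetical helper, not a standard tool, and it only appends a line when the file does not already contain it verbatim.

```bash
#!/usr/bin/env bash
# ensure_line LINE FILE
# Hypothetical helper: append LINE to FILE only if no identical line exists yet.
ensure_line() {
  local line=$1 file=$2
  # -x: match the whole line, -F: treat LINE as a fixed string, not a regex
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root), mirroring the settings above:
#   ensure_line "net.core.somaxconn = 1024" /etc/sysctl.conf
#   ensure_line "vm.overcommit_memory = 1"  /etc/sysctl.conf
#   sysctl -p
```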

Creating the Redis User

# Create the redis group
# groupadd redis

# Create the redis user (no login shell, so it cannot log in remotely)
# useradd -g redis redis -s /bin/false

Compiling and Installing Redis

All Redis releases can be downloaded from the official site; this guide uses 6.0.6.

# Install dependencies (centos-release-scl goes first so that the devtoolset-9 repository is available)
# yum install -y centos-release-scl
# yum install -y devtoolset-9 scl-utils-build tcl

# Enable the GCC 9 toolchain for the current shell session
# scl enable devtoolset-9 bash

# Download the source archive
# wget http://download.redis.io/releases/redis-6.0.6.tar.gz

# Unpack it
# tar -xvf redis-6.0.6.tar.gz

# Enter the source directory
# cd redis-6.0.6

# Compile
# make

# Install
# make install PREFIX=/usr/local/redis

# Create symbolic links
# ln -s /usr/local/redis/bin/redis-benchmark /usr/local/bin/redis-benchmark
# ln -s /usr/local/redis/bin/redis-check-aof /usr/local/bin/redis-check-aof
# ln -s /usr/local/redis/bin/redis-check-rdb /usr/local/bin/redis-check-rdb
# ln -s /usr/local/redis/bin/redis-sentinel /usr/local/bin/redis-sentinel
# ln -s /usr/local/redis/bin/redis-server /usr/local/bin/redis-server
# ln -s /usr/local/redis/bin/redis-cli /usr/local/bin/redis-cli

# Copy the configuration file
# cp redis.conf /usr/local/redis

# Create the log directory
# mkdir -p /var/log/redis

# Set ownership
# chown -R redis:redis /var/log/redis
# chown -R redis:redis /usr/local/redis

Edit the base Redis configuration. Several of the file names in it include the port number (6379), so that each node can later be told apart by substituting its own port.

# Edit the base configuration
# vim /usr/local/redis/redis.conf

io-threads 2
daemonize yes
# bind 127.0.0.1
protected-mode no
masterauth 123456
requirepass 123456
dbfilename dump_6379.rdb
pidfile /var/run/redis_6379.pid
cluster-config-file nodes_6379.conf
appendfilename "appendonly_6379.aof"
logfile "/var/log/redis/redis_6379.log"

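Verify that Redis was installed correctly

# Switch to the redis user (the account's shell is /bin/false, so a shell is passed explicitly)
# su -s /bin/bash redis

# Enter the installation directory
$ cd /usr/local/redis

# Start Redis
$ ./bin/redis-server redis.conf

# Check that Redis is running
$ ps -aux|grep redis

# Shut Redis down
$ ./bin/redis-cli
127.0.0.1:6379> auth 123456
127.0.0.1:6379> shutdown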

Setting Up the Cluster Nodes

Create an installation directory for each cluster node, change every port-related setting in it (for example port, pidfile, dbfilename, logfile, cluster-config-file), and enable cluster support.

# Create the cluster directory
# mkdir -p /usr/local/redis-cluster

# Copy the installation once per node
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7001
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7002
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7003
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7004
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7005
# cp -r /usr/local/redis /usr/local/redis-cluster/redis-7006

# Change every port-related setting in each node (replaces every occurrence of 6379 in the file)
# sed -i "s/6379/7001/g" /usr/local/redis-cluster/redis-7001/redis.conf
# sed -i "s/6379/7002/g" /usr/local/redis-cluster/redis-7002/redis.conf
# sed -i "s/6379/7003/g" /usr/local/redis-cluster/redis-7003/redis.conf
# sed -i "s/6379/7004/g" /usr/local/redis-cluster/redis-7004/redis.conf
# sed -i "s/6379/7005/g" /usr/local/redis-cluster/redis-7005/redis.conf
# sed -i "s/6379/7006/g" /usr/local/redis-cluster/redis-7006/redis.conf

# Enable cluster support on every node by uncommenting the cluster directives
# sed -i "s/# cluster-enabled/cluster-enabled/g" `find /usr/local/redis-cluster -type f -name "redis.conf"`
# sed -i "s/# cluster-config-file/cluster-config-file/g" `find /usr/local/redis-cluster -type f -name "redis.conf"`
# sed -i "s/# cluster-node-timeout/cluster-node-timeout/g" `find /usr/local/redis-cluster -type f -name "redis.conf"`

# Set ownership
# chown -R redis:redis /usr/local/redis-cluster
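
The per-node edits above repeat the same substitutions six times. As a sketch only, they can be folded into one function that renders a node's configuration from the base file. render_node_conf is a hypothetical name; like the sed commands above, it replaces every occurrence of 6379 and uncomments the three cluster directives, and it writes the result to standard output instead of editing in place.

```bash
#!/usr/bin/env bash
# render_node_conf TEMPLATE PORT
# Hypothetical helper: print TEMPLATE with 6379 replaced by PORT and the
# cluster-* directives uncommented. The template itself is left untouched.
render_node_conf() {
  local template=$1 port=$2
  sed -e "s/6379/${port}/g" \
      -e 's/^# *cluster-enabled /cluster-enabled /' \
      -e 's/^# *cluster-config-file /cluster-config-file /' \
      -e 's/^# *cluster-node-timeout /cluster-node-timeout /' \
      "$template"
}

# Usage (as root), assuming the directories created above already exist:
#   for port in 7001 7002 7003 7004 7005 7006; do
#     render_node_conf /usr/local/redis/redis.conf "$port" \
#       > "/usr/local/redis-cluster/redis-$port/redis.conf"
#   done
```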

Copy the Redis cluster management tool

# Enter the Redis source directory
# cd redis-6.0.6

# Copy the cluster management tool
# cp src/redis-trib.rb /usr/local/redis-cluster

# Set ownership
# chown -R redis:redis /usr/local/redis-cluster/redis-trib.rb

Create a shell script that starts all cluster nodes in one go

# vim /usr/local/redis-cluster/start-cluster.sh

#!/bin/bash
REDIS_CLUSTER_HOME=/usr/local/redis-cluster
cd $REDIS_CLUSTER_HOME
cd redis-7001
./bin/redis-server redis.conf
cd ..
cd redis-7002
./bin/redis-server redis.conf
cd ..
cd redis-7003
./bin/redis-server redis.conf
cd ..
cd redis-7004
./bin/redis-server redis.conf
cd ..
cd redis-7005
./bin/redis-server redis.conf
cd ..
cd redis-7006
./bin/redis-server redis.conf
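
The script above repeats the same three lines per node. A loop-based variant is sketched below as one possible alternative, not a replacement the original author prescribes. start_cluster is a hypothetical function name; it assumes the redis-PORT directory layout created earlier and skips any port whose directory has no redis-server binary.

```bash
#!/bin/bash
# Base directory holding one redis-PORT directory per node (layout assumed from above).
REDIS_CLUSTER_HOME=${REDIS_CLUSTER_HOME:-/usr/local/redis-cluster}

# start_cluster PORT...
# Hypothetical helper: start one redis-server per port from its own directory.
start_cluster() {
  local port dir
  for port in "$@"; do
    dir="$REDIS_CLUSTER_HOME/redis-$port"
    if [ ! -x "$dir/bin/redis-server" ]; then
      echo "skip $port: $dir/bin/redis-server not found" >&2
      continue
    fi
    # Run in a subshell so the working directory change does not leak out.
    (cd "$dir" && ./bin/redis-server redis.conf)
  done
}

# Usage (as the redis user):
#   start_cluster 7001 7002 7003 7004 7005 7006
```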

Make the shell script executable

# Set permissions and ownership
# chmod +x /usr/local/redis-cluster/start-cluster.sh
# chown -R redis:redis /usr/local/redis-cluster/start-cluster.sh

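Setting a Password on the Cluster

To protect the cluster nodes with a password, requirepass and masterauth must both be set, and to the same value; otherwise authorization fails when a master/slave switch happens. Note that the way the password is set differs depending on whether the cluster is built with redis-trib.rb or with redis-cli:

  • redis-trib.rb: do not configure a password before the cluster is built. Once the cluster is up, run the following commands on every node, one by one, to set the password; no restart is needed. The auth line is there because the connection has to authenticate once requirepass takes effect.

$ redis-cli -c -p 7001
config set masterauth 123456
config set requirepass 123456
auth 123456
config rewrite

  • redis-cli: first set the password in each node's redis.conf (both requirepass and masterauth), then add -a password to the command line that builds the cluster, where password is the nodes' password.

masterauth 123456
requirepass 123456

$ redis-cli -a 123456 --cluster create \
192.168.1.109:7001 \
192.168.1.109:7002 \
192.168.1.109:7003 \
192.168.1.109:7004 \
192.168.1.109:7005 \
192.168.1.109:7006 \
--cluster-replicas 1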

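Building and Starting the Cluster

First run the shell script to start all Redis nodes. Never start Redis as root; doing so is a serious security risk.

# Switch to the redis user (the account's shell is /bin/false, so a shell is passed explicitly)
# su -s /bin/bash redis

# Start the cluster nodes
$ /usr/local/redis-cluster/start-cluster.sh

# Check that every node is running
$ ps -aux|grep redis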
redis 32641 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7001 [cluster]
redis 32649 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7002 [cluster]
redis 32657 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7003 [cluster]
redis 20814 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7004 [cluster]
redis 20822 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7005 [cluster]
redis 20830 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7006 [cluster]

In Redis 6.0.6, running redis-trib.rb to build the cluster only prints a notice telling you to use redis-cli instead, together with the equivalent redis-cli command (so Ruby no longer has to be installed separately). The original redis-trib.rb invocation looked like this:

$ ./redis-trib.rb create --replicas 1 \
192.168.1.109:7001 \
192.168.1.109:7002 \
192.168.1.109:7003 \
192.168.1.109:7004 \
192.168.1.109:7005 \
192.168.1.109:7006

The suggested redis-cli command is shown below. Building the cluster with redis-cli is recommended, because no extra Ruby installation is needed.

$ redis-cli -a 123456 --cluster create \
192.168.1.109:7001 \
192.168.1.109:7002 \
192.168.1.109:7003 \
192.168.1.109:7004 \
192.168.1.109:7005 \
192.168.1.109:7006 \
--cluster-replicas 1

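The two commands are nearly identical in form and in meaning. --cluster-replicas 1 sets the master-to-replica ratio to 1, that is, one replica per master. By default the first three ip:port entries are the masters, and the remaining entries become their replicas in turn. When a production cluster is spread across several servers, a master and its replica must not be placed on the same server: if they share a machine, losing that machine takes out both and the replica serves little purpose, so stagger the pairs across servers to keep the cluster usable when any single server fails. When a master goes down, its replica takes over as master and the cluster keeps running.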
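
The ip:port list passed to the create command can also be generated rather than typed out. A small sketch follows; cluster_nodes is a hypothetical helper, and the host and port range in the usage comment are the ones used in this guide.

```bash
#!/usr/bin/env bash
# cluster_nodes HOST FIRST_PORT LAST_PORT
# Hypothetical helper: print one HOST:PORT entry per line for a port range.
cluster_nodes() {
  local host=$1 first=$2 last=$3 port
  for ((port = first; port <= last; port++)); do
    printf '%s:%d\n' "$host" "$port"
  done
}

# Usage: the unquoted command substitution splits the lines into arguments.
#   redis-cli -a 123456 --cluster create \
#     $(cluster_nodes 192.168.1.109 7001 7006) --cluster-replicas 1
```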


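After the create command is run (it only needs to be run once), Redis lists the proposed node assignments for confirmation; type yes to finish creating the cluster.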

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 192.168.1.109:7006 to 192.168.1.109:7001
Adding replica 192.168.1.109:7003 to 192.168.1.109:7004
Adding replica 192.168.1.109:7005 to 192.168.1.109:7002
M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
slots:[0-5460] (5461 slots) master
M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
slots:[10923-16383] (5461 slots) master
M: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
slots:[5461-10922] (5462 slots) master
S: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
replicates 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4
S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
replicates 283abb498445ffd6206f24c451ac0b9fb7129383
S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
Can I set the above configuration? (type 'yes' to accept):
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
.
>>> Performing Cluster Check (using node 192.168.1.109:7001)
M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
slots: (0 slots) slave
replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
slots: (0 slots) slave
replicates 283abb498445ffd6206f24c451ac0b9fb7129383
M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
slots: (0 slots) slave
replicates 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
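
The three slot ranges in the output above are an even split of the 16384 slots. The sketch below reproduces those ranges with rounded integer division. It is an illustration of the arithmetic, not a copy of the redis-cli source, whose allocation logic may differ in detail; slot_ranges is a hypothetical name.

```bash
#!/usr/bin/env bash
# slot_ranges MASTERS
# Hypothetical helper: split slots 0..16383 evenly across MASTERS masters.
slot_ranges() {
  local masters=$1 total=16384 i first=0 last
  for ((i = 0; i < masters; i++)); do
    if (( i == masters - 1 )); then
      # The last master takes everything that is left.
      last=$(( total - 1 ))
    else
      # round((i + 1) * total / masters) - 1, using integer arithmetic only
      last=$(( ((i + 1) * total * 2 + masters) / (2 * masters) - 1 ))
    fi
    echo "Master[$i] -> Slots $first - $last"
    first=$(( last + 1 ))
  done
}

slot_ranges 3
```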

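Testing the Redis Cluster

Log in to one of the cluster nodes with the Redis client, supplying the password. In the session below the key foo hashes to slot 12182, which is served by 192.168.1.109:7002, so the client (started with -c) is redirected to that node, where the value just written can be read back.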

$ redis-cli -c -p 7001 -a 123456

127.0.0.1:7001> set foo hello
-> Redirected to slot [12182] located at 192.168.1.109:7002
OK
192.168.1.109:7002> get foo
"hello"
192.168.1.109:7002>
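
The slot number in the redirect above comes from the key itself: Redis Cluster maps a key to CRC16(key) mod 16384, using the XMODEM variant of CRC16. The sketch below recomputes the slot for foo in plain bash. crc16_xmodem and key_slot are hypothetical helper names, and the sketch makes two simplifying assumptions: keys are ASCII only, and hash tags (the {...} syntax) are ignored.

```bash
#!/usr/bin/env bash
# crc16_xmodem STRING
# CRC16 with polynomial 0x1021, initial value 0, no bit reflection.
# Assumption: STRING contains single-byte (ASCII) characters only.
crc16_xmodem() {
  local s=$1 crc=0 i j byte
  for ((i = 0; i < ${#s}; i++)); do
    printf -v byte '%d' "'${s:i:1}"        # numeric value of the character
    crc=$(( crc ^ (byte << 8) ))
    for ((j = 0; j < 8; j++)); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  echo "$crc"
}

# key_slot KEY: slot = CRC16(KEY) mod 16384 (hash tags are not handled here).
key_slot() {
  echo $(( $(crc16_xmodem "$1") % 16384 ))
}

key_slot foo   # prints 12182, matching the redirect shown above
```

On a running node the same value can be cross-checked with the CLUSTER KEYSLOT command.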

View the current cluster information

$ redis-cli -c -p 7001 -a 123456

127.0.0.1:7001> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:3154
cluster_stats_messages_pong_sent:3377
cluster_stats_messages_fail_sent:4
cluster_stats_messages_auth-ack_sent:1
cluster_stats_messages_sent:6536
cluster_stats_messages_ping_received:3372
cluster_stats_messages_pong_received:3154
cluster_stats_messages_meet_received:5
cluster_stats_messages_auth-req_received:1
cluster_stats_messages_received:6532
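
For scripted checks, two fields of the output above are usually enough: cluster_state and cluster_slots_assigned. A minimal sketch, assuming the cluster info text is piped in on standard input; cluster_is_healthy is a hypothetical helper, and the carriage returns that Redis places at line ends are stripped before comparing.

```bash
#!/usr/bin/env bash
# cluster_is_healthy
# Hypothetical helper: read "cluster info" output on stdin and succeed only if
# the state is ok and all 16384 slots are assigned.
cluster_is_healthy() {
  local info state assigned
  info=$(tr -d '\r')
  state=$(printf '%s\n' "$info" | awk -F: '$1 == "cluster_state" {print $2}')
  assigned=$(printf '%s\n' "$info" | awk -F: '$1 == "cluster_slots_assigned" {print $2}')
  [ "$state" = "ok" ] && [ "$assigned" = "16384" ]
}

# Usage:
#   redis-cli -p 7001 -a 123456 cluster info | cluster_is_healthy && echo healthy
```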

Check the state of a specific node

$ redis-cli --cluster check 192.168.1.109:7003 -a 123456

Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.1.109:7003 (cde86683...) -> 0 keys | 5462 slots | 1 slaves.
192.168.1.109:7002 (283abb49...) -> 1 keys | 5461 slots | 1 slaves.
192.168.1.109:7001 (225e37e5...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.1.109:7003)
M: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
slots: (0 slots) slave
replicates 283abb498445ffd6206f24c451ac0b9fb7129383
S: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
slots: (0 slots) slave
replicates cde86683e2d314fd52cf8708f78935c6648ea3c6
M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
slots: (0 slots) slave
replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
slots:[0-5460] (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

View information about all cluster nodes

$ redis-cli -c -p 7001 -a 123456

127.0.0.1:7001> cluster nodes
7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004@17004 master - 0 1616460018217 3 connected 5461-10922
225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001@17001 myself,master - 0 1616460015000 1 connected 0-5460
f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006@17006 slave 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 0 1616460018000 1 connected
1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005@17005 slave 283abb498445ffd6206f24c451ac0b9fb7129383 0 1616460016000 2 connected
283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002@17002 master - 0 1616460016000 2 connected 10923-16383
cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003@17003 slave 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 0 1616460017000 3 connected
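
In the output above, each line holds the node ID, the address, the flags and, for a slave, the ID of its master. The sketch below turns that into master/slave address pairs, which makes the relationships easier to read. replica_map is a hypothetical helper that expects the cluster nodes text on standard input.

```bash
#!/usr/bin/env bash
# replica_map
# Hypothetical helper: read "cluster nodes" output on stdin and print one
# "master_addr <- slave_addr" line per slave, sorted by master address.
replica_map() {
  awk '
    {
      addr[$1] = $2
      sub(/@.*/, "", addr[$1])        # drop the @cluster-bus-port suffix
    }
    $3 ~ /slave/ { master_of[$1] = $4 }
    END {
      for (id in master_of)
        print addr[master_of[id]] " <- " addr[id]
    }
  ' | sort
}

# Usage:
#   redis-cli -p 7001 -a 123456 cluster nodes | replica_map
```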

To verify failover: the cluster information above shows that 192.168.1.109:7003 is the Slave of 192.168.1.109:7004. Kill the process of the Master 192.168.1.109:7004 and watch whether 192.168.1.109:7003 is elected as the new Master; if it is, failover works. The log of 192.168.1.109:7003 then looks like this:

11970:S 21 Jul 2020 22:48:40.080 * Connecting to MASTER 192.168.1.109:7004
11970:S 21 Jul 2020 22:48:40.080 * MASTER <-> REPLICA sync started
11970:S 21 Jul 2020 22:48:40.081 # Error condition on socket for SYNC: Operation now in progress
11970:S 21 Jul 2020 22:48:40.982 # Starting a failover election for epoch 7.
11970:S 21 Jul 2020 22:48:40.985 # Failover election won: I'm the new master.
11970:S 21 Jul 2020 22:48:40.985 # configEpoch set to 7 after successful failover
11970:M 21 Jul 2020 22:48:40.985 * Discarding previously cached master state.
11970:M 21 Jul 2020 22:48:40.985 # Setting secondary replication ID to 00c7b21f3980b471d3373792d9d61bedf7e424e6, valid up to offset: 2059. New replication ID is c9f299ab0a8124a56d76e0e8a458135893b45336
11970:M 21 Jul 2020 22:48:40.985 # Cluster state changed: ok

Finally, restart the 192.168.1.109:7004 node; it rejoins as a Slave of 192.168.1.109:7003.

7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004@17004 slave cde86683e2d314fd52cf8708f78935c6648ea3c6 0 1616461490000 7 connected
225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001@17001 myself,master - 0 1616461492000 1 connected 0-5460
f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006@17006 slave 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 0 1616461492000 1 connected
1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005@17005 slave 283abb498445ffd6206f24c451ac0b9fb7129383 0 1616461492010 2 connected
283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002@17002 master - 0 1616461491000 2 connected 10923-16383
cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003@17003 master - 0 1616461493010 7 connected 5461-10922

Rebuilding (Re-initializing) the Cluster

If the cluster becomes unusable, rebuilding it with the steps below may resolve the problem. These steps delete all of Redis's RDB snapshot data, so back up the data before proceeding.

# Stop Redis on every node server
$ pkill -9 redis

# Run the following on every node server (back up the Redis snapshot data first)
$ find /usr/local/redis-cluster -type f -iname "dump*.rdb" | xargs rm -rf
$ find /usr/local/redis-cluster -type f -iname "nodes_*.conf" | xargs rm -rf
$ rm -rf /var/log/redis/*

# Start Redis on every node server
$ /usr/local/redis-cluster/start-cluster.sh

# Build the cluster again
$ redis-cli -a 123456 --cluster create \
192.168.1.109:7001 \
192.168.1.109:7002 \
192.168.1.109:7003 \
192.168.1.109:7004 \
192.168.1.109:7005 \
192.168.1.109:7006 \
--cluster-replicas 1

# Query the cluster information and state
$ redis-cli -c -p 7001 -a 123456
127.0.0.1:7001> cluster info
127.0.0.1:7001> cluster nodes
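
Since the steps above delete the RDB snapshots, a backup step belongs in front of them. One possible sketch is shown below; backup_rdb is a hypothetical helper and the destination path in the usage comment is only an example. It copies every dump*.rdb file under a source directory into a backup directory and keeps the per-node subdirectories.

```bash
#!/usr/bin/env bash
# backup_rdb SRC_DIR DEST_DIR
# Hypothetical helper: copy all dump*.rdb files below SRC_DIR into DEST_DIR,
# preserving their relative paths. DEST_DIR should lie outside SRC_DIR.
backup_rdb() {
  local src=$1 dest=$2 f rel
  find "$src" -type f -iname 'dump*.rdb' -print0 |
  while IFS= read -r -d '' f; do
    rel=${f#"$src"/}                       # path relative to SRC_DIR
    mkdir -p "$dest/$(dirname "$rel")"
    cp -p "$f" "$dest/$rel"
  done
}

# Usage, before running the delete commands above (example destination path):
#   backup_rdb /usr/local/redis-cluster /tmp/redis-rdb-backup
```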

Reference Blogs