Consul leader election issues

Problem: The cluster is in a broken state because consul can't seem to gather a quorum with its raft implementation.

In my case, there was a bogus raft peer. I had accidentally left a node advertising its IP as 127.0.0.1, but there was no process with that node-id at that address.

There are two possible paths out that I know of. You can put a peers.json file in the consul data directory, or you can manually bring up a consul process with the -bootstrap flag, which allows it to elect itself leader. The peers.json approach worked for me.

The peers.json format differs depending on the raft protocol version (older versions use a plain list of "ip:port" strings), but mine looked like this.

[
  {
    "id": "e4c3529a-c3ad-ae8b-7e8a-60c784d72eea",
    "address": "192.168.88.2:8300",
    "non_voter": false
  },
  {
    "id": "0ab95c84-c779-6439-289b-781e74f64503",
    "address": "192.168.88.3:8300",
    "non_voter": false
  },
  {
    "id": "5942fa52-081f-44c8-4ba7-ffc4f14f8807",
    "address": "192.168.88.4:8300",
    "non_voter": false
  }
]
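Writing this file by hand is easy to get wrong, so here is a minimal Python sketch that builds it from a list of (node-id, IP) pairs. The helper name build_peers_json and its port argument are my own invention for illustration, not anything consul ships. The field names are the ones from the file above, and 8300 is the port used in the addresses above. It also refuses loopback addresses, since that was my original mistake.

```python
import json
import uuid


def build_peers_json(servers, port=8300):
    """Return peers.json text for a list of (node_id, ip) pairs.

    Hypothetical helper, not part of consul. Assumes the object format
    shown above (id / address / non_voter) and that every server votes.
    """
    entries = []
    for node_id, ip in servers:
        uuid.UUID(node_id)  # raises ValueError if the node-id is malformed
        if ip.startswith("127."):
            # A loopback address is exactly the bogus-peer mistake described above.
            raise ValueError(f"refusing loopback address for {node_id}: {ip}")
        entries.append(
            {"id": node_id, "address": f"{ip}:{port}", "non_voter": False}
        )
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    print(build_peers_json([
        ("e4c3529a-c3ad-ae8b-7e8a-60c784d72eea", "192.168.88.2"),
        ("0ab95c84-c779-6439-289b-781e74f64503", "192.168.88.3"),
        ("5942fa52-081f-44c8-4ba7-ffc4f14f8807", "192.168.88.4"),
    ]))
```

Run as a script, this prints the same three-server structure as the file above.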

To have the system re-bootstrap, stop the consul process (sudo systemctl stop consul) on all server nodes. Put the peers.json file in $CONSUL_DATA_DIR/raft/ on each node (the data directory is set by data_dir in the consul config). Start the processes again. Consul reads the file on startup and deletes it once the recovery succeeds.
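The "put the file in place" step can be sketched in Python as below. This is only a sketch under my own assumptions: install_peers_json is a hypothetical helper name, it expects the data directory to be passed in explicitly, and it must only be run while consul is stopped on that node.

```python
import json
from pathlib import Path


def install_peers_json(data_dir, peers_text):
    """Write peers_text to <data_dir>/raft/peers.json and return the path.

    Hypothetical helper. Run it only while consul is stopped on this node.
    """
    json.loads(peers_text)  # fail early if the text is not valid JSON
    raft_dir = Path(data_dir) / "raft"
    if not raft_dir.is_dir():
        # A missing raft/ directory usually means data_dir is wrong.
        raise FileNotFoundError(f"{raft_dir} does not exist; is data_dir correct?")
    target = raft_dir / "peers.json"
    target.write_text(peers_text)
    return target
```

The file has to be readable by whatever user the consul process runs as, so check ownership after writing it as root.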

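Example logs from failing to elect a leader. The peer 6826fedb-99ea-196e-bbb8-bf57ad0989fe, which is not in my peers.json and only resolves to the fallback address 127.0.0.1:8300, looks like the bogus one.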

Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.326Z [ERROR] agent: failed to sync changes: error="No cluster leader"
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.523Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.524Z [INFO]  agent.server.raft: entering candidate state: node="Node at 192.168.88.3:8300 [Candidate]" term=31
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.532Z [INFO]  agent.server.raft: election won: term=31 tally=2
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.532Z [INFO]  agent.server.raft: entering leader state: leader="Node at 192.168.88.3:8300 [Leader]"
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.532Z [INFO]  agent.server.raft: added peer, starting replication: peer=e4c3529a-c3ad-ae8b-7e8a-60c784d72eea
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.532Z [INFO]  agent.server.raft: added peer, starting replication: peer=5942fa52-081f-44c8-4ba7-ffc4f14f8807
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.532Z [INFO]  agent.server.raft: added peer, starting replication: peer=6826fedb-99ea-196e-bbb8-bf57ad0989fe
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.533Z [INFO]  agent.server: cluster leadership acquired
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.533Z [INFO]  agent.server: New leader elected: payload=abrahms-server-1
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.533Z [WARN]  agent.server.raft: unable to get address for server, using fallback address: id=6826fedb-99ea-196e-bbb8-bf57ad0989fe fallback=127.0.0.1:8300 error="Could not find address for server >
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.533Z [INFO]  agent.server.raft: pipelining replication: peer="{Voter e4c3529a-c3ad-ae8b-7e8a-60c784d72eea 192.168.88.2:8300}"
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.536Z [INFO]  agent.server.raft: pipelining replication: peer="{Voter 5942fa52-081f-44c8-4ba7-ffc4f14f8807 192.168.88.4:8300}"
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.540Z [INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.88.3:8300 [Follower]" leader-address= leader-id=
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.540Z [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter e4c3529a-c3ad-ae8b-7e8a-60c784d72eea 192.168.88.2:8300}"
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.540Z [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter 5942fa52-081f-44c8-4ba7-ffc4f14f8807 192.168.88.4:8300}"
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.540Z [ERROR] agent.server: failed to wait for barrier: error="leadership lost while committing log"
Jul 20 05:29:38 abrahms-server-1 consul[1323]: 2023-07-20T05:29:38.540Z [INFO]  agent.server: cluster leadership lost
Jul 20 05:29:42 abrahms-server-1 consul[1323]: 2023-07-20T05:29:42.426Z [INFO]  agent.server.serf.wan: serf: attempting reconnect to abrahms-server-9rzl.dc1 127.0.0.1:8302
Jul 20 05:29:45 abrahms-server-1 consul[1323]: 2023-07-20T05:29:45.680Z [WARN]  agent: Syncing service failed.: service=consul error="No cluster leader"
Jul 20 05:29:45 abrahms-server-1 consul[1323]: 2023-07-20T05:29:45.680Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.236Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.236Z [INFO]  agent.server.raft: entering candidate state: node="Node at 192.168.88.3:8300 [Candidate]" term=32
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.244Z [INFO]  agent.server.raft: election won: term=32 tally=2
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.245Z [INFO]  agent.server.raft: entering leader state: leader="Node at 192.168.88.3:8300 [Leader]"
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.245Z [INFO]  agent.server.raft: added peer, starting replication: peer=e4c3529a-c3ad-ae8b-7e8a-60c784d72eea
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.245Z [INFO]  agent.server: cluster leadership acquired
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.245Z [INFO]  agent.server.raft: added peer, starting replication: peer=5942fa52-081f-44c8-4ba7-ffc4f14f8807
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.245Z [INFO]  agent.server.raft: added peer, starting replication: peer=6826fedb-99ea-196e-bbb8-bf57ad0989fe
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.245Z [WARN]  agent.server.raft: unable to get address for server, using fallback address: id=6826fedb-99ea-196e-bbb8-bf57ad0989fe fallback=127.0.0.1:8300 error="Could not find address for server >
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.245Z [INFO]  agent.server: New leader elected: payload=abrahms-server-1
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.246Z [INFO]  agent.server.raft: pipelining replication: peer="{Voter e4c3529a-c3ad-ae8b-7e8a-60c784d72eea 192.168.88.2:8300}"
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.248Z [INFO]  agent.server.raft: pipelining replication: peer="{Voter 5942fa52-081f-44c8-4ba7-ffc4f14f8807 192.168.88.4:8300}"
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.248Z [INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.88.3:8300 [Follower]" leader-address= leader-id=
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.248Z [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter e4c3529a-c3ad-ae8b-7e8a-60c784d72eea 192.168.88.2:8300}"
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.248Z [ERROR] agent.server: failed to wait for barrier: error="node is not the leader"
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.249Z [INFO]  agent.server: cluster leadership lost
Jul 20 05:29:47 abrahms-server-1 consul[1323]: 2023-07-20T05:29:47.248Z [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter 5942fa52-081f-44c8-4ba7-ffc4f14f8807 192.168.88.4:8300}"
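To pick the suspicious peer out of a long log like this, a small Python sketch can scan for the "using fallback address" warning and collect the peer IDs involved. The function name find_fallback_peers is made up for this post, and the regex is written against the log lines shown above, so it may need adjusting for other consul versions.

```python
import re
import sys

# Matches the WARN line raft emits when it cannot resolve a peer's address.
FALLBACK_RE = re.compile(
    r"unable to get address for server, using fallback address: "
    r"id=(?P<id>[0-9a-f-]+) fallback=(?P<addr>\S+)"
)


def find_fallback_peers(log_lines):
    """Map each peer ID that raft could not resolve to its fallback address."""
    peers = {}
    for line in log_lines:
        match = FALLBACK_RE.search(line)
        if match:
            peers[match.group("id")] = match.group("addr")
    return peers


if __name__ == "__main__":
    # Example: journalctl -u consul | python find_fallback_peers.py
    for peer_id, addr in find_fallback_peers(sys.stdin).items():
        print(peer_id, addr)
```

Any ID it reports that resolves to a loopback address and does not belong to a running server is a candidate for leaving out of peers.json.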