This article is reproduced from serverfault.com.

Kolla OpenStack deployment fails with "haproxy : Waiting for virtual IP to appear"

Published 2021-01-14 02:39:01

I'm trying to deploy OpenStack Queens with kolla-ansible (7.0.0) on Ubuntu hosts, following the official guide.

bootstrap-servers and prechecks both complete successfully, but the deploy command then fails:

RUNNING HANDLER [haproxy : Waiting for virtual IP to appear] **********************************************************  
fatal: [testcloudcontrol01]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 10.52.41.98:3306"}  
fatal: [testcloudcontrol02]: FAILED! => {"changed": false, "elapsed": 300, "msg": "Timeout when waiting for 10.52.41.98:3306"}

The check fails because the kolla_internal_vip_address (10.52.41.98) never comes up.
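The error message has the format of Ansible's wait_for module: the handler gives up after 300 seconds without a TCP connection to 10.52.41.98:3306. In a kolla-ansible deployment, port 3306 on the VIP is normally the haproxy frontend for MariaDB, so this check can only pass once keepalived holds the VIP and haproxy is listening on it. The sketch below approximates the check for re-running it by hand; wait_for_port is a hypothetical name, and it assumes bash (for the /dev/tcp redirection) and the coreutils timeout command are installed.

```shell
#!/bin/sh
# Hypothetical helper approximating the "Waiting for virtual IP to appear"
# handler: poll HOST:PORT until a TCP connection succeeds or time runs out.
# Usage: wait_for_port HOST PORT LIMIT_SECONDS
wait_for_port() {
  local host=$1 port=$2 limit=$3
  local deadline=$(( $(date +%s) + limit ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # /dev/tcp/HOST/PORT is a bash-only redirection that opens a TCP
    # connection; 'timeout 1' stops one attempt from hanging on an
    # address that never answers.
    if timeout 1 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, wait_for_port 10.52.41.98 3306 300 && echo reachable || echo "timed out" mirrors what the failing handler waits for.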

globals.yml

config_strategy: "COPY_ALWAYS"
kolla_base_distro: "ubuntu"
kolla_install_type: "binary"
openstack_release: "queens"
kolla_internal_vip_address: "10.52.41.98"
kolla_internal_fqdn: "testcloudapi.example.com"
kolla_external_vip_address: "{{ kolla_internal_vip_address }}"
kolla_external_fqdn: "{{ kolla_internal_fqdn }}"
network_interface: "ens160"
api_interface: "ens160"
storage_interface: "ens161"
keepalived_virtual_router_id: "148"

I'm currently tied to Queens because I want to replicate our production environment for testing.

Output of ip addr on one of the nodes where haproxy should be deployed:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a1:6a:2c brd ff:ff:ff:ff:ff:ff
    inet 10.52.41.100/24 brd 10.52.41.255 scope global ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fea1:6a2c/64 scope link
       valid_lft forever preferred_lft forever
3: ens161: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a1:7d:07 brd ff:ff:ff:ff:ff:ff
    inet 10.52.42.100/24 brd 10.52.42.255 scope global ens161
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fea1:7d07/64 scope link
       valid_lft forever preferred_lft forever
4: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a1:23:6e brd ff:ff:ff:ff:ff:ff
    inet 10.52.40.100/24 brd 10.52.40.255 scope global ens224
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fea1:236e/64 scope link
       valid_lft forever preferred_lft forever
5: ens256: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:a1:20:12 brd ff:ff:ff:ff:ff:ff
    inet 10.52.44.100/24 brd 10.52.44.255 scope global ens256
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fea1:2012/64 scope link
       valid_lft forever preferred_lft forever
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:b0:8a:93:e7 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
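keepalived is expected to add the VIP to ens160 as a /32 secondary address (the VRRP topology in the keepalived log below lists 10.52.41.98/32 dev ens160), and it is absent from the output above. A small filter for checking each controller in turn; vip_present is a hypothetical helper that reads ip addr output on stdin:

```shell
#!/bin/sh
# Hypothetical helper: succeed if IPv4 address $1 appears as an "inet"
# entry in 'ip addr' output read from stdin.
# Usage: ip addr show dev ens160 | vip_present 10.52.41.98
vip_present() {
  # Escape the dots so grep matches them literally instead of "any character".
  local pattern
  pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
  # The trailing "/" anchors the end of the address, so 10.52.41.10
  # does not match 10.52.41.100/24.
  grep -Eq "^[[:space:]]*inet ${pattern}/"
}
```

Run ip addr show dev ens160 | vip_present 10.52.41.98 && echo "VIP is on this node" on every controller; once keepalived reaches MASTER, exactly one node should report it.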

The nodes are VMware virtual machines with VMXNet3 nics.

Output of docker logs keepalived:

+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/keepalived/keepalived.conf
INFO:__main__:Copying /var/lib/kolla/config_files/keepalived.conf to /etc/keepalived/keepalived.conf
INFO:__main__:Setting permission for /etc/keepalived/keepalived.conf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD='/usr/sbin/keepalived -nld -p /run/keepalived.pid'
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ modprobe ip_vs
++ '[' -f /run/keepalived.pid ']'
+ echo 'Running command: '\''/usr/sbin/keepalived -nld -p /run/keepalived.pid'\'''
Running command: '/usr/sbin/keepalived -nld -p /run/keepalived.pid'
+ exec /usr/sbin/keepalived -nld -p /run/keepalived.pid
Thu Dec 13 12:10:26 2018: Starting Keepalived v1.3.9 (10/21,2017)
Thu Dec 13 12:10:26 2018: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 13 12:10:26 2018: Starting Healthcheck child process, pid=11
Thu Dec 13 12:10:26 2018: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 13 12:10:26 2018: Starting VRRP child process, pid=12
Thu Dec 13 12:10:26 2018: ------< Global definitions >------
Thu Dec 13 12:10:26 2018:  Router ID = testcloudcontrol01.example.com
Thu Dec 13 12:10:26 2018:  Default interface = eth0
Thu Dec 13 12:10:26 2018:  LVS flush = false
Thu Dec 13 12:10:26 2018:  VRRP IPv4 mcast group = 224.0.0.18
Thu Dec 13 12:10:26 2018:  VRRP IPv6 mcast group = ff02::12
Thu Dec 13 12:10:26 2018:  Gratuitous ARP delay = 5
Thu Dec 13 12:10:26 2018:  Gratuitous ARP repeat = 5
Thu Dec 13 12:10:26 2018:  Gratuitous ARP refresh timer = 0
Thu Dec 13 12:10:26 2018:  Gratuitous ARP refresh repeat = 1
Thu Dec 13 12:10:26 2018:  Gratuitous ARP lower priority delay = 4294
Thu Dec 13 12:10:26 2018:  Gratuitous ARP lower priority repeat = -1
Thu Dec 13 12:10:26 2018:  Send advert after receive lower priority advert = true
Thu Dec 13 12:10:26 2018:  Send advert after receive higher priority advert = false
Thu Dec 13 12:10:26 2018:  Gratuitous ARP interval = 0
Thu Dec 13 12:10:26 2018:  Gratuitous NA interval = 0
Thu Dec 13 12:10:26 2018:  VRRP default protocol version = 2
Thu Dec 13 12:10:26 2018:  Iptables input chain = INPUT
Thu Dec 13 12:10:26 2018:  Using ipsets = true
Thu Dec 13 12:10:26 2018:  ipset IPv4 address set = keepalived
Thu Dec 13 12:10:26 2018:  ipset IPv6 address set = keepalived6
Thu Dec 13 12:10:26 2018:  ipset IPv6 address,iface set = keepalived_if6
Thu Dec 13 12:10:26 2018:  VRRP check unicast_src = false
Thu Dec 13 12:10:26 2018:  VRRP skip check advert addresses = false
Thu Dec 13 12:10:26 2018:  VRRP strict mode = false
Thu Dec 13 12:10:26 2018:  VRRP process priority = 0
Thu Dec 13 12:10:26 2018:  VRRP don't swap = false
Thu Dec 13 12:10:26 2018:  Checker process priority = 0
Thu Dec 13 12:10:26 2018:  Checker don't swap = false
Thu Dec 13 12:10:26 2018:  SNMP keepalived disabled
Thu Dec 13 12:10:26 2018:  SNMP checker disabled
Thu Dec 13 12:10:26 2018:  SNMP RFCv2 disabled
Thu Dec 13 12:10:26 2018:  SNMP RFCv3 disabled
Thu Dec 13 12:10:26 2018:  SNMP traps disabled
Thu Dec 13 12:10:26 2018:  SNMP socket = default (unix:/var/agentx/master)
Thu Dec 13 12:10:26 2018:  Network namespace = (default)
Thu Dec 13 12:10:26 2018:  DBus disabled
Thu Dec 13 12:10:26 2018:  DBus service name = (null)
Thu Dec 13 12:10:26 2018:  Script security disabled
Thu Dec 13 12:10:26 2018:  Default script uid:gid 0:0
Thu Dec 13 12:10:26 2018: Registering Kernel netlink reflector
Thu Dec 13 12:10:26 2018: Registering Kernel netlink command channel
Thu Dec 13 12:10:26 2018: Registering gratuitous ARP shared channel
Thu Dec 13 12:10:26 2018: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 13 12:10:26 2018: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Thu Dec 13 12:10:26 2018: Truncating auth_pass to 8 characters
Thu Dec 13 12:10:26 2018: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Thu Dec 13 12:10:26 2018: ------< Global definitions >------
Thu Dec 13 12:10:26 2018:  Router ID = testcloudcontrol01.example.com
Thu Dec 13 12:10:26 2018:  Default interface = eth0
Thu Dec 13 12:10:26 2018:  LVS flush = false
Thu Dec 13 12:10:26 2018:  VRRP IPv4 mcast group = 224.0.0.18
Thu Dec 13 12:10:26 2018:  VRRP IPv6 mcast group = ff02::12
Thu Dec 13 12:10:26 2018:  Gratuitous ARP delay = 5
Thu Dec 13 12:10:26 2018:  Gratuitous ARP repeat = 5
Thu Dec 13 12:10:26 2018:  Gratuitous ARP refresh timer = 0
Thu Dec 13 12:10:26 2018:  Gratuitous ARP refresh repeat = 1
Thu Dec 13 12:10:26 2018:  Gratuitous ARP lower priority delay = 5
Thu Dec 13 12:10:26 2018:  Gratuitous ARP lower priority repeat = 5
Thu Dec 13 12:10:26 2018:  Send advert after receive lower priority advert = true
Thu Dec 13 12:10:26 2018:  Send advert after receive higher priority advert = false
Thu Dec 13 12:10:26 2018:  Gratuitous ARP interval = 0
Thu Dec 13 12:10:26 2018:  Gratuitous NA interval = 0
Thu Dec 13 12:10:26 2018:  VRRP default protocol version = 2
Thu Dec 13 12:10:26 2018:  Iptables input chain = INPUT
Thu Dec 13 12:10:26 2018:  Using ipsets = false
Thu Dec 13 12:10:26 2018:  ipset IPv4 address set = keepalived
Thu Dec 13 12:10:26 2018:  ipset IPv6 address set = keepalived6
Thu Dec 13 12:10:26 2018:  ipset IPv6 address,iface set = keepalived_if6
Thu Dec 13 12:10:26 2018:  VRRP check unicast_src = false
Thu Dec 13 12:10:26 2018:  VRRP skip check advert addresses = false
Thu Dec 13 12:10:26 2018:  VRRP strict mode = false
Thu Dec 13 12:10:26 2018:  VRRP process priority = 0
Thu Dec 13 12:10:26 2018:  VRRP don't swap = false
Thu Dec 13 12:10:26 2018:  Checker process priority = 0
Thu Dec 13 12:10:26 2018:  Checker don't swap = false
Thu Dec 13 12:10:26 2018:  SNMP keepalived disabled
Thu Dec 13 12:10:26 2018:  SNMP checker disabled
Thu Dec 13 12:10:26 2018:  SNMP RFCv2 disabled
Thu Dec 13 12:10:26 2018:  SNMP RFCv3 disabled
Thu Dec 13 12:10:26 2018:  SNMP traps disabled
Thu Dec 13 12:10:26 2018:  SNMP socket = default (unix:/var/agentx/master)
Thu Dec 13 12:10:26 2018:  Network namespace = (default)
Thu Dec 13 12:10:26 2018:  DBus disabled
Thu Dec 13 12:10:26 2018:  DBus service name = (null)
Thu Dec 13 12:10:26 2018:  Script security disabled
Thu Dec 13 12:10:26 2018:  Default script uid:gid 0:0
Thu Dec 13 12:10:26 2018: ------< VRRP Topology >------
Thu Dec 13 12:10:26 2018:  VRRP Instance = kolla_internal_vip_148
Thu Dec 13 12:10:26 2018:    Using VRRPv2
Thu Dec 13 12:10:26 2018:    Want State = BACKUP
Thu Dec 13 12:10:26 2018:    Running on device = ens160
Thu Dec 13 12:10:26 2018:    Skip checking advert IP addresses = no
Thu Dec 13 12:10:26 2018:    Enforcing strict VRRP compliance = no
Thu Dec 13 12:10:26 2018:    Using src_ip = 10.52.41.100
Thu Dec 13 12:10:26 2018:    Gratuitous ARP delay = 5
Thu Dec 13 12:10:26 2018:    Gratuitous ARP repeat = 5
Thu Dec 13 12:10:26 2018:    Gratuitous ARP refresh timer = 0
Thu Dec 13 12:10:26 2018:    Gratuitous ARP refresh repeat = 1
Thu Dec 13 12:10:26 2018:    Gratuitous ARP lower priority delay = 5
Thu Dec 13 12:10:26 2018:    Gratuitous ARP lower priority repeat = 5
Thu Dec 13 12:10:26 2018:    Send advert after receive lower priority advert = true
Thu Dec 13 12:10:26 2018:    Send advert after receive higher priority advert = false
Thu Dec 13 12:10:26 2018:    Virtual Router ID = 148
Thu Dec 13 12:10:26 2018:    Priority = 1
Thu Dec 13 12:10:26 2018:    Advert interval = 1 sec
Thu Dec 13 12:10:26 2018:    Accept enabled
Thu Dec 13 12:10:26 2018:    Preempt disabled
Thu Dec 13 12:10:26 2018:    Promote_secondaries disabled
Thu Dec 13 12:10:26 2018:    Authentication type = SIMPLE_PASSWORD
Thu Dec 13 12:10:26 2018:    Password = 0RXbQYFF
Thu Dec 13 12:10:26 2018:    Tracked scripts = 1
Thu Dec 13 12:10:26 2018:      check_alive weight 0
Thu Dec 13 12:10:26 2018:    Virtual IP = 1
Thu Dec 13 12:10:26 2018:      10.52.41.98/32 dev ens160 scope global
Thu Dec 13 12:10:26 2018: ------< VRRP Scripts >------
Thu Dec 13 12:10:26 2018:  VRRP Script = check_alive
Thu Dec 13 12:10:26 2018:    Command = /check_alive.sh
Thu Dec 13 12:10:26 2018:    Interval = 2 sec
Thu Dec 13 12:10:26 2018:    Timeout = 0 sec
Thu Dec 13 12:10:26 2018:    Weight = 0
Thu Dec 13 12:10:26 2018:    Rise = 10
Thu Dec 13 12:10:26 2018:    Fall = 2
Thu Dec 13 12:10:26 2018:    Insecure = no
Thu Dec 13 12:10:26 2018:    Status = INIT
Thu Dec 13 12:10:26 2018:    Script uid:gid = 0:0
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018:  Name = lo
Thu Dec 13 12:10:26 2018:  index = 1
Thu Dec 13 12:10:26 2018:  IPv4 address = 127.0.0.1
Thu Dec 13 12:10:26 2018:  IPv6 address = ::
Thu Dec 13 12:10:26 2018:  is UP
Thu Dec 13 12:10:26 2018:  is RUNNING
Thu Dec 13 12:10:26 2018:  MTU = 65536
Thu Dec 13 12:10:26 2018:  HW Type = LOOPBACK
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018:  Name = ens160
Thu Dec 13 12:10:26 2018:  index = 2
Thu Dec 13 12:10:26 2018:  IPv4 address = 10.52.41.100
Thu Dec 13 12:10:26 2018:  IPv6 address = fe80::250:56ff:fea1:6a2c
Thu Dec 13 12:10:26 2018:  MAC = 00:50:56:a1:6a:2c
Thu Dec 13 12:10:26 2018:  is UP
Thu Dec 13 12:10:26 2018:  is RUNNING
Thu Dec 13 12:10:26 2018:  MTU = 1500
Thu Dec 13 12:10:26 2018:  HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018:  Name = ens161
Thu Dec 13 12:10:26 2018:  index = 3
Thu Dec 13 12:10:26 2018:  IPv4 address = 10.52.42.100
Thu Dec 13 12:10:26 2018:  IPv6 address = fe80::250:56ff:fea1:7d07
Thu Dec 13 12:10:26 2018:  MAC = 00:50:56:a1:7d:07
Thu Dec 13 12:10:26 2018:  is UP
Thu Dec 13 12:10:26 2018:  is RUNNING
Thu Dec 13 12:10:26 2018:  MTU = 1500
Thu Dec 13 12:10:26 2018:  HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018:  Name = ens224
Thu Dec 13 12:10:26 2018:  index = 4
Thu Dec 13 12:10:26 2018:  IPv4 address = 10.52.40.100
Thu Dec 13 12:10:26 2018:  IPv6 address = fe80::250:56ff:fea1:236e
Thu Dec 13 12:10:26 2018:  MAC = 00:50:56:a1:23:6e
Thu Dec 13 12:10:26 2018:  is UP
Thu Dec 13 12:10:26 2018:  is RUNNING
Thu Dec 13 12:10:26 2018:  MTU = 1500
Thu Dec 13 12:10:26 2018:  HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018:  Name = ens256
Thu Dec 13 12:10:26 2018:  index = 5
Thu Dec 13 12:10:26 2018:  IPv4 address = 10.52.44.100
Thu Dec 13 12:10:26 2018:  IPv6 address = fe80::250:56ff:fea1:2012
Thu Dec 13 12:10:26 2018:  MAC = 00:50:56:a1:20:12
Thu Dec 13 12:10:26 2018:  is UP
Thu Dec 13 12:10:26 2018:  is RUNNING
Thu Dec 13 12:10:26 2018:  MTU = 1500
Thu Dec 13 12:10:26 2018:  HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: ------< NIC >------
Thu Dec 13 12:10:26 2018:  Name = docker0
Thu Dec 13 12:10:26 2018:  index = 6
Thu Dec 13 12:10:26 2018:  IPv4 address = 172.17.0.1
Thu Dec 13 12:10:26 2018:  IPv6 address = ::
Thu Dec 13 12:10:26 2018:  MAC = 02:42:b0:8a:93:e7
Thu Dec 13 12:10:26 2018:  is UP
Thu Dec 13 12:10:26 2018:  MTU = 1500
Thu Dec 13 12:10:26 2018:  HW Type = ETHERNET
Thu Dec 13 12:10:26 2018: Using LinkWatch kernel netlink reflector...
Thu Dec 13 12:10:26 2018: VRRP_Instance(kolla_internal_vip_148) Entering BACKUP STATE
Thu Dec 13 12:10:26 2018: /check_alive.sh exited with status 1
Thu Dec 13 12:10:28 2018: /check_alive.sh exited with status 1
Thu Dec 13 12:10:30 2018: VRRP_Instance(kolla_internal_vip_148) Now in FAULT state
Thu Dec 13 12:10:30 2018: /check_alive.sh exited with status 1
Thu Dec 13 12:10:32 2018: /check_alive.sh exited with status 1
[message repeats until I stop the container]

That's it: both keepalived instances stay in the FAULT state, and the virtual IP address is not configured on any of the VMs.
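The FAULT transition follows from the tracked script settings printed in the log: check_alive.sh runs every 2 seconds (Interval = 2 sec) with Fall = 2, Rise = 10 and Weight = 0. With a weight of 0, keepalived does not lower the instance's priority when the script fails; it puts the VRRP instance into FAULT instead. The sketch below is a simplified model of that rise/fall hysteresis, assuming only consecutive results count; keepalived's own bookkeeping (for example how the INIT state is handled) differs in detail, and script_state is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical, simplified model of the rise/fall hysteresis keepalived
# applies to a tracked script such as check_alive.sh. Prints INIT, OK or KO.
# Usage: script_state RISE FALL EXIT_CODE...
script_state() {
  local rise=$1 fall=$2 state=INIT ok=0 ko=0 code
  shift 2
  for code in "$@"; do
    if [ "$code" -eq 0 ]; then
      ok=$((ok + 1)); ko=0
      # RISE consecutive successes are required before the script counts as OK.
      if [ "$ok" -ge "$rise" ]; then state=OK; fi
    else
      ko=$((ko + 1)); ok=0
      # FALL consecutive failures mark it KO; with weight 0 that is what
      # moves the VRRP instance into FAULT.
      if [ "$ko" -ge "$fall" ]; then state=KO; fi
    fi
  done
  echo "$state"
}
```

With Fall = 2, two failed runs are enough to reach KO, which is consistent with the log entering FAULT a few seconds after startup. Leaving FAULT would take Rise = 10 consecutive successes, roughly 20 seconds of a healthy haproxy at a 2-second interval.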

I went through the checklist in this question and its answer, even though the error messages mentioned there do not appear in my log files:

  • keepalived_virtual_router_id has been changed and is unique
  • I ran kolla-genpwd again. I confirmed that keepalived_password is set in /etc/kolla/passwords.yml
  • kolla_internal_vip_address is reachable from network_interface: the primary IP on that interface is in the same subnet, and manually adding the VIP as an additional address works.
  • kolla-ansible prechecks passes
  • SELinux is not active on Ubuntu
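The reachability point can also be double-checked arithmetically: with 10.52.41.100/24 on ens160, the VIP 10.52.41.98 must share the first 24 bits. A minimal POSIX shell sketch; ip_to_int and same_subnet are hypothetical helpers, and octets with leading zeros would be read as octal, so pass plain dotted-quad addresses:

```shell
#!/bin/sh
# Hypothetical helpers: convert a dotted-quad IPv4 address to an integer and
# test whether two addresses share the same network for a prefix length.
# Usage: same_subnet ADDR_A ADDR_B PREFIX_LEN

# The body runs in a subshell ( ... ) so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

same_subnet() {
  local a b mask
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  # Netmask from the prefix length, e.g. 24 -> 0xFFFFFF00
  # (relies on 64-bit shell arithmetic, as in bash and dash).
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $((a & mask)) -eq $((b & mask)) ]
}
```

For this setup, same_subnet 10.52.41.100 10.52.41.98 24 && echo "same subnet" should succeed, which matches the observation that adding the VIP by hand works.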

On the hypervisor side, I tried enabling promiscuous mode on the port group for that interface, but it made no difference.

Asked by Gerald Schneider
Answer by Gerald Schneider, 2019-02-15 15:28:02:

After running into the same problem on bare metal, I dug deeper. It turns out the culprit wasn't keepalived but the haproxy container.

The haproxy container keeps restarting because haproxy is started with the command-line parameter -W, which the haproxy version shipped in the container (1.6.3) does not support.
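The -W option enables master-worker mode, which was introduced in HAProxy 1.8, so a 1.6.x build rejects it and prints its usage text instead of starting. A sketch that finds haproxy's version banner and reports whether -W should be understood; supports_master_worker is a hypothetical name, and the pattern accepts the banner with or without the hyphen in "HA-Proxy" because the spelling has varied between releases.

```shell
#!/bin/sh
# Hypothetical helper: read haproxy output on stdin, find the version banner
# and succeed if that version understands -W (master-worker mode, HAProxy 1.8+).
# Returns 1 for older versions and 2 if no banner is found.
# Usage: docker logs haproxy 2>&1 | supports_master_worker
supports_master_worker() {
  local ver
  # Keep "MAJOR MINOR" from lines like "HA-Proxy version 1.6.3 2015/12/25".
  ver=$(sed -n 's/^HA-\{0,1\}Proxy version \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
  [ -n "$ver" ] || return 2
  # Word splitting keeps only the first match as $1 (major) and $2 (minor).
  set -- $ver
  [ "$1" -gt 1 ] || { [ "$1" -eq 1 ] && [ "$2" -ge 8 ]; }
}
```

Because the container keeps restarting, its logs already contain the banner, so docker logs haproxy 2>&1 | supports_master_worker || echo "no -W support" is enough. The container name haproxy is an assumption based on kolla's naming (the keepalived container in the question is named keepalived).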

Running command: '/usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid'
+ exec /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau <willy@haproxy.org>

Usage : haproxy [-f <cfgfile>]* [ -vdVD ] [ -n <maxconn> ] [ -N <maxpconn> ]
        [ -p <pidfile> ] [ -m <max megs> ] [ -C <dir> ] [-- <cfgfile>*]
        -v displays version ; -vv shows known build options.
        -d enters debug mode ; -db only disables background mode.
        -dM[<byte>] poisons memory with <byte> (defaults to 0x50)
        -V enters verbose mode (disables quiet mode)
        -D goes daemon ; -C changes to <dir> before loading files.
        -q quiet mode : don't display messages
        -c check mode : only check config files and exit
        -n sets the maximum total # of connections (2000)
        -m limits the usable amount of memory (in MB)
        -N sets the default, per-proxy maximum # of connections (2000)
        -L set local peer name (default to hostname)
        -p writes pids of all children to this file
        -de disables epoll() usage even when available
        -dp disables poll() usage even when available
        -dS disables splice usage (broken on old kernels)
        -dV disables SSL verify on servers side
        -sf/-st [pid ]* finishes/terminates old pids.

Hence, the haproxy container keeps restarting. The keepalived container, on the other hand, runs a check script that keeps exiting with an error:

Fri Feb 15 08:17:14 2019: /check_alive.sh exited with status 1
Keepalived_vrrp[12]: /check_alive.sh exited with status 1

This check script is very simple; it queries haproxy's status through a UNIX socket:

#!/bin/bash

# This will return 0 when it successfully talks to the haproxy daemon via the socket
# Failures return 1

echo "show info" | socat unix-connect:/var/lib/kolla/haproxy/haproxy.sock stdio > /dev/null

So as long as haproxy is called with the invalid parameter and fails to start, keepalived stays in the FAULT state and the floating IP is never brought up.

Running grep -R "haproxy -W" * showed that the haproxy command line is defined in /usr/local/share/kolla-ansible/ansible/roles/haproxy/templates/haproxy.json.j2. After I removed the -W parameter there, haproxy started properly, and keepalived switched to the MASTER state and configured the floating IP.
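The same edit can be scripted. This sketch assumes the template contains the command line exactly as it appears in the container log (/usr/sbin/haproxy -W -db ...); strip_master_worker_flag is a hypothetical helper, and sed keeps the original file as haproxy.json.j2.bak.

```shell
#!/bin/sh
# Hypothetical helper: remove -W from the haproxy command line in a
# kolla-ansible template, keeping the original file as <template>.bak.
# Usage: strip_master_worker_flag /usr/local/share/kolla-ansible/ansible/roles/haproxy/templates/haproxy.json.j2
strip_master_worker_flag() {
  local template=$1
  # Only the exact "/usr/sbin/haproxy -W " sequence seen in the container log is changed.
  sed -i.bak 's|/usr/sbin/haproxy -W |/usr/sbin/haproxy |' "$template" || return 1
  # Succeed only if no "haproxy -W" is left in the edited file.
  ! grep -q 'haproxy -W' "$template"
}
```

After patching, re-run the kolla-ansible deploy so the regenerated configuration reaches the haproxy container.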

A bug report for this issue is already open on Launchpad, and its comments suggest a slightly different fix (changing the same file).

This modification will, of course, be reverted whenever the file is updated. If you have the same problem, please log into Launchpad and mark the bug (reported on 2018-06-08) as affecting you, so it gets prioritized and fixed.