Last week I upgraded a small infrastructure made up of two vSphere nodes and a vCenter from vSphere 5.5 to vSphere 6, moving the vCenter to the appliance format.
The customer had not provided a DNS, so we tried to work around the problem by using the hosts file on the vCenter and on the two ESXi hosts (it was almost a test, because in the latest versions of vSphere a DNS is required!).
I upgraded everything, added the hosts to the new vCenter, ran some tests, and everything seemed okay. The strange thing is that after about 24 to 48 hours of normal operation this error started to appear in the Summary tab of the hosts:
Configuration issues
Agent can’t send heartbeats: Host is down
In fact everything seemed to work correctly: the hosts were reachable and manageable. Still, having two hosts flagged with a red error is obviously not ideal (I did not verify whether HA was working correctly).
I ran some ping tests from the hosts and from the vCenter to check that each object correctly resolved the names of the others via the hosts file, and everything was correct. I tried a “Reconfigure for HA” on the nodes, but the error remained. I also tried restarting the services on the nodes (services.sh restart, after temporarily disabling HA), but after a few minutes the error reappeared.
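The name-resolution check can also be scripted instead of pinging each name by hand. This is only a minimal sketch: the function name check_resolution is my own invention, and the host names and addresses in the usage note below are placeholders, not the customer's. It asks the local resolver for each name (which, like ping, will also consult the hosts file) and reports any name that does not come back with the expected address.

```python
import socket


def check_resolution(expected, resolver=socket.gethostbyname):
    """Compare resolved addresses with the expected ones.

    expected: dict mapping host name -> expected IPv4 address.
    resolver: callable taking a name and returning an IPv4 string;
              defaults to the system resolver.
    Returns a dict of name -> (expected_ip, resolved_ip_or_None)
    containing only the names that did not resolve as expected.
    """
    problems = {}
    for name, ip in expected.items():
        try:
            resolved = resolver(name)
        except OSError:
            # socket.gaierror is a subclass of OSError: the name is unknown
            resolved = None
        if resolved != ip:
            problems[name] = (ip, resolved)
    return problems
```

For example, check_resolution({"esxi01.example.local": "192.168.1.11", "vcsa.example.local": "192.168.1.10"}) returns an empty dict when every name resolves to the expected address. In my case a check like this would have passed too, so it confirms the hosts file is consistent but does not prove that HA is happy without a real DNS.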
In the end, since there were no misconfigurations and the most likely cause was that we had used the hosts file in the absence of a DNS, I asked the customer to set up a DNS that would resolve at least the names of the two hosts and the vCenter. I then removed the two hosts from the vCenter (Disconnect and then Remove) and re-added them. The error is no longer present.
Same issue here with a fresh setup of a 3-host vSphere 6 landscape.
Did you solve it by using a DNS?
Remove the host from the cluster and re-add it.
If the service agent of an ESXi host is down, what will its status be in the vSphere Client?
In this particular case, despite the error, the service was up and running (this is why it was such strange behaviour). If the service agent goes down you are not able to manage the ESXi host, and you have to restart the agent in some way (for example from the console).
I had the same issue and was able to resolve it by: 1. turning off HA; 2. disconnecting the host; 3. reconnecting the host; 4. turning HA back on for the cluster.
Same problem here – in our case it turned out to be caused by old data in the hosts file on each of the hosts. When we were on a Windows vCenter we had multiple IPs (one for inbound RDP sessions / management traffic and one on the same VLAN as the hosts’ service consoles). When you migrate to a VCSA you can only use a single IP, which is the one in DNS.
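A rough way to spot this kind of stale entry is to compare each line of a host's hosts file with what DNS returns for the same name. The sketch below is only an illustration: parse_hosts and stale_entries are made-up names, and it assumes you paste in the contents of the hosts file as text. Note that the default resolver, socket.gethostbyname, normally consults the local hosts file as well, so run it on a machine that does not carry the same entries, or pass in a resolver that queries your DNS server directly.

```python
import socket


def parse_hosts(text):
    """Return (name, ip) pairs from hosts-file text, ignoring comments."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or an address with no names
        ip, names = fields[0], fields[1:]
        for name in names:
            entries.append((name, ip))
    return entries


def stale_entries(text, resolver=socket.gethostbyname):
    """Return (name, hosts_ip, dns_ip) for entries that disagree with DNS."""
    stale = []
    for name, ip in parse_hosts(text):
        if ip.startswith("127.") or ip == "::1":
            continue  # loopback entries are not expected to be in DNS
        try:
            dns_ip = resolver(name)
        except OSError:
            continue  # name not known to the resolver: nothing to compare
        if dns_ip != ip:
            stale.append((name, ip, dns_ip))
    return stale
```

Any tuple it returns is a name whose hosts-file address differs from the DNS one, which is the situation described above after moving from a multi-IP Windows vCenter to a single-IP VCSA.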