Last week I upgraded a small infrastructure consisting of two vSphere nodes and a vCenter from vSphere 5.5 to vSphere 6, moving the vCenter to the appliance format.
The customer had not provided a DNS server, so we tried to work around the problem by using the hosts file on the vCenter and on the two ESXi hosts (it was more or less an experiment, because in the latest versions of vSphere a DNS is required!).
I upgraded everything, added the hosts to the new vCenter, ran some tests, and everything seemed okay. The strange thing is that after about 24 to 48 hours of normal operation this error started to appear in the Summary tab of the hosts:
Agent can’t send heartbeats: Host is down
In practice everything seemed to work correctly: the hosts were reachable and manageable. Still, having two hosts flagged with a red error is obviously not ideal (I did not verify whether HA worked correctly).
I ran some ping tests from the hosts and from the vCenter to check that each machine resolved the names of the others correctly via the hosts file, and everything was correct. I tried a “reconfigure for HA” on the nodes, but the error remained. I also tried restarting the services on the nodes (services.sh restart, after temporarily disabling HA), but after a few minutes the error reappeared.
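The name check I did by hand with ping can also be scripted. The following is only a minimal sketch in Python, meant to be run from a machine that has Python available; the function name check_names and the host names in the comment are made up for illustration. Note that socket.gethostbyname goes through the system resolver, so on a typical system it will also pick up entries from the hosts file and cannot tell you whether an answer came from DNS.

```python
import socket


def check_names(names, resolve=socket.gethostbyname):
    """Return a dict mapping each name to its resolved IPv4 address.

    A name that cannot be resolved maps to None. The resolve parameter
    exists only so the lookup can be swapped out (for example in tests);
    by default the system resolver is used, which usually consults the
    hosts file as well as DNS.
    """
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror and socket.herror are both subclasses of OSError
            results[name] = None
    return results


# Hypothetical usage (these host names are placeholders, not the real ones):
# for name, ip in check_names(["vcenter.example.local",
#                              "esxi01.example.local",
#                              "esxi02.example.local"]).items():
#     print(name, "->", ip or "NOT RESOLVED")
```

Any name that comes back as None is one the machine running the script cannot resolve at all.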
In the end, given that there were no misconfigurations and knowing that the problem could simply be that we had used the hosts file in the absence of a DNS, I asked the customer to set up a DNS that would resolve at least the names of the two hosts and the vCenter. I then removed the two hosts from the vCenter (Disconnect and then Remove) and re-added them, and the error is no longer present.
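Once a DNS is in place, it can be worth checking that each name resolves both forward and in reverse. This is an assumption on my side: I did not verify whether the reverse records played any role in this specific error. The sketch below is in Python, the function name forward_reverse_ok is made up, and the host name in the comment is a placeholder.

```python
import socket


def forward_reverse_ok(name,
                       forward=socket.gethostbyname,
                       reverse=socket.gethostbyaddr):
    """Return True if name resolves to an IP whose reverse lookup gives name back.

    forward and reverse default to the system resolver and are parameters
    only so they can be replaced in tests. The comparison ignores case and
    a trailing dot.
    """
    try:
        ip = forward(name)
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        ptr_name = reverse(ip)[0]
    except OSError:
        # covers socket.gaierror (forward failure) and socket.herror (reverse failure)
        return False
    return ptr_name.rstrip(".").lower() == name.rstrip(".").lower()


# Hypothetical usage (placeholder name):
# print(forward_reverse_ok("esxi01.example.local"))
```

A False result only says that the two lookups do not agree from the machine where the script runs; it does not say which record is wrong.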