During the last few weeks, I have had a problem in our VMware ESX cluster (Rel.3.5, build 207095): 1 of the 9 servers appears as disconnected in the vSphere client of our vCenter Server (both: Rel.4.0.0, build 258672), while it seems to operate normally (the ESX Server console is displayed on the system's screen, it accepts login connections and any Linux command via SSH, and it seems to have access to all the VMFS volumes on the shared storage engines). Whenever I reboot that server, it appears connected to the cluster ("Connected" state and "Normal" status) for a few seconds (less than 1 minute) and then goes into the "Not responding" state. The same thing happens whenever I re-connect that server to the cluster (by right-clicking on its entity in the vSphere client and selecting "Reconnect"). Restarting the VMware management services via the command line on the server ("service mgmt-vmware restart" and "service vmware-vpxa restart") did not resolve the problem. Advice is always welcome - thanks in advance.

Actually, while the host is running fine, there are no running VMs on it, so I have no way of knowing whether I would really be able to connect to a virtual guest running on this host. However, I have noticed that I can connect the vSphere client directly to the disconnected host and, then again, the host seems to be OK. So, later today (when computing loads get lower), I am going to un-register a less mission-critical (test-and-development) system from a "healthy" host, re-register it on the disconnected one and see what happens. Also, I have dug into the logs of the Virtual Center server and I have noticed several messages stating "Marked (hostname) as dirty" (which may relate to my problem, since no such messages are generated for any other host of my cluster). I am attaching a small sample of these logs to this message, so that other people reading this discussion have a chance to look at it ("" is the hostname of the system in question).

I have carried out the test that I described earlier in this discussion (moving one or more virtual guests from the "healthy" ESX hosts to the disconnected one via un-register/re-register) and I now know beyond any doubt that the disconnected host is running absolutely fine on its own (it just doesn't get managed by the Virtual Center system).

In the meantime, I dug "ultra-deep" into the various log and configuration files of my hosts and I believe that I have discovered why that particular host gets disconnected (I was pointed in this direction by this VMware KB article: jsessionid=3EF75AC47C91118D42A44F26F15216F4?micr.). On each ESX host there is a file "/etc/opt/vmware/vpxa/nf" that regulates the connection of that host to the vCenter system. On the disconnected host, the field of that file that should contain the actual IP address of the Virtual Center system (in my environment this is "192.168.5.64") is equal to "127.0.0.1". I checked the value of that field in the "nf" files of all the other ESX hosts and verified that, on the "healthy" ESX hosts, it contains the correct IP address. Because of that incorrect IP address, the disconnected host sends its heartbeat to itself (instead of the Virtual Center system) and, therefore, VC considers it "dead" and disconnects it.

The article mentioned above states that the IP address of the VC system should be entered in the "vCenter Server Managed IP" field of the Runtime Settings view of the vCenter Server Settings dialog of the vSphere Client. In my environment, this field has always been empty. I experimented by entering the correct IP address of the VC system directly into the address field of the "/etc/opt/vmware/vpxa/vpxa.cfg" file on the disconnected ESX host. However, as soon as I re-connected the host to VC, the correct value of the IP address was replaced by "127.0.0.1" and the host was, once again, disconnected from VC.

The strange fact about my cluster seems to be that only one out of 9 hosts tries to retrieve the value of the field of its "/etc/opt/vmware/vpxa/hosts.cfg" file from the "vCenter Server Managed IP" field of the vSphere Client (the other hosts maintain the correct IP address for the VC system in their "vpxa.cfg" independently of the configuration of the vSphere client and the VC system). Maybe this behavior goes back to the fact that part of my cluster went through an upgrade to Release 3.5 two years ago, whereas some new hosts were added to it at the same time without being upgraded (this is also mentioned in the above-mentioned KB article).
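
For anyone who wants to repeat the check described in this thread (comparing the vCenter address stored in each host's vpxa configuration file against the address it should contain), here is a minimal Python sketch. It is only an illustration under several assumptions. The thread does not name the field, so the tag name `serverIp` is assumed. The host names and the XML-like layout of the sample contents are made up. The helpers `read_field` and `find_mismatches` are hypothetical names of my own. It is meant to be run on a workstation against copies of the configuration files, not on the ESX hosts themselves.

```python
import re


def read_field(config_text, tag):
    """Return the text inside the first <tag>...</tag> pair, or None if absent."""
    pattern = r"<{0}>\s*(.*?)\s*</{0}>".format(re.escape(tag))
    match = re.search(pattern, config_text, re.DOTALL)
    return match.group(1) if match else None


def find_mismatches(configs, tag, expected_ip):
    """Return {host: stored value} for every host whose field differs from expected_ip.

    `configs` maps a host name to the contents of that host's configuration file.
    A host whose file lacks the field is reported with the value None.
    """
    mismatches = {}
    for host, text in configs.items():
        value = read_field(text, tag)
        if value != expected_ip:
            mismatches[host] = value
    return mismatches


if __name__ == "__main__":
    # Made-up sample contents; real files would be copied from each host.
    # The tag name "serverIp" is an assumption, not taken from the thread.
    sample = {
        "esx-healthy": "<config><vpxa><serverIp>192.168.5.64</serverIp></vpxa></config>",
        "esx-disconnected": "<config><vpxa><serverIp>127.0.0.1</serverIp></vpxa></config>",
    }
    for host, value in find_mismatches(sample, "serverIp", "192.168.5.64").items():
        print(host, "stores", value)
```

With the sample data, only the host whose field holds "127.0.0.1" is reported, which mirrors the situation described above, where one host out of nine points its heartbeat at itself.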