Tag Archives: Error

VMware STS Clients Failed SSL Certificate of STS Service Cannot Be Verified

“Initialization of STS Clients failed. Root Cause: The SSL certificate of STS service cannot be verified” is an error that delayed our deployment of the vShield Manager.

VMware STS Clients Failed Error

We encountered this error while configuring the Lookup Service Information. It is important to understand how the environment was designed when we hit this error, and why the error didn’t seem to make sense at first.

There are two sites, Site A and Site B, in a hybrid vCenter 5.1 configuration that runs vCenter 5.5 Single Sign-On and the Web Client on dedicated virtual machines, SSO1 and SSO2. Single Sign-On and the Web Client reside together on the same server, with one such server in each site. There are a total of 5 vCenter Servers at versions 5.1 U1/U2. Each vCenter points to the vCenter 5.5 Single Sign-On and Web Client server for its own site/geographic region.

VMware Single Sign-On SSO Architecture

This model is fully supported by VMware per KB2059249 and has proven to be a better deployment model for the vCenter 5.1 product family than the initial release of Single Sign-On 5.1.

The vShield Manager was deployed at Site B, and we used Site B’s SSO and Web Client server address when configuring the Lookup Service. Our research on internet forums indicated that the SSO server’s certificate, the chain certificates and the root certificate needed to be bundled into a single certificate and installed on the STS server. This did not make sense, since no certificates had been manually generated for use by the SSO servers. All SSO certificates were generated during installation and were self-signed by the VMware SSO installer.

VMware STS Clients Failed Error

While working with a co-worker to troubleshoot the issue above, it occurred to me to list all the services that the SSO server sees, to determine which STS service the SSO server was using. I issued the following command on the SSO server:

ssolscli listServices https://cgvccore2.fqdn:7444/lookupservice/sdk

Output:

VMware STS Clients Failed Error Proof

The urn:sso:sts service was listed with Site A’s registered URL! It completely slipped my mind that there was only one STS server listed in any SSO instance. We updated the Lookup Service Information Host URL and the “Initialization of STS Clients failed. Root Cause: The SSL certificate of STS service cannot be verified” issue was resolved!

VMware STS Clients Failed Error Resolved
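The check that resolved this can be sketched in a few lines: compare the host in the Lookup Service URL you entered against the host in the URL registered for urn:sso:sts. This is only a minimal sketch, assuming a host mismatch is the cause as it was in our case. The function names and the example URLs (including the STS path) are hypothetical placeholders, not values from our environment.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased hostname of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def same_sso_host(lookup_url: str, sts_url: str) -> bool:
    """True when the Lookup Service URL and the registered STS URL share a host."""
    lookup_host = host_of(lookup_url)
    return lookup_host != "" and lookup_host == host_of(sts_url)


if __name__ == "__main__":
    # Hypothetical URLs; substitute the Lookup Service URL you configured and
    # the urn:sso:sts URL reported by ssolscli listServices.
    lookup = "https://sso-site-b.example.com:7444/lookupservice/sdk"
    sts = "https://sso-site-a.example.com:7444/sts-placeholder-path"
    if not same_sso_host(lookup, sts):
        print("Mismatch:", host_of(lookup), "vs", host_of(sts))
```

Only the hostnames are compared, since ports and paths legitimately differ between the two services.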

Note: This is a single point of failure; it would be best to load balance the STS service. If a load-balanced model is not implemented initially, there are articles on how to update where the STS service points in the event of a failure.


EMC PowerPath Internal Error Migrations May Be Pending Fix

A host-side migration between arrays can be a nerve-racking task, especially when you come across issues. Data loss is a constant fear in the back of your mind, as is the question of what your fail-back plan is should you need to execute it. During a PowerPath migration, I learned the hard way that a host-side copy of the boot-from-SAN LUN is NOT supported. After I set up the migration and issued the sync command, the Windows machine ground to a halt and eventually went offline.

After troubleshooting, it was clear that the EMC PowerPath Migration Enabler service needed to be disabled for the Windows machine to fully boot. Enabling EMC PowerPath Migration Enabler after the host had booted would immediately cause the Windows host to become unresponsive, and a hard power cycle was the only fix.

I could not start the PowerPath Migration Enabler service to abort the session, since doing so would immediately freeze the server, and I could not uninstall PowerPath Migration Enabler because there was a session pending. I was in a pickle!

EMC PowerPath PPME Removal Migration Pending

After I opened a support ticket with EMC, the resolution turned out to be manually removing the PowerPath Migration Enabler database and its keys in the registry. After performing a few deletions, you will be able to start the service successfully without freezing your server and with no active sessions remaining.

  1. Delete the UMD by deleting the files from C:\Program Files\EMC\PPME\db*.* 
  2. Delete all subkeys with the prefix “dm_” (do NOT delete dev_conf) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\EmcPowerPath\KMD_*.
    The keys are dm_ac, dm_control_io_to_clones, dm_funnel_io and dm_wc.

    EMC PowerPath PPME Removal Migration Pending_Registry
  3. Reboot.
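Before deleting anything from the registry, it can help to write down exactly which subkeys match the rule in step 2. The following is only a dry-run sketch, assuming you have already collected the subkey names by hand (for example from regedit or reg query). It touches nothing in the registry, and the function name is my own.

```python
from typing import Iterable, List

PREFIX = "dm_"  # prefix named in step 2


def subkeys_to_delete(subkey_names: Iterable[str]) -> List[str]:
    """Return the subkey names that start with 'dm_', sorted for stable output."""
    # Registry key names are case-insensitive, so compare in lower case.
    return sorted(name for name in subkey_names if name.lower().startswith(PREFIX))


if __name__ == "__main__":
    # Subkey names as listed in this post; replace with what your host shows.
    found = ["dev_conf", "dm_ac", "dm_control_io_to_clones", "dm_funnel_io", "dm_wc"]
    for name in subkeys_to_delete(found):
        print("would delete:", name)
```

With the names from this post, dev_conf is left out of the list and the four dm_ subkeys are reported.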

 


Failed to Start Migration Pre-copy Error 0xbad003f vMotion Migration Fix

“A general system error occurred: Failed to start migration pre-copy. Error 0xbad003f. Connection closed by remote host, possibly due to timeout.”
“A general system error occurred: Failed to start migration pre-copy. Error 0xbad004b. Connection reset by peer.”

Another issue I recently came across was a live vMotion migration that would fail during the pre-copy, always at 10%. Each failure reported one of the two errors quoted above:

VMware vCenter vSphere Event Log

I performed some basic troubleshooting with vmkping and watched the response times remain consistent during the attempted vMotion migration. No packets were being lost, and I reasoned that there would be packet loss if there were an issue with Layer 3 IP addressing.

VMware ESXi vmkping

While still on the command line of the ESXi host, I decided to look for ARP entries in the log anyway to rule the problem out, despite my reasoning. I ran the following:

cat /var/log/vmkernel | grep arp

I was wrong: there was another host on the network that had the same IP address!

VMware ESXi Log

I found a new IP address for my VMkernel interface, updated DNS, then updated the IP address on the ESXi host, and my issue was resolved!
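The underlying problem, one IP address claimed by two machines, is easy to check for if you can gather (IP, MAC) pairs from your switches or hosts. Here is a minimal sketch of that idea. It does not parse the vmkernel log, the function name is my own, and the addresses are made-up values from the documentation ranges.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def duplicate_ips(observations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each IP claimed by more than one MAC address to the set of those MACs."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in observations:
        # Normalise case so the same MAC written two ways counts once.
        macs_by_ip[ip].add(mac.lower())
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Made-up (IP, MAC) pairs; replace with data gathered from your network.
    seen = [
        ("192.0.2.10", "00:00:5e:00:53:01"),
        ("192.0.2.11", "00:00:5e:00:53:02"),
        ("192.0.2.10", "00:00:5E:00:53:03"),
    ]
    for ip, macs in duplicate_ips(seen).items():
        print(ip, "is claimed by", sorted(macs))
```

Any IP that comes back from this check is worth investigating before you assign it to a VMkernel interface.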
