Problem description
Deployment Vmextensionprovisioningerror was encountered while using azure service fabric. The full text is as follows:
Deployment Name: 385A084F35B2BC55-xxxx Operation Id: 096CED Operation: Create Correlation Id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx Status: Failed Error: Code: VMExtensionProvisioningError Message: Multiple VM extensions failed to be provisioned on the VM. Please see the VM extension instance view for other failures. The first extension failed due to the error: VM Extension 'RunPowerShell_Frontdoor' is marked as failed since it depends upon the VM Extension 'ServiceFabricNodeVmExt_vmNodeType1Name' which has failed. Help Link: https://aka.ms/ev2/errors/troubleshooting
Problem solving
Step 1: collect logs
In the node that requires remote connection (RDP) to SF, in the directory C: Collect VM extension installation logs in \ windowsazure \ logs \ plugins
Step 2: analyze logs
stay In the microsoft.azure.servicefabric.servicefabricnode plugin log, it is found that the error occurred due to the failure to obtain the certificate.
[09/09/2021 12:32:01.86] Executing: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\ServiceFabricExtensionHandler.exe enable [09/09/2021 12:32:03.62] Execution Complete. ###### Execution Output: 1> 2021-09-09T12:32:01.9973303Z: Information: Starting... 1> 2021-09-09T12:32:01.9973303Z: Information: MachineName: Frontdoor000000 1> 2021-09-09T12:32:01.9973303Z: Information: CurrentDirectory: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11 1> 2021-09-09T12:32:01.9973303Z: Information: UserName: SYSTEM 1> 2021-09-09T12:32:01.9973303Z: Information: Created mutex; this is the only instance of this process running 1> 2021-09-09T12:32:02.3723445Z: Information: HandlerManifest Version: 1 1> 2021-09-09T12:32:02.4348520Z: Information: [HandlerEnvironment] 1> 2021-09-09T12:32:02.4348520Z: Information: LogFolder: C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11 1> 2021-09-09T12:32:02.4348520Z: Information: ConfigFolder: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\RuntimeSettings 1> 2021-09-09T12:32:02.4348520Z: Information: StatusFolder: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\Status 1> 2021-09-09T12:32:02.4348520Z: Information: HeartbeatFile: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\Status\HeartBeat.Json 1> 2021-09-09T12:32:02.4348520Z: Information: DeploymentId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 1> 2021-09-09T12:32:02.4348520Z: Information: RoleName: _Frontdoor_0 1> 2021-09-09T12:32:02.4348520Z: Information: Instance: _Frontdoor_0 1> 2021-09-09T12:32:02.5442327Z: Information: Opening config file: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\RuntimeSettings\1.settings 1> 2021-09-09T12:32:02.5598642Z: Information: Config specified client cert FindBySubjectName 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' (store name = My) 1> 2021-09-09T12:32:02.5754780Z: Error: ERROR: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy) at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting) 1> 2021-09-09T12:32:02.5754780Z: Warning: Heartbeat: Not Ready: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' 1> 2021-09-09T12:32:02.5754780Z: Error: Heartbeat file updated, with error msg 1> 2021-09-09T12:32:02.5911044Z: Error: ERROR: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy) at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting) at Microsoft.Azure.ServiceFabric.Extension.Core.Common.PublicSettingsExtensions.AllowAccessToCerts(PublicSettings publicSettings, ICertificateUtility certificateUtility) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.ValidateDeployment(HandlerSettings settings) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallService() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallServiceWithRetry() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.Enable() at Microsoft.Azure.ServiceFabric.Extension.Handler.Program.Main(String[] args) in X:\bt\1162492\repo\src\HandlerExe\Program.cs:line 52 Execution Error: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy) at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting) at Microsoft.Azure.ServiceFabric.Extension.Core.Common.PublicSettingsExtensions.AllowAccessToCerts(PublicSettings publicSettings, ICertificateUtility certificateUtility) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.ValidateDeployment(HandlerSettings settings) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallService() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallServiceWithRetry() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.Enable() at Microsoft.Azure.ServiceFabric.Extension.Handler.Program.Main(String[] args) in X:\bt\1162492\repo\src\HandlerExe\Program.cs:line 52
The key message is Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with ‘FindBySubjectName’ ‘xxxxx.cluster.prod.ngpproxy.microsoftonline.cn’ in store ‘My’ 。
Step 3: continue to find the problem that the certificate cannot be found
In the log of obtaining the certificate, it is found that SF is the certificate obtained from kV (key vault). When obtaining the auth challenge in step 5, it took more than 5 minutes from 18:59:02 to 19:04:45.
2021-09-09 18:59:02: < info> [CertificateManager] Beginning refresh for: https://xxxxx-chinanorth-kv.vault.azure.cn/secrets/cluster
2021-09-09 18:59:02: < info> [WindowsCertificateStore] attempting to open store ‘LocalMachine\MY’
2021-09-09 18:59:02: < debug> [WindowsCertificateStore] opening the ‘LocalMachine’ store..
2021-09-09 18:59:02: < debug> [WindowsCertificateStore] store opened successfully.
2021-09-09 18:59:02: < info> [KeyVaultClient] Getting new auth challenge
2021-09-09 19:04:45: < error> [CertificateManager] Refreshing ‘ https://xxxxx-chinanorth-kv.vault.azure.cn/secrets/cluster ‘ failed with KeyVault exception: KeyVaultHttpClient.GetChallenge. – http_ exception: what = ‘WinHttpSendRequest: 12030: The connection with the server was terminated abnormall’ code = ‘windows:12030’
2021-09-09 19:04:45: < info> [CertificateManager] Beginning refresh for:
It can be inferred that when SF is deployed due to network delay, authentication is requested from key vault, and deployment fails due to request timeout.
Redeployment of SF was successful after multiple retries.