Hi!
Thanks - you were right, it had something to do with the plugins: after another restart only VMware and Hyper-V were left, and all the other options were missing. That KB article helped us troubleshoot the issue further.
The underlying reason for the plugins failing was different from the one in the KB, though. It came down to the Proxmox node certificates. The certificates are issued by a trusted internal CA on our airgapped network. We had initially forgotten to add the root certificate for the internal CA to the VBR server. We're assuming that the certificates were accepted when the nodes were added manually (this seems confirmed by further testing, see below).
Secondly, our certificates have a very short lifetime and are renewed every couple of days. It seems that only the certificates in use when the nodes were added were trusted without a CRL. When these were automatically replaced with renewed certificates, VBR panicked: the plugins failed and could no longer start, and the nodes were removed from the console.
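Since trust appears to be tied to the exact certificate that was in place when a node was added, it helps to know when a node has rotated its certificate. A minimal Python sketch, assuming the standard `ssl` and `hashlib` modules; the node names and the stored thumbprints are placeholders, and port 8006 is assumed to be where the Proxmox VE API listens. It only compares thumbprints and does not talk to VBR.

```python
import hashlib
import ssl


def thumbprint(der: bytes) -> str:
    """SHA-1 over the DER-encoded certificate, as uppercase hex.

    This is the same value the Windows certificate viewer shows as "Thumbprint".
    """
    return hashlib.sha1(der).hexdigest().upper()


def fetch_thumbprint(host: str, port: int = 8006) -> str:
    """Fetch the certificate a node currently presents (no trust check is done here)."""
    pem = ssl.get_server_certificate((host, port))
    return thumbprint(ssl.PEM_cert_to_DER_cert(pem))


def renewed(known: dict, current: dict) -> list:
    """Return the nodes whose current thumbprint differs from the recorded one.

    Nodes without a recorded thumbprint are skipped, since there is nothing to compare.
    """
    return sorted(
        node for node, tp in current.items()
        if known.get(node) not in (None, tp)
    )
```

With thumbprints recorded at the time the nodes were added, something like `renewed(known, {n: fetch_thumbprint(n) for n in known})` would list the nodes that have rotated their certificate since then.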
The root certificate of the internal CA was then added to the VBR server, so that the node certificates are trusted in Windows. This brought the nodes back in the console, and the plugins now start again, so new servers (including Proxmox etc.) can be added.
However, the nodes are still not working and the logs keep recording errors. No VMs are visible under the nodes. Trying to rescan a node in the console gives the error "Failed to rescan the Proxmox VE server."
This led us to the second issue. Even with certificates and certificate chains trusted by Windows, VBR also requires the certificates to have a designated CRL. Our certificates do not have a CRL (our network is airgapped, and our internal CA issues certificates with a very limited lifetime). So the VBR console would still not populate the nodes with VMs or allow any backups to proceed. Note that the certificates are trusted in Windows, Edge, Chrome etc. without a CRL.
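Whether a node certificate advertises any revocation source at all can be checked from any machine that trusts the internal CA. A minimal Python sketch, assuming the standard `ssl` module; the host name and CA file in the usage note are placeholders, and port 8006 is assumed to be where the Proxmox VE API listens. It only reports what the certificate contains and says nothing about how VBR itself validates it.

```python
import socket
import ssl
from typing import Optional


def revocation_sources(cert: dict) -> dict:
    """Extract revocation URLs from a dict returned by SSLSocket.getpeercert()."""
    return {
        "crl": list(cert.get("crlDistributionPoints", ())),
        "ocsp": list(cert.get("OCSP", ())),
    }


def has_crl(cert: dict) -> bool:
    """True if the certificate lists at least one CRL distribution point."""
    return bool(revocation_sources(cert)["crl"])


def fetch_peer_cert(host: str, port: int = 8006, cafile: Optional[str] = None) -> dict:
    """Connect over TLS and return the validated peer certificate as a dict.

    cafile is a PEM file holding the internal root CA (placeholder path).
    The handshake fails if the chain is not trusted, so a returned dict
    means chain validation succeeded.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

For example, `has_crl(fetch_peer_cert("pve1.example.internal", cafile="internal-root.pem"))` would be expected to return `False` for certificates like ours, which chain to a trusted root but carry no CRL distribution point.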
For reference and searches, the Veeam.PVE.Platform.Svc.log contains repeated instances of:
Code:
[CertificateValidationUtils]: Certificate summary information: Certificate errors: None Certificate chain errors:
[CertificateValidationUtils]: Remote certificate <thumbnail> has the following validation issues
[CertificateValidationUtils]: The revocation function was unable to check revocation for the certificate.
This causes the rescan and other connections to fail.
Selecting Properties on the node and proceeding past the Credentials page gives a Certificate Security Alert, where it's possible to view the certificate, continue or cancel. View shows the certificate as trusted by Windows. If we continue and then apply the snapshot storage settings, the console refreshes the node and the VMs are listed again. This must be the same behavior as when the nodes were initially added: the certificate is put on a trusted list. Unfortunately, it is not feasible for us to do this for each node every time its certificate is renewed.
After doing this, it is possible to run a rescan and work with the node again without issue. Forcing a certificate renewal on a node that had gone through the workaround steps above makes it untrusted again, and a rescan gives the same "Failed to rescan the Proxmox VE server." error.
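To see whether a node has fallen back into the failing state after a renewal, the log signature quoted above can be counted. A minimal Python sketch, assuming only what the quoted lines show: each entry carries the `[CertificateValidationUtils]:` tag followed by the message. Anything that precedes the tag on a real log line (timestamps and so on) is ignored, and the log path in the usage note is a placeholder.

```python
import re
from collections import Counter
from typing import Iterable

# Message text exactly as quoted from Veeam.PVE.Platform.Svc.log
REVOCATION_MSG = (
    "The revocation function was unable to check revocation for the certificate."
)

# Matches the tag anywhere in a line and captures the message after it
ENTRY = re.compile(r"\[CertificateValidationUtils\]:\s*(?P<msg>.*)")


def count_validation_messages(lines: Iterable[str]) -> Counter:
    """Count each distinct CertificateValidationUtils message in the given lines."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match:
            counts[match.group("msg").strip()] += 1
    return counts


def revocation_failures(lines: Iterable[str]) -> int:
    """Number of log lines reporting the failed revocation check."""
    return count_validation_messages(lines)[REVOCATION_MSG]
```

Usage would be along the lines of `with open(log_path, encoding="utf-8", errors="replace") as fh: print(revocation_failures(fh))`, where `log_path` points at the Veeam.PVE.Platform.Svc.log file on the VBR server.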
Based on this, we have two Veeam requests.
1. More graceful handling of certificate issues with a node. It would be better if the user were notified that there is a certificate issue, instead of the plugins dying, the nodes disappearing and the Add server dialog being almost empty.
2. Allow trusted certificates, with a complete certificate chain but without a CRL.
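Request 2 amounts to treating chain trust and revocation checking as two separate policies. As an illustration only, here is how that split looks in Python's standard `ssl` module; this is not how VBR is implemented, and `make_context` is a hypothetical helper name. With the CRL flag set and no CRL loaded, the handshake is expected to fail even though the chain itself is trusted, which mirrors what we are seeing.

```python
import ssl
from typing import Optional


def make_context(require_crl: bool, cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a client context that always validates the chain.

    require_crl=False: chain trust and host name check only.
    require_crl=True:  additionally demand a CRL check for the leaf certificate;
                       CRLs would have to be loaded into the context for the
                       handshake to succeed.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    if require_crl:
        ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx
```

A context built with `make_context(False, cafile="internal-root.pem")` (placeholder path) corresponds to the behavior we are asking for: a complete, trusted chain is enough.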
On our side, before thinking about taking this into production, we will have to initiate a project to rebuild our internal airgapped CA to provide CRLs going forward. For us, it means that the Veeam/Proxmox project will have to be postponed until after the CA upgrade project.
Statistics: Posted by crowsprofiles — Nov 29, 2024 3:57 pm






