Hi all,
we have several customers that run large fileserver systems (100 TB and above) with millions of files. They need the ability to search for deleted files across several restore points, because their users often can't tell when a file was deleted. I don't want to debate whether this is the right way to do things, but I've seen competitors handle this kind of "problem" much more elegantly.
Just to get some discussion started, here are my observations and proposals.
I really love Veeam and I've used VBR since v3, but in my opinion the guest file indexing isn't enterprise-ready. It's fine for smaller environments with "few" files on the indexed guest systems, but not sufficient for large deployments.
- creating file indexes is quite time-consuming. Back when we ran on an HDD SAN, guest file indexing took more than 24 hours. After migrating to all-flash, the time dropped to 2 hours or less, which is still longer than the runtime of the backup job itself, but that's acceptable.
--> wouldn't it be better to do the guest file indexing as a post-backup process? Mounting the backup and indexing it on the backup server itself should reduce the load on the primary storage and shorten the backup job. For restores, the backup has to be mounted anyway, so why not use this functionality?
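To make the idea concrete, here is a minimal sketch of what such a post-backup indexing step could look like: walk a restore point that the backup server has already mounted and collect the file metadata for the index. The mount path and the metadata fields are my own assumptions, not Veeam's actual implementation.

```python
import os

def index_mounted_backup(mount_root):
    """Walk a mounted backup (hypothetical mount path) and return
    (relative path, size, mtime) tuples for every file found."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # unreadable or vanished entry; skip instead of aborting
            entries.append(
                (os.path.relpath(full, mount_root), st.st_size, int(st.st_mtime))
            )
    return entries
```

The point is that all of this I/O and CPU load would land on the backup server after the job finishes, instead of on the production guest during the backup window.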
- having Windows dedup enabled on the fileserver volumes (which our customers often use) is a showstopper for guest file indexing. Veeam has to fall back to a standard tree walk, which is extremely time-consuming.
--> probably nothing that can be optimized, but admins need to know about it. The docs only give a brief hint that dedup COULD lead to higher indexing times, but they don't convey how dramatically the times grow, especially with a large number of files. This should be stated much more clearly in the documentation.
- the way guest file indexes are saved feels like the 90's. Creating flat text files, compressing them and copying them over the network to the VBRCatalog share isn't really state of the art. If you use a separate Enterprise Manager VM (which we do for security reasons), the whole VBRCatalog is duplicated at least once. With only 6 fileservers and 2 months of index data, the folder is nearly 1 TB in size. Sure, we have BIG fileservers, but since Veeam is enterprise-grade, that should be considered normal. Enabling malware detection increases the storage consumption even further.
--> why not index the data on the VM, transfer it through the Veeam components (like the data mover) and store the index data in a database? This should reduce the footprint and could be integrated better and more securely than copying zip files over the network to SMB shares.
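As a rough illustration of the database idea, here is a self-contained sketch using SQLite. The schema and names are entirely invented by me, purely to show that per-restore-point file metadata maps naturally onto indexed tables instead of zipped flat files:

```python
import sqlite3

def create_catalog(conn):
    """Create a minimal, hypothetical catalog schema:
    one row per restore point, one row per file per restore point."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS restore_points (
            id INTEGER PRIMARY KEY,
            vm_name TEXT NOT NULL,
            created_at TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS files (
            restore_point_id INTEGER NOT NULL REFERENCES restore_points(id),
            path TEXT NOT NULL,
            size INTEGER,
            mtime INTEGER
        );
        CREATE INDEX IF NOT EXISTS idx_files_path ON files(path);
    """)

def add_restore_point(conn, vm_name, created_at, entries):
    """Store one restore point and its file list; entries = (path, size, mtime)."""
    cur = conn.execute(
        "INSERT INTO restore_points (vm_name, created_at) VALUES (?, ?)",
        (vm_name, created_at))
    rp_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO files (restore_point_id, path, size, mtime) VALUES (?, ?, ?, ?)",
        [(rp_id, p, s, m) for p, s, m in entries])
    return rp_id
```

A database like this deduplicates nothing by magic, but compact row storage plus a path index would already beat unpacking and grepping gigabytes of zipped text files.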
- working with the index data in the EM web interface is a mess. With index files this big, searching for files across multiple jobs either crashes constantly with the error "searcher not found" or is extremely time-consuming. Even if you specify the correct restore point, searching the index can take more than an hour. During this period, the EM VM with 8 vCPUs runs at a constant 100% load. Even after the search results are displayed, the GuestCatalog Service on the EM VM keeps running at 100% for tens of minutes more, like a cool-down phase.
--> using a database with a much more efficient query language should reduce the load dramatically. In the past, Veeam offered support for MOSS (MS Search Server). No idea why that support ended; perhaps Microsoft discontinued the product, but the idea behind it was great.
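The core use case from the start of this post, "find files that were deleted somewhere between two restore points", is exactly the kind of thing a database query handles cheaply. A self-contained sketch with an invented single-table schema (not Veeam's actual format):

```python
import sqlite3

# Hypothetical two-column index: which paths exist in which restore point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (restore_point TEXT, path TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?)", [
    ("2024-08-01", "reports/q1.xlsx"),
    ("2024-08-01", "reports/q2.xlsx"),
    ("2024-08-08", "reports/q1.xlsx"),  # q2.xlsx was deleted in between
])

# Files present in the old restore point but missing from the new one.
deleted = conn.execute("""
    SELECT f_old.path FROM files AS f_old
    WHERE f_old.restore_point = ?
      AND NOT EXISTS (
          SELECT 1 FROM files AS f_new
          WHERE f_new.restore_point = ? AND f_new.path = f_old.path)
    ORDER BY f_old.path
""", ("2024-08-01", "2024-08-08")).fetchall()

print([row[0] for row in deleted])  # → ['reports/q2.xlsx']
```

With a proper index on the path column, this anti-join stays fast even with millions of rows, instead of pegging 8 vCPUs for an hour.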
These are only some of the observations we've made with large customers. I just want to share my experience; perhaps others have the same problems, and maybe even a solution or workaround. It would be great if someone from Veeam could have a look at this topic and perhaps take it to development to discuss improvements.
Regards,
Oliver
Statistics: Posted by okrehan — Aug 13, 2024 12:54 pm