Channel: R&D Forums

Veeam Backup for Microsoft 365 • Postgresql inefficient query #2: JobSessions

Case #07802680 - first query

NOTE: Since this is a public forum the organization ID in the queries is replaced with 'REDACTED'

While looking at the PostgreSQL log I found this statement, which I think is far more resource-heavy than necessary and could be optimized by your developers:


duration: 11303.847 ms

Code:

SELECT t.id, t.backup_scope, t.copy_backup_job_id, t.description, t.e_tag, t.is_deleted, t.is_enabled, t.is_upgraded, t.job_type, t.license_exceeded, t.migration_job_source_id, t.migration_job_source_type, t.migration_job_switch_job_to_target_repository, t.name, t.no_delete, t.orchestration_priority, t.organization_id, t.proxy_id, t.repository_id, t.resync, t.resync_timestamp, t.retrieve_restore_point_id, t.schedule_config, t.status, t.target_proxy_id, t.target_repository_id, t.upgrade_actions, t.id0, t.details, t.display_name, t.import_status, t.job_id, t.last_synchronized_at, t.policy_id, t.retention_settings, t.service_type, t.state, t.status0, t.id1, t.capacity, t.cleaner_schedule, t.description0, t.encryption_changing_failed, t.free, t.index_error, t.index_progress, t.indexing_session_id, t.indexing_state, t.is_indexed, t.is_out_of_order, t.is_out_of_sync, t.is_outdated, t.local_cache_retention, t.maintenance, t.name0, t.next_object_storage_encryption_key_id, t.object_storage_encryption_key_id, t.object_storage_id, t.organization_id0, t.out_of_order_reason, t.path, t.proxy_id0, t.proxy_pool_id, t.retention_period_type, t.retention_time, t.retention_type, t.type, t.used_space_in_bytes, t0.id, t0.details, t0.e_tag, t0.end_time, t0.friendly_organization_name, t0.is_disconnected, t0.items_count, t0.items_rate, t0.job_id, t0.job_name, t0.job_type, t0.job_will_be_retried, t0.last_modified, t0.license_exceeded, t0.location_id, t0.location_name, t0.location_type, t0.mailboxes_count, t0.mailboxes_processed, t0.main_session_id, t0.original_organization_name, t0.processed, t0.rate, t0.repository_data_rate, t0.repository_id, t0.repository_name, t0.retry, t0.source_data_rate, t0.start_time, t0.status, t0.type
FROM (
    SELECT j.id, j.backup_scope, j.copy_backup_job_id, j.description, j.e_tag, j.is_deleted, j.is_enabled, j.is_upgraded, j.job_type, j.license_exceeded, j.migration_job_source_id, j.migration_job_source_type, j.migration_job_switch_job_to_target_repository, j.name, j.no_delete, j.orchestration_priority, j.organization_id, j.proxy_id, j.repository_id, j.resync, j.resync_timestamp, j.retrieve_restore_point_id, j.schedule_config, j.status, j.target_proxy_id, j.target_repository_id, j.upgrade_actions, j0.id AS id0, j0.details, j0.display_name, j0.import_status, j0.job_id, j0.last_synchronized_at, j0.policy_id, j0.retention_settings, j0.service_type, j0.state, j0.status AS status0, r.id AS id1, r.capacity, r.cleaner_schedule, r.description AS description0, r.encryption_changing_failed, r.free, r.index_error, r.index_progress, r.indexing_session_id, r.indexing_state, r.is_indexed, r.is_out_of_order, r.is_out_of_sync, r.is_outdated, r.local_cache_retention, r.maintenance, r.name AS name0, r.next_object_storage_encryption_key_id, r.object_storage_encryption_key_id, r.object_storage_id, r.organization_id AS organization_id0, r.out_of_order_reason, r.path, r.proxy_id AS proxy_id0, r.proxy_pool_id, r.retention_period_type, r.retention_time, r.retention_type, r.type, r.used_space_in_bytes
    FROM jobs AS j
    LEFT JOIN repositories AS r ON j.repository_id = r.id
    LEFT JOIN job_m365backups AS j0 ON j.id = j0.job_id
    WHERE j.organization_id = ANY ($1) AND j.job_type = ANY ($2) AND NOT (j.is_deleted)
    ORDER BY j.id
    LIMIT $3
) AS t
LEFT JOIN (
    SELECT t1.id, t1.details, t1.e_tag, t1.end_time, t1.friendly_organization_name, t1.is_disconnected, t1.items_count, t1.items_rate, t1.job_id, t1.job_name, t1.job_type, t1.job_will_be_retried, t1.last_modified, t1.license_exceeded, t1.location_id, t1.location_name, t1.location_type, t1.mailboxes_count, t1.mailboxes_processed, t1.main_session_id, t1.original_organization_name, t1.processed, t1.rate, t1.repository_data_rate, t1.repository_id, t1.repository_name, t1.retry, t1.source_data_rate, t1.start_time, t1.status, t1.type
    FROM (
        SELECT j1.id, j1.details, j1.e_tag, j1.end_time, j1.friendly_organization_name, j1.is_disconnected, j1.items_count, j1.items_rate, j1.job_id, j1.job_name, j1.job_type, j1.job_will_be_retried, j1.last_modified, j1.license_exceeded, j1.location_id, j1.location_name, j1.location_type, j1.mailboxes_count, j1.mailboxes_processed, j1.main_session_id, j1.original_organization_name, j1.processed, j1.rate, j1.repository_data_rate, j1.repository_id, j1.repository_name, j1.retry, j1.source_data_rate, j1.start_time, j1.status, j1.type, ROW_NUMBER() OVER(PARTITION BY j1.job_id ORDER BY j1.end_time DESC) AS row
        FROM job_sessions AS j1
        WHERE j1.status <> 1
    ) AS t1
    WHERE t1.row <= 1
) AS t0 ON t.id = t0.job_id
ORDER BY t.id



11.3 seconds for a single query, and a query which, at first glance, reads almost the entire job_sessions table, partitions over job_id to calculate row numbers within each partition, and only later, in the LEFT JOIN, filters down to the data that was actually needed.
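For comparison, the usual pattern for "latest row per group when only a few groups are needed" in PostgreSQL is a LEFT JOIN LATERAL, which probes the index once per job instead of window-numbering every session. Below is a rough sketch only (column lists abbreviated to `*`, table and column names taken from the logged query, not tested against the real schema):

```sql
-- Sketch: fetch only the newest non-running session per selected job.
-- Relies on an index like ix_job_sessions_job_id_end_time_status for the
-- inner ORDER BY ... LIMIT 1 to be an index probe rather than a sort.
SELECT j.*, s.*
FROM jobs AS j
LEFT JOIN LATERAL (
    SELECT j1.*
    FROM job_sessions AS j1
    WHERE j1.job_id = j.id
      AND j1.status <> 1
    ORDER BY j1.end_time DESC
    LIMIT 1                      -- top-1 per job; no ROW_NUMBER over the whole table
) AS s ON true
WHERE j.organization_id = ANY ($1)
  AND j.job_type = ANY ($2)
  AND NOT j.is_deleted
ORDER BY j.id
LIMIT $3;
```

With one matching job this should touch only that job's sessions rather than all 4.1 million rows.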

If I rewrite the query to use the logged parameters and run an EXPLAIN ANALYZE, it looks like this:

Code:

EXPLAIN ANALYZE
SELECT t.id, t.backup_scope, t.copy_backup_job_id, t.description, t.e_tag, t.is_deleted, t.is_enabled, t.is_upgraded, t.job_type, t.license_exceeded, t.migration_job_source_id, t.migration_job_source_type, t.migration_job_switch_job_to_target_repository, t.name, t.no_delete, t.orchestration_priority, t.organization_id, t.proxy_id, t.repository_id, t.resync, t.resync_timestamp, t.retrieve_restore_point_id, t.schedule_config, t.status, t.target_proxy_id, t.target_repository_id, t.upgrade_actions, t.id0, t.details, t.display_name, t.import_status, t.job_id, t.last_synchronized_at, t.policy_id, t.retention_settings, t.service_type, t.state, t.status0, t.id1, t.capacity, t.cleaner_schedule, t.description0, t.encryption_changing_failed, t.free, t.index_error, t.index_progress, t.indexing_session_id, t.indexing_state, t.is_indexed, t.is_out_of_order, t.is_out_of_sync, t.is_outdated, t.local_cache_retention, t.maintenance, t.name0, t.next_object_storage_encryption_key_id, t.object_storage_encryption_key_id, t.object_storage_id, t.organization_id0, t.out_of_order_reason, t.path, t.proxy_id0, t.proxy_pool_id, t.retention_period_type, t.retention_time, t.retention_type, t.type, t.used_space_in_bytes, t0.id, t0.details, t0.e_tag, t0.end_time, t0.friendly_organization_name, t0.is_disconnected, t0.items_count, t0.items_rate, t0.job_id, t0.job_name, t0.job_type, t0.job_will_be_retried, t0.last_modified, t0.license_exceeded, t0.location_id, t0.location_name, t0.location_type, t0.mailboxes_count, t0.mailboxes_processed, t0.main_session_id, t0.original_organization_name, t0.processed, t0.rate, t0.repository_data_rate, t0.repository_id, t0.repository_name, t0.retry, t0.source_data_rate, t0.start_time, t0.status, t0.type
FROM (
    SELECT j.id, j.backup_scope, j.copy_backup_job_id, j.description, j.e_tag, j.is_deleted, j.is_enabled, j.is_upgraded, j.job_type, j.license_exceeded, j.migration_job_source_id, j.migration_job_source_type, j.migration_job_switch_job_to_target_repository, j.name, j.no_delete, j.orchestration_priority, j.organization_id, j.proxy_id, j.repository_id, j.resync, j.resync_timestamp, j.retrieve_restore_point_id, j.schedule_config, j.status, j.target_proxy_id, j.target_repository_id, j.upgrade_actions, j0.id AS id0, j0.details, j0.display_name, j0.import_status, j0.job_id, j0.last_synchronized_at, j0.policy_id, j0.retention_settings, j0.service_type, j0.state, j0.status AS status0, r.id AS id1, r.capacity, r.cleaner_schedule, r.description AS description0, r.encryption_changing_failed, r.free, r.index_error, r.index_progress, r.indexing_session_id, r.indexing_state, r.is_indexed, r.is_out_of_order, r.is_out_of_sync, r.is_outdated, r.local_cache_retention, r.maintenance, r.name AS name0, r.next_object_storage_encryption_key_id, r.object_storage_encryption_key_id, r.object_storage_id, r.organization_id AS organization_id0, r.out_of_order_reason, r.path, r.proxy_id AS proxy_id0, r.proxy_pool_id, r.retention_period_type, r.retention_time, r.retention_type, r.type, r.used_space_in_bytes
    FROM jobs AS j
    LEFT JOIN repositories AS r ON j.repository_id = r.id
    LEFT JOIN job_m365backups AS j0 ON j.id = j0.job_id
    WHERE j.organization_id = ANY (ARRAY['REDACTED']::uuid[]) AND j.job_type = ANY (ARRAY[0]) AND NOT (j.is_deleted)
    ORDER BY j.id
    LIMIT 10001
) AS t
LEFT JOIN (
    SELECT t1.id, t1.details, t1.e_tag, t1.end_time, t1.friendly_organization_name, t1.is_disconnected, t1.items_count, t1.items_rate, t1.job_id, t1.job_name, t1.job_type, t1.job_will_be_retried, t1.last_modified, t1.license_exceeded, t1.location_id, t1.location_name, t1.location_type, t1.mailboxes_count, t1.mailboxes_processed, t1.main_session_id, t1.original_organization_name, t1.processed, t1.rate, t1.repository_data_rate, t1.repository_id, t1.repository_name, t1.retry, t1.source_data_rate, t1.start_time, t1.status, t1.type
    FROM (
        SELECT j1.id, j1.details, j1.e_tag, j1.end_time, j1.friendly_organization_name, j1.is_disconnected, j1.items_count, j1.items_rate, j1.job_id, j1.job_name, j1.job_type, j1.job_will_be_retried, j1.last_modified, j1.license_exceeded, j1.location_id, j1.location_name, j1.location_type, j1.mailboxes_count, j1.mailboxes_processed, j1.main_session_id, j1.original_organization_name, j1.processed, j1.rate, j1.repository_data_rate, j1.repository_id, j1.repository_name, j1.retry, j1.source_data_rate, j1.start_time, j1.status, j1.type, ROW_NUMBER() OVER(PARTITION BY j1.job_id ORDER BY j1.end_time DESC) AS row
        FROM job_sessions AS j1
        WHERE j1.status <> 1
    ) AS t1
    WHERE t1.row <= 1
) AS t0 ON t.id = t0.job_id
ORDER BY t.id;


Which gives us this output

Code:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=2478.32..1198433.48 rows=10853 width=1292) (actual time=11930.835..11934.208 rows=1 loops=1)
   Merge Cond: (j.id = j1.job_id)
   ->  Limit  (cost=219.11..219.11 rows=1 width=936) (actual time=0.653..0.662 rows=1 loops=1)
         ->  Sort  (cost=219.11..219.11 rows=1 width=936) (actual time=0.653..0.661 rows=1 loops=1)
               Sort Key: j.id
               Sort Method: quicksort  Memory: 26kB
               ->  Nested Loop Left Join  (cost=203.08..219.10 rows=1 width=936) (actual time=0.640..0.649 rows=1 loops=1)
                     ->  Hash Right Join  (cost=201.84..216.63 rows=1 width=571) (actual time=0.618..0.625 rows=1 loops=1)
                           Hash Cond: (j0.job_id = j.id)
                           ->  Seq Scan on job_m365backups j0  (cost=0.00..13.80 rows=380 width=184) (actual time=0.009..0.009 rows=0 loops=1)
                           ->  Hash  (cost=201.82..201.82 rows=1 width=387) (actual time=0.597..0.599 rows=1 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 ->  Seq Scan on jobs j  (cost=0.00..201.82 rows=1 width=387) (actual time=0.305..0.588 rows=1 loops=1)
                                       Filter: ((NOT is_deleted) AND (organization_id = ANY ('{REDACTED}'::uuid[])) AND (job_type = ANY ('{0}'::integer[])))
                                       Rows Removed by Filter: 1345
                     ->  Bitmap Heap Scan on repositories r  (cost=1.25..2.36 rows=1 width=365) (actual time=0.015..0.016 rows=1 loops=1)
                           Recheck Cond: (j.repository_id = id)
                           Heap Blocks: exact=1
                           ->  Bitmap Index Scan on pk_repositories  (cost=0.00..1.25 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)
                                 Index Cond: (id = j.repository_id)
   ->  Materialize  (cost=2259.21..1186330.69 rows=4710056 width=356) (actual time=30.046..11932.331 rows=1151 loops=1)
         ->  WindowAgg  (cost=2259.21..1127454.99 rows=4710056 width=364) (actual time=30.038..11930.240 rows=1151 loops=1)
               Run Condition: (row_number() OVER (?) <= 1)
               ->  Incremental Sort  (cost=2259.21..1045029.01 rows=4710056 width=356) (actual time=30.022..11646.624 rows=4149598 loops=1)
                     Sort Key: j1.job_id, j1.end_time DESC
                     Presorted Key: j1.job_id
                     Full-sort Groups: 373  Sort Method: quicksort  Average Memory: 51kB  Peak Memory: 51kB
                     Pre-sorted Groups: 1148  Sort Method: quicksort  Average Memory: 10884kB  Peak Memory: 11229kB
                     ->  Index Scan using ix_job_sessions_job_id_end_time_status on job_sessions j1  (cost=0.56..486028.86 rows=4710056 width=356) (actual time=0.024..7870.906 rows=4149661 loops=1)
                           Filter: (status <> 1)
 Planning Time: 1.044 ms
 Execution Time: 11934.590 ms

From this it's clear that it actually reads 4.1 million rows from job_sessions, and the part that reads the rows and enumerates the row numbers accounts for almost the entire duration. So a very heavy operation :(

If you rewrote this as two separate queries, the first part, where we find the relevant job data from the jobs table, would be:

Code:

EXPLAIN ANALYZE
SELECT j.id, j.backup_scope, j.copy_backup_job_id, j.description, j.e_tag, j.is_deleted, j.is_enabled, j.is_upgraded, j.job_type, j.license_exceeded, j.migration_job_source_id, j.migration_job_source_type, j.migration_job_switch_job_to_target_repository, j.name, j.no_delete, j.orchestration_priority, j.organization_id, j.proxy_id, j.repository_id, j.resync, j.resync_timestamp, j.retrieve_restore_point_id, j.schedule_config, j.status, j.target_proxy_id, j.target_repository_id, j.upgrade_actions, j0.id AS id0, j0.details, j0.display_name, j0.import_status, j0.job_id, j0.last_synchronized_at, j0.policy_id, j0.retention_settings, j0.service_type, j0.state, j0.status AS status0, r.id AS id1, r.capacity, r.cleaner_schedule, r.description AS description0, r.encryption_changing_failed, r.free, r.index_error, r.index_progress, r.indexing_session_id, r.indexing_state, r.is_indexed, r.is_out_of_order, r.is_out_of_sync, r.is_outdated, r.local_cache_retention, r.maintenance, r.name AS name0, r.next_object_storage_encryption_key_id, r.object_storage_encryption_key_id, r.object_storage_id, r.organization_id AS organization_id0, r.out_of_order_reason, r.path, r.proxy_id AS proxy_id0, r.proxy_pool_id, r.retention_period_type, r.retention_time, r.retention_type, r.type, r.used_space_in_bytes
FROM jobs AS j
LEFT JOIN repositories AS r ON j.repository_id = r.id
LEFT JOIN job_m365backups AS j0 ON j.id = j0.job_id
WHERE j.organization_id = ANY (ARRAY['REDACTED']::uuid[]) AND j.job_type = ANY (ARRAY[0]) AND NOT (j.is_deleted)
ORDER BY j.id
LIMIT 10001
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=219.11..219.11 rows=1 width=936) (actual time=0.656..0.658 rows=1 loops=1)
   ->  Sort  (cost=219.11..219.11 rows=1 width=936) (actual time=0.656..0.657 rows=1 loops=1)
         Sort Key: j.id
         Sort Method: quicksort  Memory: 26kB
         ->  Nested Loop Left Join  (cost=203.08..219.10 rows=1 width=936) (actual time=0.642..0.646 rows=1 loops=1)
               ->  Hash Right Join  (cost=201.84..216.63 rows=1 width=571) (actual time=0.614..0.617 rows=1 loops=1)
                     Hash Cond: (j0.job_id = j.id)
                     ->  Seq Scan on job_m365backups j0  (cost=0.00..13.80 rows=380 width=184) (actual time=0.011..0.011 rows=0 loops=1)
                     ->  Hash  (cost=201.82..201.82 rows=1 width=387) (actual time=0.588..0.588 rows=1 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on jobs j  (cost=0.00..201.82 rows=1 width=387) (actual time=0.298..0.574 rows=1 loops=1)
                                 Filter: ((NOT is_deleted) AND (organization_id = ANY ('{REDACTED}'::uuid[])) AND (job_type = ANY ('{0}'::integer[])))
                                 Rows Removed by Filter: 1345
               ->  Bitmap Heap Scan on repositories r  (cost=1.25..2.36 rows=1 width=365) (actual time=0.019..0.019 rows=1 loops=1)
                     Recheck Cond: (j.repository_id = id)
                     Heap Blocks: exact=1
                     ->  Bitmap Index Scan on pk_repositories  (cost=0.00..1.25 rows=1 width=0) (actual time=0.012..0.013 rows=1 loops=1)
                           Index Cond: (id = j.repository_id)
 Planning Time: 0.334 ms
 Execution Time: 0.803 ms
We can see that the initial part is actually very fast, and for this particular case it only returns a single row (query modified to return just the id):

Code:

VeeamBackup365=# SELECT j.id
VeeamBackup365-#     FROM jobs AS j
VeeamBackup365-#     LEFT JOIN repositories AS r ON j.repository_id = r.id
VeeamBackup365-#     LEFT JOIN job_m365backups AS j0 ON j.id = j0.job_id
VeeamBackup365-#     WHERE j.organization_id = ANY (ARRAY['REDACTED']::uuid[]) AND j.job_type = ANY (ARRAY[0]) AND NOT (j.is_deleted)
VeeamBackup365-#     ORDER BY j.id
VeeamBackup365-#     LIMIT 10001;
                  id
--------------------------------------
 db5b92ac-1b08-4dd9-aeaf-f49be51f23d8
(1 row)



If I now update the second part of the query to look at an array of job IDs (although in this case the array only has a single value):

Code:

EXPLAIN ANALYZE
SELECT t1.id, t1.details, t1.e_tag, t1.end_time, t1.friendly_organization_name, t1.is_disconnected, t1.items_count, t1.items_rate, t1.job_id, t1.job_name, t1.job_type, t1.job_will_be_retried, t1.last_modified, t1.license_exceeded, t1.location_id, t1.location_name, t1.location_type, t1.mailboxes_count, t1.mailboxes_processed, t1.main_session_id, t1.original_organization_name, t1.processed, t1.rate, t1.repository_data_rate, t1.repository_id, t1.repository_name, t1.retry, t1.source_data_rate, t1.start_time, t1.status, t1.type
FROM (
    SELECT j1.id, j1.details, j1.e_tag, j1.end_time, j1.friendly_organization_name, j1.is_disconnected, j1.items_count, j1.items_rate, j1.job_id, j1.job_name, j1.job_type, j1.job_will_be_retried, j1.last_modified, j1.license_exceeded, j1.location_id, j1.location_name, j1.location_type, j1.mailboxes_count, j1.mailboxes_processed, j1.main_session_id, j1.original_organization_name, j1.processed, j1.rate, j1.repository_data_rate, j1.repository_id, j1.repository_name, j1.retry, j1.source_data_rate, j1.start_time, j1.status, j1.type, ROW_NUMBER() OVER(PARTITION BY j1.job_id ORDER BY j1.end_time DESC) AS row
    FROM job_sessions AS j1
    WHERE j1.status <> 1
      AND j1.job_id = ANY (ARRAY['db5b92ac-1b08-4dd9-aeaf-f49be51f23d8']::uuid[])  --- My addition to the query
) AS t1
WHERE t1.row <= 1;
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on t1  (cost=25.24..11101.64 rows=9259 width=356) (actual time=39.171..41.361 rows=1 loops=1)
   ->  WindowAgg  (cost=25.24..11009.05 rows=9259 width=364) (actual time=39.169..41.356 rows=1 loops=1)
         Run Condition: (row_number() OVER (?) <= 1)
         ->  Incremental Sort  (cost=25.24..10847.02 rows=9259 width=356) (actual time=39.144..40.457 rows=12745 loops=1)
               Sort Key: j1.job_id, j1.end_time DESC
               Presorted Key: j1.job_id
               Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 52kB  Peak Memory: 52kB
               Pre-sorted Groups: 1  Sort Method: quicksort  Average Memory: 5791kB  Peak Memory: 5791kB
               ->  Index Scan using ix_job_sessions_job_id_end_time_status on job_sessions j1  (cost=0.56..10363.81 rows=9259 width=356) (actual time=0.050..28.708 rows=12745 loops=1)
                     Index Cond: (job_id = ANY ('{db5b92ac-1b08-4dd9-aeaf-f49be51f23d8}'::uuid[]))
                     Filter: (status <> 1)
 Planning Time: 0.262 ms
 Execution Time: 42.332 ms

This will return zero or one row per job_id, which you would then need to stitch together with the result from the first part. But as you can see, both queries now run in well under a second and use far less CPU and memory.
Stitching the two result sets together in your .NET code would be a trivial task.
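For reference, the same two-step idea can also be expressed as a single statement, with a CTE feeding the job ids into the session lookup and PostgreSQL's DISTINCT ON replacing the ROW_NUMBER filter. This is just a sketch under the same schema assumptions as the queries above (column lists abbreviated, not tested):

```sql
-- Sketch: select the page of job ids first, then fetch only the newest
-- non-running session for each job in that page.
WITH page AS (
    SELECT j.id
    FROM jobs AS j
    WHERE j.organization_id = ANY ($1)
      AND j.job_type = ANY ($2)
      AND NOT j.is_deleted
    ORDER BY j.id
    LIMIT $3
)
SELECT DISTINCT ON (s.job_id) s.*
FROM job_sessions AS s
WHERE s.status <> 1
  AND s.job_id IN (SELECT id FROM page)
ORDER BY s.job_id, s.end_time DESC;   -- DISTINCT ON keeps the first row per job_id
```

Because the job_id filter is pushed into the session scan, the planner can restrict the index scan to the selected jobs, just like in the manually modified query above.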

Statistics: Posted by hoerup — Sep 04, 2025 12:59 pm


