-
Notifications
You must be signed in to change notification settings - Fork 14
Speed up the capture set list view #1301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
mihow
wants to merge
10
commits into
main
Choose a base branch
from
perf/sourceimagecollection-cached-counts
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 4 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
83450a9
perf(api): denormalize SourceImageCollection counts as cached columns
mihow 457e090
refactor(models): introduce CachedCountField marker, apply to all den…
mihow 2725599
chore: drop tautological count-default test, clarify reactive comments
mihow 1a242aa
refactor(counts): async signals + cached_counts integrity check
mihow 451c8be
docs: cached-counts update_cached_counts method design
mihow 9ac2ce1
refactor(counts): unify cached-count refresh as BaseModel/BaseQuerySe…
mihow 1120396
fix(serializers): mark SourceImageCollection cached counts read-only
mihow 5113ff2
refactor(migrations): split SourceImageCollection counts schema and b…
mihow f0cc055
refactor(counts): defer reconcile tooling to follow-up
mihow 2533857
docs(planning): cached-counts reconcile follow-up + design space
mihow File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| """Reconcile drift on ``CachedCountField`` columns. | ||
|
|
||
| Cached count columns (e.g. ``Deployment.captures_count``, | ||
| ``SourceImageCollection.source_images_count``) are kept current via signals | ||
| and explicit ``update_calculated_fields`` calls. Bulk write paths that skip | ||
| signals — ``bulk_create``, ``bulk_update``, raw SQL, some ML post-processors | ||
| — silently drift the stored value out of sync with the underlying rows. | ||
|
|
||
| This check discovers every model that declares one or more | ||
| ``CachedCountField`` columns, finds rows whose stored values disagree with | ||
| a fresh recompute via ``instance.update_calculated_fields(save=False)``, | ||
| and either reports or repairs them. | ||
|
|
||
| Run via ``manage.py check_data_integrity`` (when PR #1188 lands) or via | ||
| the ``reconcile_cached_counts`` Celery task. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| import logging | ||
| from collections.abc import Iterator | ||
|
|
||
| from django.apps import apps | ||
|
|
||
| from ami.base.models import BaseModel, CachedCountField | ||
| from ami.main.checks.schemas import IntegrityCheckResult | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def _cached_count_field_names(model: type[BaseModel]) -> list[str]: | ||
| fields = model._meta.get_fields() # type: ignore[attr-defined] | ||
| return [f.name for f in fields if isinstance(f, CachedCountField) and f.name] | ||
|
|
||
|
|
||
| def discover_cached_count_fields() -> dict[type[BaseModel], list[str]]: | ||
| """Return models that declare one or more ``CachedCountField`` columns.""" | ||
| result: dict[type[BaseModel], list[str]] = {} | ||
| for model in apps.get_models(): | ||
| if not issubclass(model, BaseModel): | ||
| continue | ||
| cached = _cached_count_field_names(model) | ||
| if cached: | ||
| result[model] = cached | ||
| return result | ||
|
|
||
|
|
||
| def _scope_to_project(qs, model: type[BaseModel], project_id: int | None): | ||
| if project_id is None: | ||
| return qs | ||
| project_accessor = model.get_project_accessor() | ||
| if project_accessor and project_accessor != "projects": | ||
| return qs.filter(**{f"{project_accessor}_id": project_id}) | ||
| return qs | ||
|
|
||
|
|
||
| def find_stale_cached_counts( | ||
| model: type[BaseModel], | ||
| project_id: int | None = None, | ||
| ) -> Iterator[tuple[BaseModel, dict[str, int | None], dict[str, int | None]]]: | ||
| """Yield ``(instance, stored, computed)`` for rows whose cached counts drift. | ||
|
|
||
| Iterates the queryset row-by-row and calls ``update_calculated_fields(save=False)`` | ||
| on a fresh copy so the stored row stays untouched. Heavy on large tables; | ||
| callers should scope by ``project_id`` whenever the check is interactive. | ||
| """ | ||
| cached_fields = _cached_count_field_names(model) | ||
| if not cached_fields: | ||
| return | ||
| qs = _scope_to_project(model.objects.all(), model, project_id) | ||
| for instance in qs.iterator(): | ||
| stored = {f: getattr(instance, f) for f in cached_fields} | ||
| instance.update_calculated_fields(save=False) | ||
| computed = {f: getattr(instance, f) for f in cached_fields} | ||
| if stored != computed: | ||
| yield instance, stored, computed | ||
|
|
||
|
|
||
| def reconcile_cached_counts( | ||
| model: type[BaseModel] | None = None, | ||
| project_id: int | None = None, | ||
| dry_run: bool = True, | ||
| ) -> IntegrityCheckResult: | ||
| """Repair stale cached counts. Pass ``model=None`` to sweep all models.""" | ||
| models_to_check = [model] if model else list(discover_cached_count_fields().keys()) | ||
| result = IntegrityCheckResult() | ||
| for m in models_to_check: | ||
| for instance, stored, computed in find_stale_cached_counts(m, project_id=project_id): | ||
| result.checked += 1 | ||
|
|
||
| logger.info( | ||
| "Stale cached counts on %s pk=%s: stored=%s computed=%s", | ||
| m.__name__, | ||
| instance.pk, | ||
| stored, | ||
| computed, | ||
| ) | ||
| if dry_run: | ||
| continue | ||
| try: | ||
| instance.update_calculated_fields(save=True) | ||
| result.fixed += 1 | ||
| except Exception: | ||
| logger.exception("Failed to reconcile %s pk=%s", m.__name__, instance.pk) | ||
| result.unfixable += 1 | ||
| return result | ||
89 changes: 89 additions & 0 deletions
89
ami/main/migrations/0085_denormalize_sourceimagecollection_counts.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| """ | ||
| Denormalize three counts onto ``SourceImageCollection`` so the list endpoint | ||
| reads them in O(1) instead of running 3 correlated count subqueries per row. | ||
|
|
||
| Backfill uses a single GROUP BY over the M2M with FILTER clauses to compute | ||
| all three counts in one pass. ``with_det`` checks for a valid (non-null / | ||
| non-empty) detection bbox to match the runtime | ||
| ``NULL_DETECTIONS_FILTER`` semantics in ``ami/main/models.py``. | ||
|
|
||
| ``atomic = False`` so the long UPDATE can run outside a single transaction | ||
| on production-sized M2M tables. | ||
| """ | ||
|
|
||
| from django.db import migrations, models | ||
|
|
||
|
|
||
| def backfill_counts(apps, schema_editor): | ||
| schema_editor.execute( | ||
| """ | ||
| UPDATE main_sourceimagecollection sc | ||
| SET source_images_count = c.total, | ||
| source_images_processed_count = c.processed, | ||
| source_images_with_detections_count = c.with_det | ||
| FROM ( | ||
| SELECT msci.sourceimagecollection_id AS coll_id, | ||
| COUNT(*) AS total, | ||
| COUNT(*) FILTER ( | ||
| WHERE EXISTS ( | ||
| SELECT 1 FROM main_detection d | ||
| WHERE d.source_image_id = si.id | ||
| ) | ||
| ) AS processed, | ||
| COUNT(*) FILTER ( | ||
| WHERE EXISTS ( | ||
| SELECT 1 FROM main_detection d | ||
| WHERE d.source_image_id = si.id | ||
| AND d.bbox IS NOT NULL | ||
| AND d.bbox::text <> '[]' | ||
| ) | ||
| ) AS with_det | ||
| FROM main_sourceimagecollection_images msci | ||
| INNER JOIN main_sourceimage si ON si.id = msci.sourceimage_id | ||
| GROUP BY msci.sourceimagecollection_id | ||
| ) c | ||
| WHERE sc.id = c.coll_id; | ||
| """ | ||
| ) | ||
| # Collections with no images: paginated SELECTs returned 0 via Coalesce; keep | ||
| # them populated rather than NULL so the column reads stay consistent. | ||
| schema_editor.execute( | ||
| """ | ||
| UPDATE main_sourceimagecollection | ||
| SET source_images_count = 0, | ||
| source_images_processed_count = 0, | ||
| source_images_with_detections_count = 0 | ||
| WHERE source_images_count IS NULL; | ||
| """ | ||
| ) | ||
|
|
||
|
|
||
| def reverse_noop(apps, schema_editor): | ||
| pass | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
| atomic = False | ||
|
|
||
| dependencies = [ | ||
| ("main", "0084_revoke_delete_job_from_roles"), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_count", | ||
| field=models.IntegerField(default=0), | ||
| ), | ||
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_with_detections_count", | ||
| field=models.IntegerField(default=0), | ||
| ), | ||
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_processed_count", | ||
| field=models.IntegerField(default=0), | ||
| ), | ||
|
Comment on lines
+24
to
+38
|
||
| migrations.RunPython(backfill_counts, reverse_noop), | ||
| ] | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| # Generated by Django 4.2.10 on 2026-05-13 20:05 | ||
|
|
||
| import ami.base.models | ||
| from django.db import migrations | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
| dependencies = [ | ||
| ("main", "0085_denormalize_sourceimagecollection_counts"), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="captures_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="detections_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="events_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="occurrences_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="taxa_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="event", | ||
| name="captures_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="event", | ||
| name="detections_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="event", | ||
| name="occurrences_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="sourceimage", | ||
| name="detections_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_count", | ||
| field=ami.base.models.CachedCountField(default=0), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_processed_count", | ||
| field=ami.base.models.CachedCountField(default=0), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_with_detections_count", | ||
| field=ami.base.models.CachedCountField(default=0), | ||
| ), | ||
| ] |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohamedelabbas1996 I think we discussed a field type for cached counts last year. it finally happened!