-
Notifications
You must be signed in to change notification settings - Fork 14
Speed up the capture set list view #1301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
mihow
wants to merge
10
commits into
main
Choose a base branch
from
perf/sourceimagecollection-cached-counts
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 3 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
83450a9
perf(api): denormalize SourceImageCollection counts as cached columns
mihow 457e090
refactor(models): introduce CachedCountField marker, apply to all den…
mihow 2725599
chore: drop tautological count-default test, clarify reactive comments
mihow 1a242aa
refactor(counts): async signals + cached_counts integrity check
mihow 451c8be
docs: cached-counts update_cached_counts method design
mihow 9ac2ce1
refactor(counts): unify cached-count refresh as BaseModel/BaseQuerySe…
mihow 1120396
fix(serializers): mark SourceImageCollection cached counts read-only
mihow 5113ff2
refactor(migrations): split SourceImageCollection counts schema and b…
mihow f0cc055
refactor(counts): defer reconcile tooling to follow-up
mihow 2533857
docs(planning): cached-counts reconcile follow-up + design space
mihow File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
89 changes: 89 additions & 0 deletions
89
ami/main/migrations/0085_denormalize_sourceimagecollection_counts.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| """ | ||
| Denormalize three counts onto ``SourceImageCollection`` so the list endpoint | ||
| reads them in O(1) instead of running 3 correlated count subqueries per row. | ||
|
|
||
| Backfill uses a single GROUP BY over the M2M with FILTER clauses to compute | ||
| all three counts in one pass. ``with_det`` checks for a valid (non-null / | ||
| non-empty) detection bbox to match the runtime | ||
| ``NULL_DETECTIONS_FILTER`` semantics in ``ami/main/models.py``. | ||
|
|
||
| ``atomic = False`` so the long UPDATE can run outside a single transaction | ||
| on production-sized M2M tables. | ||
| """ | ||
|
|
||
| from django.db import migrations, models | ||
|
|
||
|
|
||
| def backfill_counts(apps, schema_editor): | ||
| schema_editor.execute( | ||
| """ | ||
| UPDATE main_sourceimagecollection sc | ||
| SET source_images_count = c.total, | ||
| source_images_processed_count = c.processed, | ||
| source_images_with_detections_count = c.with_det | ||
| FROM ( | ||
| SELECT msci.sourceimagecollection_id AS coll_id, | ||
| COUNT(*) AS total, | ||
| COUNT(*) FILTER ( | ||
| WHERE EXISTS ( | ||
| SELECT 1 FROM main_detection d | ||
| WHERE d.source_image_id = si.id | ||
| ) | ||
| ) AS processed, | ||
| COUNT(*) FILTER ( | ||
| WHERE EXISTS ( | ||
| SELECT 1 FROM main_detection d | ||
| WHERE d.source_image_id = si.id | ||
| AND d.bbox IS NOT NULL | ||
| AND d.bbox::text <> '[]' | ||
| ) | ||
| ) AS with_det | ||
| FROM main_sourceimagecollection_images msci | ||
| INNER JOIN main_sourceimage si ON si.id = msci.sourceimage_id | ||
| GROUP BY msci.sourceimagecollection_id | ||
| ) c | ||
| WHERE sc.id = c.coll_id; | ||
| """ | ||
| ) | ||
| # Collections with no images: paginated SELECTs returned 0 via Coalesce; keep | ||
| # them populated rather than NULL so the column reads stay consistent. | ||
| schema_editor.execute( | ||
| """ | ||
| UPDATE main_sourceimagecollection | ||
| SET source_images_count = 0, | ||
| source_images_processed_count = 0, | ||
| source_images_with_detections_count = 0 | ||
| WHERE source_images_count IS NULL; | ||
| """ | ||
| ) | ||
|
|
||
|
|
||
| def reverse_noop(apps, schema_editor): | ||
| pass | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
| atomic = False | ||
|
|
||
| dependencies = [ | ||
| ("main", "0084_revoke_delete_job_from_roles"), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_count", | ||
| field=models.IntegerField(default=0), | ||
| ), | ||
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_with_detections_count", | ||
| field=models.IntegerField(default=0), | ||
| ), | ||
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_processed_count", | ||
| field=models.IntegerField(default=0), | ||
| ), | ||
|
Comment on lines
+24
to
+38
|
||
| migrations.RunPython(backfill_counts, reverse_noop), | ||
| ] | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| # Generated by Django 4.2.10 on 2026-05-13 20:05 | ||
|
|
||
| import ami.base.models | ||
| from django.db import migrations | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
| dependencies = [ | ||
| ("main", "0085_denormalize_sourceimagecollection_counts"), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="captures_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="detections_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="events_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="occurrences_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="deployment", | ||
| name="taxa_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="event", | ||
| name="captures_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="event", | ||
| name="detections_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="event", | ||
| name="occurrences_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="sourceimage", | ||
| name="detections_count", | ||
| field=ami.base.models.CachedCountField(blank=True, null=True), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_count", | ||
| field=ami.base.models.CachedCountField(default=0), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_processed_count", | ||
| field=ami.base.models.CachedCountField(default=0), | ||
| ), | ||
| migrations.AlterField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_with_detections_count", | ||
| field=ami.base.models.CachedCountField(default=0), | ||
| ), | ||
| ] |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -32,7 +32,7 @@ | |
| import ami.tasks | ||
| import ami.utils | ||
| from ami.base.fields import DateStringField | ||
| from ami.base.models import BaseModel, BaseQuerySet | ||
| from ami.base.models import BaseModel, BaseQuerySet, CachedCountField | ||
| from ami.main import charts | ||
| from ami.main.models_future.filters import ( | ||
| build_occurrence_default_filters_q, | ||
|
|
@@ -754,11 +754,11 @@ class Deployment(BaseModel): | |
| # data_source_last_check_notes = models.TextField(max_length=255, blank=True, null=True) | ||
|
|
||
| # Pre-calculated values | ||
| events_count = models.IntegerField(blank=True, null=True) | ||
| occurrences_count = models.IntegerField(blank=True, null=True) | ||
| captures_count = models.IntegerField(blank=True, null=True) | ||
| detections_count = models.IntegerField(blank=True, null=True) | ||
| taxa_count = models.IntegerField(blank=True, null=True) | ||
| events_count = CachedCountField(blank=True, null=True) | ||
| occurrences_count = CachedCountField(blank=True, null=True) | ||
| captures_count = CachedCountField(blank=True, null=True) | ||
| detections_count = CachedCountField(blank=True, null=True) | ||
| taxa_count = CachedCountField(blank=True, null=True) | ||
| first_capture_timestamp = models.DateTimeField(blank=True, null=True) | ||
| last_capture_timestamp = models.DateTimeField(blank=True, null=True) | ||
|
|
||
|
|
@@ -1155,9 +1155,9 @@ class Event(BaseModel): | |
| occurrences: models.QuerySet["Occurrence"] | ||
|
|
||
| # Pre-calculated values | ||
| captures_count = models.IntegerField(blank=True, null=True) | ||
| detections_count = models.IntegerField(blank=True, null=True) | ||
| occurrences_count = models.IntegerField(blank=True, null=True) | ||
| captures_count = CachedCountField(blank=True, null=True) | ||
| detections_count = CachedCountField(blank=True, null=True) | ||
| occurrences_count = CachedCountField(blank=True, null=True) | ||
| calculated_fields_updated_at = models.DateTimeField(blank=True, null=True) | ||
|
|
||
| class Meta: | ||
|
|
@@ -1942,7 +1942,7 @@ class SourceImage(BaseModel): | |
| test_image = models.BooleanField(default=False) | ||
|
|
||
| # Precaclulated values | ||
| detections_count = models.IntegerField(null=True, blank=True) | ||
| detections_count = CachedCountField(null=True, blank=True) | ||
|
|
||
| project = models.ForeignKey(Project, on_delete=models.SET_NULL, null=True, related_name="captures") | ||
| deployment = models.ForeignKey(Deployment, on_delete=models.SET_NULL, null=True, related_name="captures") | ||
|
|
@@ -4093,32 +4093,6 @@ def html(self) -> str: | |
|
|
||
|
|
||
| class SourceImageCollectionQuerySet(BaseQuerySet): | ||
| def with_source_images_count(self): | ||
| return self.annotate( | ||
| source_images_count=models.Count( | ||
| "images", | ||
| distinct=True, | ||
| ) | ||
| ) | ||
|
|
||
| def with_source_images_with_detections_count(self): | ||
| return self.annotate( | ||
| source_images_with_detections_count=models.Count( | ||
| "images", | ||
| filter=(~models.Q(images__detections__bbox__isnull=True) & ~models.Q(images__detections__bbox=[])), | ||
| distinct=True, | ||
| ) | ||
| ) | ||
|
|
||
| def with_source_images_processed_count(self): | ||
| return self.annotate( | ||
| source_images_processed_count=models.Count( | ||
| "images", | ||
| filter=models.Q(images__detections__isnull=False), | ||
| distinct=True, | ||
| ) | ||
| ) | ||
|
|
||
| def with_source_images_processed_by_algorithm_count(self, algorithm_id: int): | ||
| return self.annotate( | ||
| source_images_processed_by_algorithm_count=models.Count( | ||
|
|
@@ -4205,6 +4179,12 @@ class SourceImageCollection(BaseModel): | |
| default=dict, | ||
| ) | ||
|
|
||
| # Denormalized counts. Kept in sync via m2m_changed and pipeline-completion | ||
| # hooks. Reads are O(1). | ||
| source_images_count = CachedCountField(default=0) | ||
| source_images_with_detections_count = CachedCountField(default=0) | ||
| source_images_processed_count = CachedCountField(default=0) | ||
|
Comment on lines
+4184
to
+4186
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great catch |
||
|
|
||
| objects = SourceImageCollectionManager() | ||
|
|
||
| jobs: models.QuerySet["Job"] | ||
|
|
@@ -4219,19 +4199,6 @@ def infer_dataset_type(self): | |
| def dataset_type(self): | ||
| return self.infer_dataset_type() | ||
|
|
||
| def source_images_count(self) -> int | None: | ||
| # This should always be pre-populated using queryset annotations | ||
| # return self.images.count() | ||
| return None | ||
|
|
||
| def source_images_with_detections_count(self) -> int | None: | ||
| # This should always be pre-populated using queryset annotations | ||
| return None | ||
|
|
||
| def source_images_processed_count(self) -> int | None: | ||
| # This should always be pre-populated using queryset annotations | ||
| return None | ||
|
|
||
| def occurrences_count(self) -> int | None: | ||
| # This should always be pre-populated using queryset annotations | ||
| return None | ||
|
|
@@ -4240,6 +4207,29 @@ def taxa_count(self) -> int | None: | |
| # This should always be pre-populated using queryset annotations | ||
| return None | ||
|
|
||
| def get_source_image_counts(self) -> dict[str, int]: | ||
| """Return the 3 source-image counts as a dict. Single aggregate query; does not write to the DB.""" | ||
| valid_det = Detection.objects.filter(source_image=models.OuterRef("pk")).exclude(NULL_DETECTIONS_FILTER) | ||
| any_det = Detection.objects.filter(source_image=models.OuterRef("pk")) | ||
| counts = self.images.annotate( | ||
| _has_any_det=Exists(any_det), | ||
| _has_valid_det=Exists(valid_det), | ||
| ).aggregate( | ||
| source_images_count=models.Count("id"), | ||
| source_images_processed_count=models.Count("id", filter=models.Q(_has_any_det=True)), | ||
| source_images_with_detections_count=models.Count("id", filter=models.Q(_has_valid_det=True)), | ||
| ) | ||
| return counts | ||
|
|
||
| def update_calculated_fields(self, save: bool = False) -> None: | ||
| """Recompute the 3 denormalized source-image count columns.""" | ||
| counts = self.get_source_image_counts() | ||
| self.source_images_count = counts["source_images_count"] | ||
| self.source_images_processed_count = counts["source_images_processed_count"] | ||
| self.source_images_with_detections_count = counts["source_images_with_detections_count"] | ||
| if save: | ||
| SourceImageCollection.objects.filter(pk=self.pk).update(**counts) | ||
|
|
||
| def get_queryset( | ||
| self, | ||
| *args, | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohamedelabbas1996 I think we discussed a field type for cached counts last year. it finally happened!