Speed up the capture set list view#1301
Conversation
✅ Deploy Preview for antenna-preview canceled.
|
📝 WalkthroughWalkthroughThis PR introduces a denormalized cached-count field infrastructure: a new ChangesCached Count Denormalization and Async Recomputation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
af24585 to
39a8f02
Compare
|
|
||
| # Denormalized counts. Kept in sync via m2m_changed and pipeline-completion | ||
| # hooks. Reads are O(1). | ||
| source_images_count = models.IntegerField(default=0) |
There was a problem hiding this comment.
How can we indicate which fields are cached counts? field properties? prefix? I believe we have a cached_ prefix on other models, but then it makes the method replacements not 1:1. maybe thats okay. open to ideas.
There was a problem hiding this comment.
Claude says: Checked — no cached_ prefix exists on any model field in the codebase. All denormalized count fields use the bare <thing>_count pattern: Deployment.events_count, Deployment.captures_count, Event.captures_count, SourceImage.detections_count, etc. The new fields match this convention.
I'd lean against adding a cached_ prefix just to this PR's fields — it would create a split where some counts are prefixed and most aren't, and method names (occurrences_count, taxa_count on the same model) wouldn't be 1:1 anymore as you noted. If we want the prefix project-wide, that's a separate refactor PR covering all the count fields.
Happy to do either — flag if you'd prefer the new fields prefixed and we adopt project-wide later, or leave matching the existing convention.
There was a problem hiding this comment.
Is there a simple way to mark/tag/categorize/group model fields? To indicate these are cached counts that could be potentially stale or empty at any time? declare in a list on the model? add a model field property, etc?
39a8f02 to
401a613
Compare
|
Claude says: PR body updated. Two new pieces since the initial draft:
@coderabbitai full review please. Base branch is |
|
✅ Actions performedFull review triggered. |
| from ami.users.models import User | ||
|
|
||
|
|
||
| class CachedCountField(models.IntegerField): |
There was a problem hiding this comment.
@mohamedelabbas1996 I think we discussed a field type for cached counts last year. it finally happened!
Three planning notes from 3f34b78 were committed under docs/planning/ (wrong convention — should be docs/claude/planning/ per CLAUDE.md) and have since rotted: PR-A was superseded by the denormalize approach in #1301; the .only() recommendation was reverted per review; PR-B1 ships in this PR. Two genuinely useful items have been harvested: - New Relic baseline numbers for 9 list endpoints (post-NR-12.1.0) → memory entry reference_nr_baseline_2026_05_11.md - 22:22 UTC PG connection-pool burst hypothesis → issue #1302 - SourceImageViewSet.retrieve 571-call N+1 audit → issue #1303 Co-Authored-By: Claude <noreply@anthropic.com>
d12f477 to
f1019cd
Compare
* perf(api): trim captures list SELECT via .only()
Replaces the unconditional wide SELECT (every column on main_sourceimage +
main_deployment + main_event) on SourceImageViewSet.list with a `.only()`
list scoped to fields SourceImageListSerializer actually reads. Row width
drops from 52 columns to 26.
select_related("deployment__data_source") added so SourceImage.public_url()'s
presigned-URL fallback (when public_base_url is blank) does not fire a per-row
lazy fetch.
SOURCE_IMAGE_LIST_ONLY_FIELDS is exposed as a module-level constant with a
maintenance docstring covering how to keep it in sync as the serializer or
model methods change. SOURCE_IMAGE_PUBLIC_URL_DEPENDENCIES is split out as a
sub-constant so the 11 fields that exist only to support presigned-URL
generation are visibly grouped — that whole chain goes away once images are
served through the upcoming image-resizing/CDN layer instead of directly
from source buckets.
Only applied to the list action; retrieve keeps the wide SELECT so the
detail serializer's broader field set is unaffected.
Tests:
- TestSourceImageListQueryCount extended with response-shape assertion
guarding against `.only()` over-restriction (verifies url, size_display,
deployment.name, event present without lazy loads).
Refs: docs/planning/2026-05-11-list-endpoint-perf-continuation-plan.md PR-B1.
Co-Authored-By: Claude <noreply@anthropic.com>
* docs: list-endpoint perf planning trail
Adds:
- 2026-05-11-newrelic-post-upgrade-findings.md — NR prod data after the
agent upgrade that surfaced the list-endpoint hot paths.
- 2026-05-11-list-endpoint-perf-analysis.md — SQL-level root-cause
analysis for the 3 endpoints (collection list, captures list, project
charts).
- 2026-05-11-list-endpoint-perf-continuation-plan.md — sequenced PR plan.
PR-A (collection subquery rewrite) superseded by the denormalize-counts
approach on perf/sourceimagecollection-cached-counts. PR-B1 ships in
this PR. PR-C/D/E queued.
Co-Authored-By: Claude <noreply@anthropic.com>
* refactor(api): move SourceImage list-only field groups into viewset class
Per review on #1300: pulled `SOURCE_IMAGE_LIST_ONLY_FIELDS` /
`SOURCE_IMAGE_PUBLIC_URL_DEPENDENCIES` out of module scope and into
`SourceImageViewSet` as class attributes, grouped by what reads them
(core, deployment-nested, event-nested, public-url). Trimmed the
docstring and added a short note on why we whitelist with `.only()`
instead of blacklisting with `.defer()`, with a link to the Django ref.
Co-Authored-By: Claude <noreply@anthropic.com>
* refactor(api): drop .only() — select_related is the N+1 fix
Per @mihow review on #1300: the `.only()` machinery added in this PR was
acting as a row-width trim on top of select_related, not as the N+1 fix.
The actual N+1 prevention is `select_related("deployment__data_source")`
ensuring `SourceImage.public_url()` doesn't lazy-load the data_source
chain per row. With that in place, dropping `.only()` keeps the captures
list query count flat (verified empirically — `TestSourceImageListQueryCount`
still passes at limit=1, 5, 25).
This removes the maintenance burden of keeping the field tuples in sync
with the serializer surface. django-zen-queries and the existing perf
tests are the safety net for regressions.
Renamed `test_list_response_shape_preserved_after_only` to
`test_list_response_shape_has_no_lazy_loads` so the contract it asserts
is independent of any specific deferred-fields mechanism.
Co-Authored-By: Claude <noreply@anthropic.com>
* docs: drop list-endpoint perf planning docs
Three planning notes from 3f34b78 were committed under docs/planning/
(wrong convention — should be docs/claude/planning/ per CLAUDE.md) and
have since rotted: PR-A was superseded by the denormalize approach in
#1301; the .only() recommendation was reverted per review; PR-B1 ships
in this PR. Two genuinely useful items have been harvested:
- New Relic baseline numbers for 9 list endpoints (post-NR-12.1.0)
→ memory entry reference_nr_baseline_2026_05_11.md
- 22:22 UTC PG connection-pool burst hypothesis
→ issue #1302
- SourceImageViewSet.retrieve 571-call N+1 audit
→ issue #1303
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
The collection list endpoint previously ran 3 correlated count subqueries per row (source_images_count, source_images_with_detections_count, source_images_processed_count). On large collections each subquery scans the M2M and Detection tables. Reads are now O(1) against cached columns. - Add 3 IntegerField columns on SourceImageCollection with default=0 - Migration 0085 backfills via a single GROUP BY over the M2M with FILTER clauses; atomic=False so prod-sized M2M can chunk - update_calculated_fields() recomputes all 3 counts in one aggregate query - m2m_changed on images.through keeps counts fresh on add/remove (both directions; reverse post_clear documented as caller responsibility) - Detection post_save/post_delete recompute for collections containing the affected source image - pipeline.save_results() recomputes for affected collections after the bulk_create batch (which skips signals) - Drop the 3 with_*_count() annotation methods on SourceImageCollectionQuerySet and the chain in SourceImageCollectionViewSet.queryset - TestSourceImageCollectionCountsDenormalize covers initial state, add/remove, Detection create/delete, null-bbox processed-but-not-with-detections case, and update_calculated_fields() recovery from drift - TestSourceImageCollectionListQueryCount tightened: with_counts now <=10q (was <=15), ordering by source_images_count uses column directly Follow-up to PR #1300 — the paginator COUNT(*) win there is mostly invisible since the UI does not paginate collections; this is the actual UX-visible improvement. Co-Authored-By: Claude <noreply@anthropic.com>
…ormalized count columns Add `CachedCountField(IntegerField)` to `ami/base/models.py` as a marker subclass for cached aggregate columns. Swap it in for the 12 existing denormalized count fields across Deployment, Event, SourceImage, and SourceImageCollection. Discoverable via `Model._meta.get_fields()` + `isinstance(f, CachedCountField)` for future use (refresh tasks, admin display, list-endpoint defer()). The DB column type is unchanged (IntegerField -> IntegerField), so the generated migration `0086_use_cached_count_field` is a pure no-op at the SQL level; it only updates Django's recorded model state. Co-Authored-By: Claude <noreply@anthropic.com>
- Remove `test_initial_counts_zero` from `TestSourceImageCollectionCountsDenormalize`; it asserted the `default=0` field-level default rather than any code path we own. - Rewrite a handful of em-dash fragment comments added during PR review rounds (`get_source_image_counts` docstring, captures-list query-count assertion message, collection-list ordering comment, denormalize-test side-effect note) in normal prose. Co-Authored-By: Claude <noreply@anthropic.com>
f1019cd to
2725599
Compare
✅ Deploy Preview for antenna-ssec canceled.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/signals.py`:
- Around line 210-211: The signal currently calls
instance.update_calculated_fields(save=True) synchronously for actions
("post_add", "post_remove", "post_clear"), which can persist stale counts under
concurrent transactions; change these immediate calls to run after the DB
transaction commits by wrapping the call in django.db.transaction.on_commit
(e.g., transaction.on_commit(lambda:
instance.update_calculated_fields(save=True))). Apply the same replacement for
the other occurrence around lines referencing the same actions so recomputation
is deferred until successful commit.
In `@ami/main/tests.py`:
- Line 3386: Test fixtures are using normalized [0–1] bbox values but this repo
expects pixel-space coordinates; update the Detection/BoundingBox fixtures
(e.g., the Detection instances setting bbox=[0.1,0.1,0.2,0.2] and the other
occurrences at the mentioned sites) to use absolute pixel coordinates consistent
with the image dimensions used in these tests. Locate the test rows creating
Detection objects (look for Detection(..., bbox=...) and any direct
BoundingBox(...) usages) and replace the normalized floats with the
corresponding pixel values (or explicit integers) so the fixtures represent true
pixel-space bboxes across the cases mentioned (lines ~3386, ~3398, ~3424,
~3443).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e0f8a9e-539a-4e36-8acc-5da8d6ba207a
📒 Files selected for processing (8)
ami/base/models.pyami/main/api/views.pyami/main/migrations/0085_denormalize_sourceimagecollection_counts.pyami/main/migrations/0086_use_cached_count_field.pyami/main/models.pyami/main/signals.pyami/main/tests.pyami/ml/models/pipeline.py
Address opus takeaway-review findings on the SourceImageCollection denormalize work. Signal handlers no longer block on the recompute: - Detection post_save/post_delete dedupes affected source_image_ids on a per-connection set and fans out one task per affected collection via transaction.on_commit. A 10k-detection bulk_create-loop (where signals fire) now triggers at most one task per affected collection instead of 10k synchronous aggregates per detection. - m2m_changed on SourceImageCollection.images defers to a Celery task via transaction.on_commit. New ami.main.checks.cached_counts module catches drift from bulk write paths that skip signals (bulk_create, bulk_update, raw SQL, ML post- processors like small_size_filter). Generic across every model that declares a CachedCountField column — discovered via Model._meta.get_fields(). Layered defense: signals = best-effort sync, periodic reconcile_cached_counts_task = safety net. This also gives CachedCountField its first consumer, addressing the "marker class with no consumer" critique. Other tweaks from the same review pass: - CachedCountField docstring documents migration-friction tradeoff (future AlterField for unrelated IntegerFields near cached ones). - SourceImageCollection.update_calculated_fields docstring explains the .filter(pk=).update() vs .save() divergence on the other 3 models. - TestSourceImageCollectionCountsDenormalize → TransactionTestCase + eager Celery so on_commit hooks actually fire in tests. - TestCachedCountsIntegrityCheck (5 tests) covers discover / find / reconcile dry-run / reconcile repair / no-drift baseline. - Detection bbox fixtures: [0.1...] → [10, 10, 20, 20] for consistency with codebase pixel-space convention (CodeRabbit nit). Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR speeds up capture-set list queries by replacing per-row source-image count annotations with denormalized cached columns and adding infrastructure to discover/reconcile cached count drift.
Changes:
- Adds cached count fields and migrations for
SourceImageCollection, plusCachedCountFieldmarker usage. - Updates list/query logic, signals, tasks, and pipeline bulk-save handling to refresh cached collection counts.
- Adds integrity-check helpers and regression/query-count tests.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
ami/base/models.py |
Adds CachedCountField marker class. |
ami/main/models.py |
Converts cached count fields and adds SourceImageCollection count recomputation. |
ami/main/api/views.py |
Removes source-image count annotations from collection queryset. |
ami/main/signals.py |
Adds M2M/detection signal handlers for collection count refresh. |
ami/main/tasks.py |
Adds collection refresh and cached-count reconciliation Celery tasks. |
ami/main/checks/cached_counts.py |
Adds cached-count discovery, drift detection, and reconciliation helpers. |
ami/ml/models/pipeline.py |
Refreshes affected collection counts after bulk-created detections. |
ami/main/tests.py |
Adds denormalized count, integrity-check, and list query-count tests. |
ami/main/migrations/0085_denormalize_sourceimagecollection_counts.py |
Adds/backfills SourceImageCollection cached count columns. |
ami/main/migrations/0086_use_cached_count_field.py |
Updates migration state to use CachedCountField. |
Comments suppressed due to low confidence (3)
ami/main/signals.py:268
- This only refreshes collections for the detection's current
source_image_id. BecauseDetection.source_imageis writable through the serializer/API, moving a detection to another capture leaves collections containing the old source image with stale processed/with-detections counts; the old source image id needs to be captured before save and refreshed too when it changes.
@receiver(post_save, sender=Detection)
@receiver(post_delete, sender=Detection)
def update_collection_counts_on_detection_change(sender, instance, **kwargs):
"""Schedule a collection-counts refresh for every collection containing the affected SourceImage.
The dedup + on_commit indirection means even tight per-row Detection write
loops fan out to at most one task per affected collection. ``bulk_create``
/ ``bulk_update`` skip signals entirely — those rely on the periodic
reconciliation task to repair drift.
"""
if not instance.source_image_id:
return
_schedule_collection_refresh_for_source_image(instance.source_image_id)
ami/main/signals.py:253
- This does not cover
SourceImagedeletion: Django removes auto-created M2M through rows during cascade delete without sendingm2m_changed, and the detectionpost_deletepath can no longer discover the collections once the image membership is gone. Deleting captures will therefore leavesource_images_count(and derived processed/detection counts) stale unless affected collection ids are captured before the image is deleted.
@receiver(m2m_changed, sender=SourceImageCollection.images.through)
def update_collection_counts_on_m2m(sender, instance, action, **kwargs):
"""Recompute denormalized counts when images are added to or removed from a collection."""
if action in ("post_add", "post_remove", "post_clear"):
collection_pk = instance.pk
transaction.on_commit(lambda: refresh_collection_cached_counts.delay(collection_pk))
ami/main/signals.py:253
- The handler assumes
instanceis always aSourceImageCollection, butm2m_changedalso fires for the reverse manager (source_image.collections.add/remove(...)) withinstanceas aSourceImageand the collection ids inpk_set. In that path this schedules a refresh for the source-image id instead of the affected collection ids, leaving the real collection counts stale.
@receiver(m2m_changed, sender=SourceImageCollection.images.through)
def update_collection_counts_on_m2m(sender, instance, action, **kwargs):
"""Recompute denormalized counts when images are added to or removed from a collection."""
if action in ("post_add", "post_remove", "post_clear"):
collection_pk = instance.pk
transaction.on_commit(lambda: refresh_collection_cached_counts.delay(collection_pk))
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pending = getattr(connection, _PENDING_SOURCE_IMAGE_IDS_ATTR, None) | ||
| is_new = pending is None | ||
| if is_new: | ||
| pending = set() | ||
| setattr(connection, _PENDING_SOURCE_IMAGE_IDS_ATTR, pending) | ||
| pending.add(source_image_id) | ||
| if is_new: | ||
| # Outside an atomic block, ``on_commit`` fires synchronously at | ||
| # registration time — so the ``add`` above must precede it or the | ||
| # flush sees an empty set. | ||
| transaction.on_commit(_flush_pending_collection_refreshes) |
| for instance, stored, computed in find_stale_cached_counts(m, project_id=project_id): | ||
| result.checked += 1 |
| source_images_count = CachedCountField(default=0) | ||
| source_images_with_detections_count = CachedCountField(default=0) | ||
| source_images_processed_count = CachedCountField(default=0) |
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_count", | ||
| field=models.IntegerField(default=0), | ||
| ), | ||
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_with_detections_count", | ||
| field=models.IntegerField(default=0), | ||
| ), | ||
| migrations.AddField( | ||
| model_name="sourceimagecollection", | ||
| name="source_images_processed_count", | ||
| field=models.IntegerField(default=0), | ||
| ), |
| # bulk_create above skips Detection signals; refresh affected collections explicitly. | ||
| source_image_ids = [img.pk for img in source_images] | ||
| for collection in SourceImageCollection.objects.filter(images__id__in=source_image_ids).distinct(): | ||
| collection.update_calculated_fields(save=True) |
| # occurrences/taxa subquery annotations. | ||
| self.assertLessEqual(count, 10, f"Collection list with_counts too many queries: {count}") | ||
|
|
||
| def test_list_query_count_ordering_by_annotated_count(self): |
| def test_list_query_count_does_not_scale_with_page_size(self): | ||
| small = self._list_query_count(f"/api/v2/captures/collections/?project_id={self.project.pk}&limit=1") | ||
| large = self._list_query_count(f"/api/v2/captures/collections/?project_id={self.project.pk}&limit=25") | ||
| print(f"\n[AUDIT] Collection list: limit=1 -> {small}q, limit=25 -> {large}q") | ||
| self.assertLessEqual(large, small + 2, f"Collection list scaling: {small} -> {large} (likely N+1)") |
|
Claude says: Self-suggestion for follow-up PR (or fold into this one): simplify the cached-counts plumbing introduced here. Current shape works but has three per-model pieces (dedup state attr + flush helper + per-model Celery task). Next cached-count addition repeats all three. Proposed consolidationOne free helper + queryset/instance methods on # ami/base/cached_counts.py
_PENDING_ATTR = "_pending_cached_count_recomputes"
def _schedule_recompute(model, pk):
"""Per-connection dedup + transaction.on_commit dispatch.
Outside an atomic block, on_commit fires synchronously at registration
time, so we set the pending set BEFORE registering the flush callback.
"""
pending = getattr(connection, _PENDING_ATTR, None)
is_new = pending is None
if is_new:
pending = set()
setattr(connection, _PENDING_ATTR, pending)
pending.add((model._meta.label, pk))
if is_new:
transaction.on_commit(_flush)
def _flush():
pending = getattr(connection, _PENDING_ATTR, set())
try: delattr(connection, _PENDING_ATTR)
except AttributeError: pass
for label, pk in pending:
recompute_cached_counts_task.delay(label, pk)
@shared_task(ignore_result=True)
def recompute_cached_counts_task(model_label, pk):
model = apps.get_model(model_label)
try: instance = model.objects.get(pk=pk)
except model.DoesNotExist: return
instance.update_calculated_fields(save=True)# ami/base/models.py
class BaseModel(models.Model):
def update_cached_counts(self, run_async: bool = True) -> None:
if run_async:
_schedule_recompute(type(self), self.pk)
return
self.update_calculated_fields(save=True)
class BaseQuerySet(QuerySet):
def update_cached_counts(self, run_async: bool = True) -> None:
for pk in self.values_list("pk", flat=True):
if run_async:
_schedule_recompute(self.model, pk)
else:
self.model.objects.get(pk=pk).update_calculated_fields(save=True)Call sites collapsePipeline SourceImageCollection.objects.filter(
images__id__in=source_image_ids
).update_cached_counts(run_async=False)Signals: @receiver(post_save, sender=Detection)
@receiver(post_delete, sender=Detection)
def update_collection_counts_on_detection_change(sender, instance, **kwargs):
if not instance.source_image_id:
return
SourceImageCollection.objects.filter(
images__id=instance.source_image_id
).update_cached_counts() # async by defaultReconcile loop in What goes away
What stays
Next cached-count additionThree things instead of six:
Reconcile + drift detection are free. What I'd NOT do
Happy to land as a follow-up PR (cleaner review scope) or fold into this branch if reviewers prefer the unified pattern in one place. |
Captures the design behind replacing PR #1301's per-source-table dedup state + per-model Celery refresh tasks with a generic instance/queryset method on BaseModel/BaseQuerySet. Wraps update_calculated_fields(save=True) with caller-controlled sync vs async (run_async=True default) and per- (model, pk) dedup keyed on the DB connection. Co-Authored-By: Claude <noreply@anthropic.com>
…t method Replace the per-source-table dedup state + per-model Celery refresh tasks introduced in 1a242aa with a generic `update_cached_counts(run_async=True)` method on `BaseModel` and `BaseQuerySet`. Wraps the existing `update_calculated_fields(save=True)` body; per-(model_label, pk) dedup keyed on the active DB connection collapses high-volume signal fan-out to one task per affected row. Architecture: - `ami/base/cached_counts.py` — `schedule_recompute(label, pk)` queues a recompute in a per-connection dedup set; `_flush_pending_recomputes` is registered via `transaction.on_commit` (once per call, idempotent on the drain side so it survives rollback-then-fresh-transaction in tests). - `BaseModel.update_cached_counts(run_async=True)` — schedules a recompute for `self`. With `run_async=False` runs inline. - `BaseQuerySet.update_cached_counts(run_async=True)` — schedules one per row in the queryset. - `ami/main/tasks.py::recompute_cached_counts_task` — generic Celery task, loads model via `apps.get_model(label)` and calls `update_calculated_fields(save=True)`. Call site changes: - Detection signal handler is now one line: `SourceImageCollection.objects.filter(images__id=instance.source_image_id).update_cached_counts()` - Collection m2m handler: `instance.update_cached_counts()`. - `refresh_project_cached_counts` body keeps the existing `project.update_related_calculated_fields()` (preserves the bulk subquery UPDATE for `SourceImage.detections_count`); body no longer duplicates the Events/Deployments loop here. - Reconcile loop in `ami/main/checks/cached_counts.py` calls `instance.update_cached_counts(run_async=False)` for stylistic unity. Removed: - `_PENDING_SOURCE_IMAGE_IDS_ATTR`, `_flush_pending_collection_refreshes`, `_schedule_collection_refresh_for_source_image` (signals.py) - `refresh_collection_cached_counts` task (tasks.py) Cost of adding the next cached-count column drops from ~50 LOC (new field + recompute body + dedup attr + flush helper + Celery task + signal handler) to ~10 LOC (new field + recompute body + signal handler calling the method). Design rationale and trade-offs in `docs/superpowers/specs/2026-05-14-cached-counts-update-method-design.md`. Co-Authored-By: Claude <noreply@anthropic.com>
|
Claude says: Landed in 9ac2ce17. Test isolation note worth flagging: the per-connection dedup set survives Reconcile loop in Design doc + trade-offs in |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ami/base/models.py (1)
74-80: ⚡ Quick winOptimize the sync path to avoid N+1 queries.
The sync path (lines 79-80) loads instances one at a time with
self.model.objects.get(pk=pk), creating N+1 queries. Since the async path only needs PKs but the sync path needs full instances, branch onrun_asyncand useself.iterator()for the sync path to load instances efficiently in a single query.⚡ Proposed refactor
- model_label = self.model._meta.label - for pk in self.values_list("pk", flat=True): - if run_async: + if run_async: + model_label = self.model._meta.label + for pk in self.values_list("pk", flat=True): schedule_recompute(model_label, pk) - else: - instance = self.model.objects.get(pk=pk) + else: + for instance in self.iterator(): instance.update_calculated_fields(save=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/base/models.py` around lines 74 - 80, The current sync path does N+1 queries by calling self.model.objects.get(pk=pk) for each pk; change the logic to branch on run_async: if run_async keep the existing loop over self.values_list("pk", flat=True) and call schedule_recompute(model_label, pk), but for the sync path iterate the queryset itself using self.iterator() (or another queryset iterator) to load instances efficiently and call instance.update_calculated_fields(save=True) on each instance; reference model_label, schedule_recompute, run_async, self.values_list("pk", flat=True), self.iterator(), and update_calculated_fields when applying the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ami/main/tasks.py`:
- Line 11: The task function recompute_cached_counts_task currently types pk as
int but should accept any primary key type to match schedule_recompute in
ami/base/cached_counts.py; change the signature to use pk: Any and add "from
typing import Any" at the top of ami/main/tasks.py, then pass pk through to
schedule_recompute unchanged so type checkers no longer flag the mismatch (refer
to recompute_cached_counts_task and schedule_recompute).
---
Nitpick comments:
In `@ami/base/models.py`:
- Around line 74-80: The current sync path does N+1 queries by calling
self.model.objects.get(pk=pk) for each pk; change the logic to branch on
run_async: if run_async keep the existing loop over self.values_list("pk",
flat=True) and call schedule_recompute(model_label, pk), but for the sync path
iterate the queryset itself using self.iterator() (or another queryset iterator)
to load instances efficiently and call
instance.update_calculated_fields(save=True) on each instance; reference
model_label, schedule_recompute, run_async, self.values_list("pk", flat=True),
self.iterator(), and update_calculated_fields when applying the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e47d60b5-368e-40bd-a816-14549cda47b0
📒 Files selected for processing (8)
ami/base/cached_counts.pyami/base/models.pyami/main/checks/cached_counts.pyami/main/models.pyami/main/signals.pyami/main/tasks.pyami/main/tests.pydocs/superpowers/specs/2026-05-14-cached-counts-update-method-design.md
✅ Files skipped from review due to trivial changes (1)
- docs/superpowers/specs/2026-05-14-cached-counts-update-method-design.md
🚧 Files skipped from review as they are similar to previous changes (2)
- ami/main/signals.py
- ami/main/models.py
|
|
||
|
|
||
| @celery_app.task(ignore_result=True) | ||
| def recompute_cached_counts_task(model_label: str, pk: int) -> None: |
There was a problem hiding this comment.
Widen the pk type annotation to support non-integer primary keys.
The signature declares pk: int, but schedule_recompute in ami/base/cached_counts.py accepts pk: Any (line 27), and Django models can use UUIDField or other types as primary keys. While Celery serialization handles this at runtime, type checkers will flag the mismatch.
🔧 Proposed fix
-def recompute_cached_counts_task(model_label: str, pk: int) -> None:
+def recompute_cached_counts_task(model_label: str, pk: Any) -> None:Add the import at the top if not already present:
+from typing import Any📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def recompute_cached_counts_task(model_label: str, pk: int) -> None: | |
| from typing import Any | |
| def recompute_cached_counts_task(model_label: str, pk: Any) -> None: |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ami/main/tasks.py` at line 11, The task function recompute_cached_counts_task
currently types pk as int but should accept any primary key type to match
schedule_recompute in ami/base/cached_counts.py; change the signature to use pk:
Any and add "from typing import Any" at the top of ami/main/tasks.py, then pass
pk through to schedule_recompute unchanged so type checkers no longer flag the
mismatch (refer to recompute_cached_counts_task and schedule_recompute).
The three denormalized count columns (source_images_count, source_images_with_detections_count, source_images_processed_count) were exposed as writable model fields after the schema migration. A client with update permission could PATCH arbitrary values and persist drift until the next signal recompute touched the row. Co-Authored-By: Claude <noreply@anthropic.com>
…ackfill 0085 was non-atomic and combined irreversible AddField operations with a long RunPython backfill. If the backfill failed or was interrupted on production-sized data, the columns existed but the migration stayed unrecorded -- a retry would fail on duplicate column. Split into: - 0085: AddField x3 only, atomic. AddField with constant default is a metadata-only operation on PostgreSQL 11+, safe on large tables. - 0086_backfill: the GROUP BY / FILTER backfill, atomic=False. Writes absolute computed values, so naturally idempotent and safe to re-run if interrupted. - 0087_use_cached_count_field: renamed from 0086, dependency updated to point at the new backfill migration. Co-Authored-By: Claude <noreply@anthropic.com>
The reconcile tooling shipped with this branch was not production-wired: no Celery beat schedule registered the periodic task, the IntegrityCheck contract was violated (checked counted only drifted rows, not rows inspected), and the task default of (project_id=None, dry_run=False) was a full-table repair sweep across every CachedCountField model. Better to ship the perf win on its own and bring the reconcile/integrity layer in a follow-up that registers the periodic task, fixes the checked semantics, defaults to dry_run, and surfaces results on a dashboard. Removed: - ami/main/checks/cached_counts.py (discover/find_stale/reconcile_) - reconcile_cached_counts_task in ami/main/tasks.py - TestCachedCountsIntegrityCheck in ami/main/tests.py Adjusted: - ami/main/signals.py: safety-net comment now describes the actual current coverage (signals + pipeline.save_results) and points at the follow-up planning doc instead of the removed reconcile task. - ami/base/models.py: CachedCountField docstring no longer references the deleted module. Co-Authored-By: Claude <noreply@anthropic.com>
Captures what was removed from the reconcile layer and why (no beat registration, checked-counts semantics, unsafe defaults), what the follow-up needs to ship, and a design-space section for future expensive cached fields. Names the patterns under consideration (NULL-sentinel staleness, freshness-timestamp companion, aggregates/rollup table / materialized view) and proposes a marker-class hierarchy so a single discovery sweep can dispatch per-strategy reconcile (eager: set-based diff; lazy: freshness check; rollup-backed: refresh view). Cites the existing freshness-timestamp precedent on Event.calculated_fields_updated_at. Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
ami/main/tests.py (3)
3444-3470: ⚡ Quick winAssert all cached columns stay untouched in the no-write test.
This currently proves
get_source_image_counts()leavessource_images_countalone, but it would still pass if the method accidentally persistedsource_images_processed_countorsource_images_with_detections_count.Suggested assertion expansion
# Confirm the DB row was not updated. self.collection.refresh_from_db() self.assertEqual(self.collection.source_images_count, 0) + self.assertEqual(self.collection.source_images_processed_count, 0) + self.assertEqual(self.collection.source_images_with_detections_count, 0)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/tests.py` around lines 3444 - 3470, The test currently only asserts that source_images_count is unchanged after calling get_source_image_counts(); extend it to also assert that the other cached DB columns (source_images_processed_count and source_images_with_detections_count) remain untouched in the no-write case: after calling self.collection.get_source_image_counts() and refreshing from DB via self.collection.refresh_from_db(), add assertions that self.collection.source_images_processed_count == 0 and self.collection.source_images_with_detections_count == 0 so the test fails if get_source_image_counts() erroneously persists those fields on SourceImageCollection.
3376-3385: ⚡ Quick winAdd a
clear()regression case.This suite exercises add/remove, but the PR also relies on the
images.clear()maintenance path. A small test that clears the collection and expects all three cached count columns to drop back to zero would close that last signal path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/tests.py` around lines 3376 - 3385, Add a new test method (e.g., test_count_resets_on_image_clear) that sets up the collection with self.collection.images.set(self.images), calls self.collection.images.clear(), refreshes state with self._refresh(), and then asserts that the three cached count columns on the collection (including source_images_count) have dropped to zero; reference collection.images.clear() for the action and collection.source_images_count (and the other two cached count attributes on the collection) for the assertions.
3295-3305: ⚡ Quick winMake the ordering path observable.
Every seeded collection gets the same image set here, so
ordering=-source_images_countcan be ignored and this test still stays green. Seed a few distinct cached counts and assert the returned rows are actually sorted before checking query count.Also applies to: 3342-3349
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ami/main/tests.py` around lines 3295 - 3305, The test seeds 30 collections all with the exact same image set so the ordering-by -source_images_count is invisible; change the seeding in the SourceImageCollection creation (in the block that creates "qcount-collection-{i}" and the similar block at 3342-3349) to produce several distinct cached counts (e.g., create collections with varying numbers of images or explicitly set the cached count field on SourceImageCollection after creation), then query the view and assert that the returned rows are actually sorted by source_images_count (check the sequence of returned collection IDs or counts) before performing the query-count assertion; update the creation loop for SourceImageCollection and add an assertion verifying ordering so the test will fail if ordering is ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/claude/planning/cached-counts-reconcile-followup.md`:
- Line 8: The PR reference at the start of the line " `#1301` ships the perf win
(denormalized columns, signals, ML-path recompute," is being parsed as a
Markdown heading; update that line to escape the leading hash or wrap the PR
number in inline code so it renders as plain text (e.g., replace "`#1301`" with
"\`#1301`" or "`#1301`") ensuring the rest of the sentence remains unchanged and
the file docs/claude/planning/cached-counts-reconcile-followup.md still conveys
the same message.
---
Nitpick comments:
In `@ami/main/tests.py`:
- Around line 3444-3470: The test currently only asserts that
source_images_count is unchanged after calling get_source_image_counts(); extend
it to also assert that the other cached DB columns
(source_images_processed_count and source_images_with_detections_count) remain
untouched in the no-write case: after calling
self.collection.get_source_image_counts() and refreshing from DB via
self.collection.refresh_from_db(), add assertions that
self.collection.source_images_processed_count == 0 and
self.collection.source_images_with_detections_count == 0 so the test fails if
get_source_image_counts() erroneously persists those fields on
SourceImageCollection.
- Around line 3376-3385: Add a new test method (e.g.,
test_count_resets_on_image_clear) that sets up the collection with
self.collection.images.set(self.images), calls self.collection.images.clear(),
refreshes state with self._refresh(), and then asserts that the three cached
count columns on the collection (including source_images_count) have dropped to
zero; reference collection.images.clear() for the action and
collection.source_images_count (and the other two cached count attributes on the
collection) for the assertions.
- Around line 3295-3305: The test seeds 30 collections all with the exact same
image set so the ordering-by -source_images_count is invisible; change the
seeding in the SourceImageCollection creation (in the block that creates
"qcount-collection-{i}" and the similar block at 3342-3349) to produce several
distinct cached counts (e.g., create collections with varying numbers of images
or explicitly set the cached count field on SourceImageCollection after
creation), then query the view and assert that the returned rows are actually
sorted by source_images_count (check the sequence of returned collection IDs or
counts) before performing the query-count assertion; update the creation loop
for SourceImageCollection and add an assertion verifying ordering so the test
will fail if ordering is ignored.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c4041159-0f7e-4adf-86e1-2c798d80337d
📒 Files selected for processing (9)
ami/base/models.pyami/main/api/serializers.pyami/main/migrations/0085_denormalize_sourceimagecollection_counts.pyami/main/migrations/0086_backfill_sourceimagecollection_counts.pyami/main/migrations/0087_use_cached_count_field.pyami/main/signals.pyami/main/tasks.pyami/main/tests.pydocs/claude/planning/cached-counts-reconcile-followup.md
🚧 Files skipped from review as they are similar to previous changes (1)
- ami/base/models.py
|
|
||
| ## Why this is a separate PR | ||
|
|
||
| #1301 ships the perf win (denormalized columns, signals, ML-path recompute, |
There was a problem hiding this comment.
Escape PR reference to avoid accidental/malformed heading rendering.
Line 8 starts with #1301, which markdown parsers/lint treat as an ATX heading token pattern. Use an escaped hash or inline code so this stays a plain PR reference.
Suggested fix
-#1301 ships the perf win (denormalized columns, signals, ML-path recompute,
+\`#1301` ships the perf win (denormalized columns, signals, ML-path recompute,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #1301 ships the perf win (denormalized columns, signals, ML-path recompute, | |
| \`#1301` ships the perf win (denormalized columns, signals, ML-path recompute, |
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 8-8: No space after hash on atx style heading
(MD018, no-missing-space-atx)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/claude/planning/cached-counts-reconcile-followup.md` at line 8, The PR
reference at the start of the line " `#1301` ships the perf win (denormalized
columns, signals, ML-path recompute," is being parsed as a Markdown heading;
update that line to escape the leading hash or wrap the PR number in inline code
so it renders as plain text (e.g., replace "`#1301`" with "\`#1301`" or "`#1301`")
ensuring the rest of the sentence remains unchanged and the file
docs/claude/planning/cached-counts-reconcile-followup.md still conveys the same
message.
Summary
Two changes, both targeting
SourceImageCollectionViewSet.listquery cost on large M2M tables:SourceImageCollection. Replaces 3 per-rowSubquery(Count(...))annotations with a plain column SELECT; O(1) read regardless of M2M size.CachedCountFieldinami/base/models.py— a marker subclass ofIntegerField— and apply it to all 12 existing cached count columns project-wide. Discoverable viaModel._meta.get_fields()+isinstance(f, CachedCountField)so future cross-cutting tasks (admin display, refresh jobs, list-endpoint defer) can enumerate them.Stacked on #1300 (
select_relatedon captures list). Base branch isperf/list-endpoint-subquery-rewrite; will retarget tomainonce #1300 merges.Denormalization details
SourceImageCollection:source_images_count,source_images_with_detections_count,source_images_processed_count. Schema migration0085_denormalize_sourceimagecollection_countscreates them and backfills via a single GROUP BY +FILTERscan of the M2M (computes all three counts in one pass).atomic=Falseso the UPDATE can chunk on prod-sized data.SourceImageCollection.get_source_image_counts()returns the 3 counts as a dict from a single aggregate query (no writes — pure compute, testable in isolation).update_calculated_fields(save=True)is the side-effecting wrapper that also writes the columns.ami/main/signals.py:m2m_changedonSourceImageCollection.images.throughforpost_add,post_remove,post_clear.Detectionpost_save/post_deleterecompute for any collection containing the affectedsource_image.pipeline.save_results()explicitly recomputes for affected collections after itsbulk_createbatch, sincebulk_createskips signals.with_*_count()annotation methods onSourceImageCollectionQuerySetand removes them from the viewset chain.CachedCountField details
ami/base/models.py—class CachedCountField(models.IntegerField). Pure marker; no field-level behavior change.Deployment.{events,occurrences,captures,detections,taxa}_count,Event.{captures,detections,occurrences}_count,SourceImage.detections_count,SourceImageCollection.{source_images,source_images_with_detections,source_images_processed}_count.0086_use_cached_count_fieldis recorded but the generated SQL is(no-op)for everyAlterField— the column type is unchanged. The migration only updates Django's recorded model state so future operations see the right field class.Why denormalize rather than the subquery rewrite from the earlier draft of #1300
The collections list UI does not paginate. The paginator
COUNT(*)win from rewriting the annotations as correlated subqueries (the original #1300 approach) is invisible to UX. The remaining cost was 3 per-row count subqueries against the M2M +main_detection; on a collection holding 500k+ images, each request still spent hundreds of ms per subquery.Bench data from the earlier draft also showed the subquery rewrite was slower than the original
Count(images, filter=..., distinct=True)on small projects and roughly equal on the heaviest project. Cached columns are an unambiguous win across all project sizes.Precedent:
Deploymentalready had 5 cached counts (events_count,occurrences_count,captures_count,detections_count,taxa_count). TheCachedCountFieldmarker formalizes that pattern.Recompute granularity tradeoff
Per-Detection signal recompute fires on every individual
Detection.save()/Detection.delete(). For ML pipeline writes this is a non-issue:Detection.objects.bulk_create()skips signals, andpipeline.save_results()batches one recompute per job. The per-row signal handles non-bulk paths (manual workflows, retroactive ML, admin actions).Tests
TestSourceImageCollectionCountsDenormalize(7 cases) — image add/remove, Detection create/delete, null-bbox processed-but-not-with-detections case,update_calculated_fields()recovery from drift, pure-computeget_source_image_counts()returns dict without writes. A tautologicaltest_initial_counts_zerowas dropped during review — it asserted Django'sdefault=0, not our code.TestSourceImageCollectionListQueryCount(3 cases) — page-size scaling,with_counts=true,ordering=-source_images_count(sort hits the cached column directly, no extra subquery).ami.main+ami.ml+ami.jobssuites pass on local.Rollout
default=0and backfills in one deploy. Backfill is the slow step on prod-sized data — measure on staging arbutus-2026 first.Test plan
ami.main+ami.ml+ami.jobstest suites/api/v2/captures/collections/p99 drops vs productionSummary by CodeRabbit
Release Notes
New Features
Chores