
feat: add get_element_at(x, y) tool under --experimentalVision #2021

Open
medyas wants to merge 5 commits into ChromeDevTools:main from medyas:feat/get-element-at-vision


medyas commented May 8, 2026

Closes #268.

Adds a coordinate-driven element-inspection primitive that pairs with take_screenshot + a vision model.

What this version does

get_element_at(x, y) returns the uid of the element at the viewport-relative CSS-pixel coordinate, and refreshes the page snapshot. The returned uid is consumable by the existing uid-based tools (click, hover, fill, drag, upload_file, …).

Gated by the existing --experimentalVision flag (same as click_at).

Schema

{
  x: number,  // viewport-relative CSS px
  y: number,
}
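The schema above is pseudo-TypeScript. As a rough sketch of the input contract, the two numbers could be checked before hit-testing as shown here. The `assertInViewport` helper, the `Viewport` type, and the error wording are hypothetical, not the PR's code; the idea is only that document.elementFromPoint returns null for points outside the viewport, so rejecting them early allows a recoverable ("self-healing") message like the one the test plan mentions.

```typescript
// Hypothetical types: GetElementAtParams mirrors the schema above; Viewport
// would come from e.g. window.innerWidth / window.innerHeight in CSS px.
interface GetElementAtParams {
  x: number; // viewport-relative CSS px
  y: number;
}

interface Viewport {
  width: number;
  height: number;
}

// Throws a recoverable error when the point cannot hit anything in the
// current viewport. The wording is illustrative only.
function assertInViewport(p: GetElementAtParams, vp: Viewport): void {
  const inside =
    Number.isFinite(p.x) &&
    Number.isFinite(p.y) &&
    p.x >= 0 &&
    p.y >= 0 &&
    p.x < vp.width &&
    p.y < vp.height;
  if (!inside) {
    throw new Error(
      `Coordinates (${p.x}, ${p.y}) are outside the ${vp.width}x${vp.height} viewport. ` +
        'Take a fresh screenshot and pick a point inside it, or scroll the target into view first.',
    );
  }
}
```

If screenshots are captured at a device pixel ratio above 1, screenshot pixel coordinates would need dividing by devicePixelRatio before they match these CSS-pixel coordinates.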

Response

  • Element uid: <uid>
  • Refreshed snapshot (response.includeSnapshot()).

Behavior

  • Hit-test uses elementFromPoint with open-shadow piercing and same-origin iframe descent.
  • Element already in the a11y snapshot: returned directly.
  • Element outside the a11y tree (e.g. a plain <div>): injected via the existing extraHandles mechanism (the same path third-party-developer tools use), so the returned uid resolves through McpPage.getElementByUid.
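The two snapshot cases above amount to "reuse the snapshot uid if there is one, otherwise mint one that can still be resolved later". A minimal sketch with a hypothetical `UidRegistry` class follows; the real snapshot, extraHandles, and McpPage.getElementByUid code is not shown in this PR description, so the names and the uid format here are illustrative.

```typescript
// Hypothetical stand-in for an element reference (e.g. an ElementHandle).
type NodeRef = object;

// Illustrative registry: snapshot uids are reused, anything else gets a fresh
// "extra" uid so it can still be resolved later.
class UidRegistry {
  private readonly uidByNode = new Map<NodeRef, string>();
  private readonly nodeByUid = new Map<string, NodeRef>();
  private nextExtraId = 1;

  // Record a node/uid pair produced by the accessibility snapshot.
  addSnapshotNode(uid: string, node: NodeRef): void {
    this.uidByNode.set(node, uid);
    this.nodeByUid.set(uid, node);
  }

  // Existing uid if the node is already known, otherwise a newly minted one.
  uidFor(node: NodeRef): string {
    const known = this.uidByNode.get(node);
    if (known !== undefined) return known;
    const uid = `extra_${this.nextExtraId++}`;
    this.uidByNode.set(node, uid);
    this.nodeByUid.set(uid, node);
    return uid;
  }

  // Reverse lookup, playing the role McpPage.getElementByUid plays in the PR.
  getElementByUid(uid: string): NodeRef {
    const node = this.nodeByUid.get(uid);
    if (node === undefined) throw new Error(`No element found for uid ${uid}`);
    return node;
  }
}
```

The identity-keyed Map is a simplification: two Puppeteer ElementHandles that point at the same DOM node are distinct objects, so a real implementation needs a different equality check, for example comparing the nodes inside the page.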

Limitations

  • Cannot reach inside closed shadow roots: page JS cannot see them (host.shadowRoot is null), so elementFromPoint stops at the shadow host.
  • Cannot reach cross-origin / OOPIF iframes — error message points at list_pages + select_page for switching into the inner target.
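The hit-test described under Behavior, together with the two limitations above, can be sketched as a loop that keeps re-running elementFromPoint in whichever tree the point lands in. The structural `HitRoot`/`HitElement` types and the `deepElementFromPoint` name are assumptions that let the sketch run outside a browser (in page context, Document, ShadowRoot, and HTMLIFrameElement fit them). It ignores iframe padding, and the result shape is hypothetical.

```typescript
// Structural stand-ins so the sketch runs outside a browser. In page context,
// Document and ShadowRoot satisfy HitRoot, and HTMLElement / HTMLIFrameElement
// satisfy HitElement.
interface HitRoot {
  elementFromPoint(x: number, y: number): HitElement | null;
}

interface HitElement {
  tagName: string; // upper-case for HTML elements, e.g. "IFRAME"
  shadowRoot: HitRoot | null; // null when there is no root or it is closed
  contentDocument?: HitRoot | null; // iframes only; null when cross-origin
  clientLeft?: number; // iframe border widths
  clientTop?: number;
  getBoundingClientRect(): {left: number; top: number};
}

// Hypothetical result shape: the cross-origin case is reported, not guessed.
type HitResult =
  | {kind: 'found'; element: HitElement}
  | {kind: 'cross-origin-frame'; frame: HitElement}
  | {kind: 'none'};

function deepElementFromPoint(root: HitRoot, x: number, y: number): HitResult {
  let el = root.elementFromPoint(x, y);
  while (el) {
    if (el.shadowRoot) {
      // Open shadow root: same coordinate space, repeat the hit test inside it.
      const inner = el.shadowRoot.elementFromPoint(x, y);
      if (!inner || inner === el) break;
      el = inner;
      continue;
    }
    if (el.tagName === 'IFRAME') {
      const doc = el.contentDocument;
      if (!doc) return {kind: 'cross-origin-frame', frame: el};
      // Same-origin iframe: translate into the frame's own viewport by
      // subtracting its position and border (padding is ignored here).
      const rect = el.getBoundingClientRect();
      x -= rect.left + (el.clientLeft ?? 0);
      y -= rect.top + (el.clientTop ?? 0);
      const inner = doc.elementFromPoint(x, y);
      if (!inner) break;
      el = inner;
      continue;
    }
    break; // a plain element with nothing further to descend into
  }
  return el ? {kind: 'found', element: el} : {kind: 'none'};
}
```

Closed shadow roots fall out naturally: `shadowRoot` is null for them, so the loop stops at the host. A cross-origin frame is reported rather than guessed at, which is where an error pointing at list_pages + select_page would be raised.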

Background

The first revision of this PR returned a self-contained markdown descriptor with CSS modes, file output, and a CDP-backed matched-rules path. Per @OrKoN's review, that was out of scope for a get_element_at primitive — the tool should integrate with the snapshot/uid system the rest of the tools are built around. This rewrite drops mode/css/pierceShadow/filePath, removes the descriptor formatter, computed-style projection, matched-rules summary, 50KB outerHTML cap, and the CDP matched-CSS path (~1100 lines).

The previous descriptor-style implementation is preserved on backup/get-element-at-full-impl for reference; if a separate get_element_styles(uid) style tool is ever wanted, the building blocks are there.

Test plan

Real Chrome under withMcpContext:

  • returns a uid for an element at coordinates + refreshes the snapshot (round-trip via getElementByUid confirms id matches)
  • throws with a self-healing message when the coordinates are outside the viewport
  • resolves a plain non-accessible <div> via the extraHandles fallback (uid round-trip)
  • pierces open shadow roots (resolved uid → inner #inner button)
  • descends into a same-origin iframe (resolved uid → inner #inner element)
  • tool registration under --experimental-vision (tests/index.test.ts)
  • disabled-flag error message (tests/e2e/chrome-devtools-commands.test.ts)
  • Gemini eval scenario at scripts/eval_scenarios/get_element_at_test.ts

npm run typecheck is clean, npm run test -- tests/tools/input.test.ts passes, and npm run gen is clean (it auto-regenerated docs/tool-reference.md, src/bin/chrome-devtools-cli-options.ts, and src/telemetry/tool_call_metrics.json).

CLA: still pending sign-off.

Adds a coordinate-driven element-inspection tool that pairs with
take_screenshot + a vision model. Returns a compact markdown descriptor
(tag, id, class, computed selector, bbox, attributes, role, children)
by default, with optional CSS modes (matched / computed-visual /
computed-full) and output modes (auto / schema / raw / selector-only).

Closes ChromeDevTools#268 (proposed implementation; awaiting maintainer feedback on
flag reuse, matched-CSS path, and naming).

Implementation
- Hot path (css != 'matched'): single page.evaluate() doing
  document.elementFromPoint, open-shadow piercing, same-origin iframe
  descent, descriptor build, and computed-style filtering.
- Matched-CSS path: a scoped pptrPage.createCDPSession() for
  DOM.getNodeForLocation + CSS.getMatchedStylesForNode, detached in a
  finally block. This is the first raw-CDP usage in a tool handler, so its
  scope is intentionally narrow.
- cssPath selector ported from DevTools' DOMPath.ts, with a uniqueness
  fallback to an nth-of-type chain.
- 50KB outerHTML cap with context.saveTemporaryFile fallback.
- filePath param writes the full descriptor (untruncated outerHTML +
  full computed CSS + matched rules) to disk.
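The nth-of-type fallback mentioned above builds a selector by walking up from the element and indexing each step among same-tag siblings. The sketch below does not attempt to reproduce DevTools' DOMPath.ts: it uses a hypothetical `PathNode` shape instead of real DOM nodes, stops at the nearest ancestor with a simple identifier id, and assumes that id is unique in the document.

```typescript
// Hypothetical node shape standing in for a DOM element.
interface PathNode {
  tagName: string; // lower-case, e.g. "div"
  id?: string;
  parent: PathNode | null;
  children: PathNode[];
}

// Ids that can be written as "#id" without CSS escaping.
const SIMPLE_IDENT = /^[A-Za-z_][A-Za-z0-9_-]*$/;

// One step of the chain: "tag", or "tag:nth-of-type(k)" when the parent has
// more than one child with the same tag.
function nthOfTypeStep(node: PathNode): string {
  if (!node.parent) return node.tagName;
  const sameTag = node.parent.children.filter(c => c.tagName === node.tagName);
  if (sameTag.length === 1) return node.tagName;
  return `${node.tagName}:nth-of-type(${sameTag.indexOf(node) + 1})`;
}

// Walk up to the root (or the nearest ancestor with a usable id) and join the
// steps with the child combinator.
function cssPath(node: PathNode): string {
  const steps: string[] = [];
  for (let cur: PathNode | null = node; cur; cur = cur.parent) {
    if (cur.id && SIMPLE_IDENT.test(cur.id)) {
      steps.unshift(`#${cur.id}`); // assumes ids are unique in the document
      break;
    }
    steps.unshift(nthOfTypeStep(cur));
  }
  return steps.join(' > ');
}
```

For the second of two sibling divs under body, this yields `html > body > div:nth-of-type(2) > span` for a span inside it; in a browser, ids would need CSS.escape rather than the regex guard.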

Edge cases handled
- Outside viewport, no-element-at-point, closed shadow roots,
  cross-origin/OOPIF iframes (returns iframe + flag), pointer-events:
  none, sticky/fixed stacking, page zoom / device emulation, detached
  node race (single synchronous evaluate turn), huge attribute values.
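The 50KB outerHTML cap and the huge-attribute-values case both come down to truncating long strings without corrupting them. A sketch follows, assuming the cap is measured in UTF-8 bytes (the PR does not say whether it counts bytes or characters); `capUtf8` and `DEFAULT_CAP_BYTES` are hypothetical names.

```typescript
// Hypothetical default, assuming "50KB" means 50 * 1024 UTF-8 bytes.
const DEFAULT_CAP_BYTES = 50 * 1024;

// Truncates `text` so its UTF-8 encoding fits in `maxBytes`, backing off to a
// character boundary so no multi-byte character is cut in half.
function capUtf8(
  text: string,
  maxBytes: number = DEFAULT_CAP_BYTES,
): {text: string; truncated: boolean} {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return {text, truncated: false};
  let end = maxBytes;
  // UTF-8 continuation bytes look like 0b10xxxxxx; never cut right before one.
  while (end > 0 && ((bytes[end] ?? 0) & 0xc0) === 0x80) end--;
  return {text: new TextDecoder().decode(bytes.subarray(0, end)), truncated: true};
}
```

A caller could then save the untruncated string to a temporary file when `truncated` is true, which is the role the PR gives to context.saveTemporaryFile.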

Tests
- tests/tools/input.test.ts: 10 cases covering modes, css modes, shadow
  piercing, iframe descent, no-element, filePath round-trip.
- tests/index.test.ts: registration check under --experimental-vision.
- tests/e2e/chrome-devtools-commands.test.ts: disabled-flag error
  surface mirrors click_at.
- scripts/eval_scenarios/get_element_at_test.ts: Gemini eval with loose
  coord-range assertion, gated via serverArgs.

Generated
- README.md, docs/tool-reference.md, src/bin/chrome-devtools-cli-options.ts,
  src/telemetry/tool_call_metrics.json regenerated via npm run gen.

Two functional bugs and several smaller cleanups surfaced during review.

Blockers fixed:
- closedShadowEncountered was set true when pierceShadow=false against an
  open shadow host. shadowRoot is non-null only for OPEN roots, so the
  flag was inverted. Drop the misleading branch; closed shadow roots
  cannot be detected from JS at all.
- Matched-CSS path ran buildElementDescriptorInPage via
  Runtime.callFunctionOn against an iframe-resolved objectId, executing
  document.elementFromPoint in the iframe's JS context where the caller's
  main-frame coordinates resolved to the wrong element. Restructure
  fetchDescriptorViaCdp to build the descriptor in the main frame via
  pptrPage.evaluate and use CDP only for CSS.getMatchedStylesForNode.
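The restructured matched-CSS path keeps CDP to a single job. A sketch of that narrow use follows, assuming a minimal `CdpSessionLike` stand-in for Puppeteer's CDPSession and a hypothetical `getMatchedRules` function: DOM.requestNode turns a Runtime object id into a nodeId, CSS.getMatchedStylesForNode reads the rules, and the session is detached in finally even when a call fails. It also assumes `objectId` was obtained on this same session, since remote object ids should not be assumed to carry over between sessions.

```typescript
// Minimal stand-in for Puppeteer's CDPSession (obtained with
// `await page.createCDPSession()`); only send() and detach() are needed.
interface CdpSessionLike {
  send(method: string, params?: Record<string, unknown>): Promise<any>;
  detach(): Promise<void>;
}

// Returns the matched CSS rules for the node behind `objectId`. The session
// is always detached, even if one of the protocol calls throws.
async function getMatchedRules(
  session: CdpSessionLike,
  objectId: string,
): Promise<unknown[]> {
  try {
    await session.send('DOM.enable');
    await session.send('CSS.enable'); // the CSS domain needs DOM enabled first
    await session.send('DOM.getDocument', {depth: 0}); // make the document known to this client
    const {nodeId} = await session.send('DOM.requestNode', {objectId});
    const {matchedCSSRules} = await session.send('CSS.getMatchedStylesForNode', {nodeId});
    return matchedCSSRules ?? [];
  } finally {
    await session.detach();
  }
}
```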

Smaller fixes:
- computed-full mode now requires filePath (validated up front) instead
  of silently discarding the data after computation.
- Removed dead crossOriginFrame field from ElementDescriptor (only the
  NotFoundResult path can ever produce it).
- Collapsed identical if/else branches in the elementFromPoint loop.
- Tool description corrected: cross-origin iframes return a
  cross-origin-blocked NotFoundResult, not an iframe descriptor.
- All CDP error paths now log via logger instead of failing silently.

Tests: tests/tools/input.test.ts 54/54 pass. Auto-generated docs and
CLI options refreshed via npm run gen.

google-cla Bot commented May 8, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

OrKoN (Collaborator) left a comment


Thanks for the PR. This seems to do more than get_element_at (querying styles, storing data to files). get_element_at should just return a uid from the snapshot that matches the element at the given coordinates, so that it integrates with the rest of the tools. It should probably use the browser's hit-testing API, and it would be more maintainable if the corresponding CDP support were added first. Perhaps you could file a feature request first, with details about the use case, so that we can prioritize it based on user feedback? I am also not sure that #268 has anything to do with the experimental vision mode.

Per the code review on ChromeDevTools#2021, `get_element_at(x, y)` now returns the uid
of the element at the coordinate and refreshes the snapshot, instead of
producing its own descriptor format. The uid is consumable by the existing
uid-based tools (click, hover, fill, drag, upload_file, …).

Removed: mode (auto/schema/raw/selector-only), css (none/matched/
computed-visual/computed-full), pierceShadow, filePath. The descriptor
formatter, computed-style projection, matched-rules summary, 50KB outerHTML
cap + temp-file fallback, full-descriptor JSON save, and the CDP-based
matched-CSS path are all gone (~1100 lines).

Hit-test uses elementFromPoint with open-shadow piercing and same-origin
iframe descent. Elements outside the a11y tree (e.g. plain divs) are
injected via the existing extraHandles mechanism so the returned uid is
resolvable by McpPage.getElementByUid.

The previous descriptor-style implementation is preserved on
backup/get-element-at-full-impl for reference.

medyas (Author) commented May 15, 2026

Thanks for the steer @OrKoN — you were right that the original shape was doing too much. Pushed a rewrite (dadcac9) that scopes this down to a uid-returning primitive matching the rest of the tools:

  • Returns: Element uid: <uid> + refreshed snapshot. The uid flows directly into click / hover / fill / etc.
  • Schema: {x, y} — dropped mode, css, pierceShadow, filePath.
  • Dropped: descriptor formatter, computed-style projection, matched-rules summary, 50KB outerHTML cap + temp-file path, full-descriptor JSON save, CDP CSS.getMatchedStylesForNode path. About 1100 lines.
  • Hit-test: elementFromPoint with open-shadow piercing and same-origin iframe descent. Elements outside the a11y tree (plain divs) are injected via the existing extraHandles mechanism — the same path the third-party-developer tools use — so the uid is resolvable by McpPage.getElementByUid.
  • Limits: closed shadow roots cannot be pierced (the hit-test stops at the host), and cross-origin / OOPIF iframes return a self-healing error pointing at list_pages + select_page.

On the "browser hit-testing API + CDP support first" point — I went with elementFromPoint (browser's native primitive, no internal CDP plumbing) since DOM.getNodeForLocation → ElementHandle requires reaching into puppeteer internals and the practical limits (closed shadow / OOPIF) are the same either way. Happy to switch to CDP if you'd prefer.
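For comparison, the CDP route discussed above could look roughly like this: DOM.getNodeForLocation performs the browser-side hit test and returns a backendNodeId, and DOM.resolveNode turns that into a Runtime object id. `CdpSessionLike` and `objectIdAtPoint` are hypothetical names. The sketch assumes x/y are the same viewport CSS pixels the tool accepts, and it stops short of wrapping the object id in a Puppeteer ElementHandle, the step described above as needing Puppeteer internals.

```typescript
// Minimal CDPSession stand-in; hypothetical, not Puppeteer's own type.
interface CdpSessionLike {
  send(method: string, params?: Record<string, unknown>): Promise<any>;
}

// Browser-side hit test: DOM.getNodeForLocation returns the backend node at
// (x, y); DOM.resolveNode turns it into a Runtime remote object id usable on
// this session. Returns null if the resolved object carries no id.
async function objectIdAtPoint(
  session: CdpSessionLike,
  x: number,
  y: number,
): Promise<string | null> {
  await session.send('DOM.enable');
  const {backendNodeId} = await session.send('DOM.getNodeForLocation', {
    x: Math.round(x), // the protocol takes integer coordinates
    y: Math.round(y),
    includeUserAgentShadowDOM: false,
  });
  const {object} = await session.send('DOM.resolveNode', {backendNodeId});
  return object?.objectId ?? null;
}
```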

On #268 — agree the linkage is loose. The issue thread covers a few mechanisms (coord→DOM, overlay UIDs / Set-of-Marks, DevTools selection bridge). This PR covers only the first; it doesn't close out the issue.


Development

Successfully merging this pull request may close these issues.

A tool to "Select an element in the page to inspect it"
