feat: add get_element_at(x, y) tool under --experimentalVision #2021
medyas wants to merge 5 commits
Adds a coordinate-driven element-inspection tool that pairs with take_screenshot + a vision model. Returns a compact markdown descriptor (tag, id, class, computed selector, bbox, attributes, role, children) by default, with optional CSS modes (matched / computed-visual / computed-full) and output modes (auto / schema / raw / selector-only).

Closes ChromeDevTools#268 (proposed implementation; awaiting maintainer feedback on flag reuse, matched-CSS path, and naming).

Implementation
- Hot path (css != 'matched'): single page.evaluate() doing document.elementFromPoint, open-shadow piercing, same-origin iframe descent, descriptor build, and computed-style filtering.
- Matched-CSS path: contained pptrPage.createCDPSession() for DOM.getNodeForLocation + CSS.getMatchedStylesForNode, with finally detach. First raw-CDP usage in a tool handler; scope is intentionally narrow.
- cssPath selector port from DevTools' DOMPath.ts with uniqueness fallback to nth-of-type chain.
- 50KB outerHTML cap with context.saveTemporaryFile fallback.
- filePath param writes the full descriptor (untruncated outerHTML + full computed CSS + matched rules) to disk.

Edge cases handled
- Outside viewport, no-element-at-point, closed shadow roots, cross-origin/OOPIF iframes (returns iframe + flag), pointer-events: none, sticky/fixed stacking, page zoom / device emulation, detached node race (single synchronous evaluate turn), huge attribute values.

Tests
- tests/tools/input.test.ts: 10 cases covering modes, css modes, shadow piercing, iframe descent, no-element, filePath round-trip.
- tests/index.test.ts: registration check under --experimental-vision.
- tests/e2e/chrome-devtools-commands.test.ts: disabled-flag error surface mirrors click_at.
- scripts/eval_scenarios/get_element_at_test.ts: Gemini eval with loose coord-range assertion, gated via serverArgs.

Generated
- README.md, docs/tool-reference.md, src/bin/chrome-devtools-cli-options.ts, src/telemetry/tool_call_metrics.json regenerated via npm run gen.
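To make the "50KB outerHTML cap with fallback" bullet concrete, here is a minimal sketch of one way such a cap could work. It is not the PR's code: `capOuterHtml`, `CappedHtml`, and the `saveFull` callback are hypothetical names, `saveFull` merely stands in for context.saveTemporaryFile (whose real signature is not shown in this thread), and "50KB" is assumed to mean 50 * 1024 UTF-8 bytes.

```typescript
// Hypothetical helper, not taken from the PR.
const MAX_OUTER_HTML_BYTES = 50 * 1024; // assumed meaning of "50KB"

interface CappedHtml {
  text: string; // markup to include inline in the tool response
  truncated: boolean;
  savedTo?: string; // where the full markup was written, if truncated
}

// `saveFull` is a stand-in for context.saveTemporaryFile (signature assumed).
function capOuterHtml(
  html: string,
  saveFull: (fullHtml: string) => string,
  maxBytes: number = MAX_OUTER_HTML_BYTES,
): CappedHtml {
  const bytes = new TextEncoder().encode(html);
  if (bytes.length <= maxBytes) {
    return {text: html, truncated: false};
  }
  // Decode the byte prefix. A multi-byte character cut in half decodes to
  // U+FFFD, so trailing replacement characters are stripped.
  const prefix = new TextDecoder()
    .decode(bytes.subarray(0, maxBytes))
    .replace(/\uFFFD+$/, '');
  return {text: prefix, truncated: true, savedTo: saveFull(html)};
}
```

Measuring in bytes rather than string length keeps the cap meaningful for markup with many non-ASCII characters; whether the original implementation measured bytes or characters is not stated in the commit message.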
Two functional bugs and several smaller cleanups surfaced during review.

Blockers fixed:
- closedShadowEncountered was set true when pierceShadow=false against an open shadow host. shadowRoot is non-null only for OPEN roots, so the flag was inverted. Drop the misleading branch; closed shadow roots cannot be detected from JS at all.
- Matched-CSS path ran buildElementDescriptorInPage via Runtime.callFunctionOn against an iframe-resolved objectId, executing document.elementFromPoint in the iframe's JS context where the caller's main-frame coordinates resolved to the wrong element. Restructure fetchDescriptorViaCdp to build the descriptor in the main frame via pptrPage.evaluate and use CDP only for CSS.getMatchedStylesForNode.

Smaller fixes:
- computed-full mode now requires filePath (validated up front) instead of silently discarding the data after computation.
- Removed dead crossOriginFrame field from ElementDescriptor (only the NotFoundResult path can ever produce it).
- Collapsed identical if/else branches in the elementFromPoint loop.
- Tool description corrected: cross-origin iframes return a cross-origin-blocked NotFoundResult, not an iframe descriptor.
- All CDP error paths now log via logger instead of failing silently.

Tests: tests/tools/input.test.ts 54/54 pass. Auto-generated docs and CLI options refreshed via npm run gen.
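The first blocker rests on a DOM fact: `Element.shadowRoot` is `null` both when an element has no shadow root and when its root is closed. A small sketch of why a "closed shadow encountered" flag cannot be derived from it follows. `HostLike`, `ShadowObservation`, and `observeShadow` are hypothetical names used only for illustration; they do not come from the PR.

```typescript
// Minimal structural stand-in for a DOM Element; in a page this would be a
// real Element, whose shadowRoot property behaves the same way.
interface HostLike {
  // null when there is no shadow root OR when the root is closed
  shadowRoot: object | null;
}

type ShadowObservation = 'open-root' | 'no-root-or-closed';

// All that page script can learn from the property: a non-null value means
// an open root; null is ambiguous.
function observeShadow(el: HostLike): ShadowObservation {
  return el.shadowRoot !== null ? 'open-root' : 'no-root-or-closed';
}
```

A plain `<div>` and a host with a closed root both land in `'no-root-or-closed'`, which is why the commit drops the flag rather than trying to correct it.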
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
OrKoN left a comment
Thanks for the PR. This seems to do more than get_element_at (querying styles, storing data to files). get_element_at should just return a uid from the snapshot that matches the element at the coordinates, so that it integrates with the rest of the tools. It should probably use the browser's hit-testing API, and it would be more maintainable if the corresponding CDP support were added first. Perhaps you could file a feature request first with details about the use case so that we could prioritize it based on user feedback? I am also not sure if #268 has anything to do with the experimental vision mode.
Per the code review on ChromeDevTools#2021, `get_element_at(x, y)` now returns the uid of the element at the coordinate and refreshes the snapshot, instead of producing its own descriptor format. The uid is consumable by the existing uid-based tools (click, hover, fill, drag, upload_file, …).

Removed: mode (auto/schema/raw/selector-only), css (none/matched/computed-visual/computed-full), pierceShadow, filePath. The descriptor formatter, computed-style projection, matched-rules summary, 50KB outerHTML cap + temp-file fallback, full-descriptor JSON save, and the CDP-based matched-CSS path are all gone (~1100 lines).

Hit-test uses elementFromPoint with open-shadow piercing and same-origin iframe descent. Elements outside the a11y tree (e.g. plain divs) are injected via the existing extraHandles mechanism so the returned uid is resolvable by McpPage.getElementByUid.

The previous descriptor-style implementation is preserved on backup/get-element-at-full-impl for reference.
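The hit-test described in this commit message (elementFromPoint, open-shadow piercing, same-origin iframe descent) can be sketched roughly as below. This is an illustrative reconstruction, not the PR's code. `deepElementFromPoint`, `HitRoot`, `HitElement`, `HitResult`, and the `maxDepth` guard are hypothetical; the structural interfaces exist only so the sketch runs outside a browser. Reporting cross-origin frames as a separate result is an assumption carried over from the earlier review commit, and iframe borders and padding are ignored when translating coordinates.

```typescript
// Structural stand-ins for DOM types. In a page, Document and ShadowRoot both
// provide elementFromPoint, and HTMLIFrameElement.contentDocument is null for
// cross-origin frames.
interface HitRoot {
  elementFromPoint(x: number, y: number): HitElement | null;
}
interface HitElement {
  tagName: string;
  shadowRoot?: HitRoot | null;
  contentDocument?: HitRoot | null;
  getBoundingClientRect(): {left: number; top: number};
}

type HitResult =
  | {kind: 'element'; element: HitElement}
  | {kind: 'cross-origin-blocked'; frame: HitElement}
  | {kind: 'none'};

function deepElementFromPoint(
  root: HitRoot,
  x: number,
  y: number,
  maxDepth = 32, // guard against pathological nesting or cycles
): HitResult {
  let current = root.elementFromPoint(x, y);
  if (!current) {
    return {kind: 'none'};
  }
  for (let depth = 0; depth < maxDepth; depth++) {
    // Open shadow root: ask it for a more specific element at the same point.
    if (current.shadowRoot) {
      const inner = current.shadowRoot.elementFromPoint(x, y);
      if (inner && inner !== current) {
        current = inner;
        continue;
      }
    }
    if (current.tagName === 'IFRAME') {
      const doc = current.contentDocument;
      if (!doc) {
        return {kind: 'cross-origin-blocked', frame: current};
      }
      // Translate into the frame's own viewport coordinates.
      const rect = current.getBoundingClientRect();
      x -= rect.left;
      y -= rect.top;
      const inner = doc.elementFromPoint(x, y);
      if (inner) {
        current = inner;
        continue;
      }
    }
    break;
  }
  return {kind: 'element', element: current};
}
```

In a page context this would be called as `deepElementFromPoint(document, x, y)` with viewport-relative CSS-pixel coordinates. The coordinate translation on iframe descent is the step that goes wrong if main-frame coordinates are evaluated directly inside the iframe's context, as the earlier review commit describes.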
Thanks for the steer, @OrKoN. You were right that the original shape was doing too much. Pushed a rewrite (dadcac9) that scopes this down to a uid-returning primitive matching the rest of the tools.

On the "browser hit-testing API + CDP support first" point: I went with […]

On #268: agree the linkage is loose. The issue thread covers a few mechanisms (coord→DOM, overlay UIDs / Set-of-Marks, DevTools selection bridge). This PR covers only the first; it doesn't close out the issue.
Closes #268.
Adds a coordinate-driven element-inspection primitive that pairs with `take_screenshot` + a vision model.

What this version does
`get_element_at(x, y)` returns the uid of the element at the viewport-relative CSS-pixel coordinate, and refreshes the page snapshot. The returned uid is consumable by the existing uid-based tools (`click`, `hover`, `fill`, `drag`, `upload_file`, …).

Gated by the existing `--experimentalVision` flag (same as `click_at`).

Schema
Response
- `Element uid: <uid>`
- The refreshed page snapshot (`response.includeSnapshot()`).

Behavior
- Hit-test: `elementFromPoint` with open-shadow piercing and same-origin iframe descent.
- Elements outside the a11y tree (e.g. a plain `<div>`): injected via the existing `extraHandles` mechanism (the same path third-party-developer tools use), so the returned uid resolves through `McpPage.getElementByUid`.

Limitations
- […] `list_pages` + `select_page` for switching into the inner target.

Background
The first revision of this PR returned a self-contained markdown descriptor with CSS modes, file output, and a CDP-backed matched-rules path. Per @OrKoN's review, that was out of scope for a `get_element_at` primitive: the tool should integrate with the snapshot/uid system the rest of the tools are built around. This rewrite drops mode/css/pierceShadow/filePath, removes the descriptor formatter, computed-style projection, matched-rules summary, 50KB outerHTML cap, and the CDP matched-CSS path (~1100 lines).

The previous descriptor-style implementation is preserved on `backup/get-element-at-full-impl` for reference; if a separate `get_element_styles(uid)`-style tool is ever wanted, the building blocks are there.

Test plan
Real Chrome under `withMcpContext`:
- […] (`getElementByUid` confirms id matches)
- […] `<div>` via the `extraHandles` fallback (uid round-trip)
- […] (`#inner` button)
- […] (`#inner` element)
- Registration check under `--experimental-vision` (`tests/index.test.ts`)
- Disabled-flag error surface (`tests/e2e/chrome-devtools-commands.test.ts`)
- Eval scenario: `scripts/eval_scenarios/get_element_at_test.ts`
- `npm run typecheck` clean.
- `npm run test -- tests/tools/input.test.ts` passes.
- `npm run gen` clean (auto-regenerated `docs/tool-reference.md`, `src/bin/chrome-devtools-cli-options.ts`, `src/telemetry/tool_call_metrics.json`).

CLA: still pending sign-off.