From 7b63939fd8f25d0bc45aabd3958ee636fd334f2a Mon Sep 17 00:00:00 2001
From: yassine
Date: Fri, 8 May 2026 11:26:36 +0100
Subject: [PATCH 1/3] feat: add get_element_at(x, y) tool under --experimentalVision

Adds a coordinate-driven element-inspection tool that pairs with
take_screenshot and a vision model. By default it returns a compact
markdown descriptor (tag, id, class, computed selector, bbox, attributes,
role, children). Optional CSS modes (matched / computed-visual /
computed-full) and output modes (auto / schema / raw / selector-only)
control how much detail is included.

Closes #268 (proposed implementation; awaiting maintainer feedback on flag
reuse, the matched-CSS path, and naming).

Implementation
- Hot path (css != 'matched'): a single page.evaluate() that performs
  document.elementFromPoint, open-shadow piercing, same-origin iframe
  descent, descriptor construction, and computed-style filtering.
- Matched-CSS path: a short-lived session from
  pptrPage.createCDPSession(), used for DOM.getNodeForLocation and
  CSS.getMatchedStylesForNode and detached in a finally block. This is the
  first raw-CDP usage in a tool handler, so its scope is intentionally
  narrow.
- cssPath selector ported from DevTools' DOMPath.ts. When the generated
  selector is not unique, it falls back to an nth-of-type chain.
- 50KB outerHTML cap, with a context.saveTemporaryFile fallback.
- The filePath param writes the full descriptor (untruncated outerHTML +
  full computed CSS + matched rules) to disk.

Edge cases handled
- Outside viewport, no element at point, closed shadow roots,
  cross-origin/OOPIF iframes (returns the iframe plus a flag),
  pointer-events: none, sticky/fixed stacking, page zoom / device
  emulation, detached-node races (avoided by a single synchronous evaluate
  turn), and huge attribute values.

Tests
- tests/tools/input.test.ts: 10 cases covering output modes, CSS modes,
  shadow piercing, iframe descent, no element at point, and the filePath
  round-trip.
- tests/index.test.ts: registration check under --experimental-vision.
- tests/e2e/chrome-devtools-commands.test.ts: the disabled-flag error
  surface mirrors click_at.
- scripts/eval_scenarios/get_element_at_test.ts: Gemini eval with a loose
  coordinate-range assertion, gated via serverArgs.

Generated
- README.md, docs/tool-reference.md, src/bin/chrome-devtools-cli-options.ts
  and src/telemetry/tool_call_metrics.json regenerated via npm run gen.
---
 README.md                                     |    3 +-
 docs/tool-reference.md                        |   18 +-
 scripts/eval_scenarios/get_element_at_test.ts |   89 ++
 src/bin/chrome-devtools-cli-options.ts        |   51 +
 src/telemetry/tool_call_metrics.json          |   29 +
 src/tools/input.ts                            | 1120 ++++++++++++++++-
 tests/e2e/chrome-devtools-commands.test.ts    |   17 +
 tests/index.test.ts                           |    2 +
 tests/tools/input.test.ts                     |  432 +++++++
 9 files changed, 1757 insertions(+), 4 deletions(-)
 create mode 100644 scripts/eval_scenarios/get_element_at_test.ts

diff --git a/README.md b/README.md
index 863564269..79babc8c7 100644
--- a/README.md
+++ b/README.md
@@ -477,7 +477,7 @@ If you run into any issues, checkout our [troubleshooting guide](./docs/troubles
-- **Input automation** (10 tools)
+- **Input automation** (11 tools)
   - [`click`](docs/tool-reference.md#click)
   - [`drag`](docs/tool-reference.md#drag)
   - [`fill`](docs/tool-reference.md#fill)
@@ -488,6 +488,7 @@ If you run into any issues, checkout our [troubleshooting guide](./docs/troubles
   - [`type_text`](docs/tool-reference.md#type_text)
   - [`upload_file`](docs/tool-reference.md#upload_file)
   - [`click_at`](docs/tool-reference.md#click_at)
+  - [`get_element_at`](docs/tool-reference.md#get_element_at)
 - **Navigation automation** (6 tools)
   - [`close_page`](docs/tool-reference.md#close_page)
   - [`list_pages`](docs/tool-reference.md#list_pages)
diff --git a/docs/tool-reference.md b/docs/tool-reference.md
index 187280d9c..85465a60f 100644
--- a/docs/tool-reference.md
+++ b/docs/tool-reference.md
@@ -2,7 +2,7 @@
 
 # Chrome DevTools MCP Tool Reference
 
-- **[Input automation](#input-automation)** (10 tools)
+- **[Input automation](#input-automation)** (11 tools)
   - [`click`](#click)
   - [`drag`](#drag)
   - [`fill`](#fill)
@@ -13,6 +13,7 @@
   - [`type_text`](#type_text)
   - [`upload_file`](#upload_file)
   - [`click_at`](#click_at)
+  - [`get_element_at`](#get_element_at)
 - **[Navigation automation](#navigation-automation)** (6 tools)
   - [`close_page`](#close_page)
   - [`list_pages`](#list_pages)
@@ -175,6 +176,21 @@
 
 ---
 
+### `get_element_at`
+
+**Description:** Returns the DOM element at viewport-relative CSS-pixel coordinates (x, y). Pairs with [`take_screenshot`](#take_screenshot) + a vision model that emits coordinates. Pierces open shadow roots by default. Limitations: cannot enter closed shadow roots; cannot enter cross-origin/OOPIF iframes (you'll get the <iframe> element with crossOriginFrame=true); css="matched" requires the experimentalVision flag and uses Chrome DevTools Protocol. For huge elements use mode="schema" (default) or pass filePath to write the full descriptor to disk. (requires flag: --experimentalVision=true)
+
+**Parameters:**
+
+- **x** (number) **(required)**: CSS-pixel X coordinate, viewport-relative.
+- **y** (number) **(required)**: CSS-pixel Y coordinate, viewport-relative.
+- **css** (enum: "none", "matched", "computed-visual", "computed-full") _(optional)_: CSS data to include. matched = author rules from cascade (uses CDP). computed-visual = ~30 visually relevant computed properties. computed-full = all computed properties (saved to file when large).
+- **filePath** (string) _(optional)_: If set, writes the full descriptor (raw outerHTML + full computed CSS) to this path and returns a summary in the response.
+- **mode** (enum: "auto", "schema", "raw", "selector-only") _(optional)_: Output detail level. auto/schema = compact MD descriptor. raw = full outerHTML (truncated to 50KB or saved to file). selector-only = just the CSS selector.
+- **pierceShadow** (boolean) _(optional)_: Whether to descend into open shadow roots. Default true. Closed shadow roots are never pierced.
+
+---
+
 ## Navigation automation
 
 ### `close_page`
diff --git a/scripts/eval_scenarios/get_element_at_test.ts b/scripts/eval_scenarios/get_element_at_test.ts
new file mode 100644
index 000000000..8f5f99970
--- /dev/null
+++ b/scripts/eval_scenarios/get_element_at_test.ts
@@ -0,0 +1,89 @@
+/**
+ * @license
+ * Copyright 2026 Google LLC
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import assert from 'node:assert';
+
+import type {TestScenario} from '../eval_gemini.ts';
+
+export const scenario: TestScenario = {
+  serverArgs: ['--experimentalVision=true'],
+  prompt: `Take a screenshot of . There is a single large blue square on the page. Use the get_element_at tool to inspect the DOM element at the center of that blue square (the page is 800x600 and the square spans roughly x=100..300, y=100..300, so a coordinate around 200,200 is appropriate). Then tell me the element's id and class.`,
+  maxTurns: 4,
+  htmlRoute: {
+    path: '/get_element_at_test.html',
+    htmlContent: `
+
+
+
+
+
+
+
CLICK ME
+
+
+    `,
+  },
+  expectations: calls => {
+    const visualCalls = calls.filter(
+      c => c.name === 'take_screenshot' || c.name === 'take_snapshot',
+    );
+    assert.ok(
+      visualCalls.length >= 1,
+      'Expected at least one take_screenshot or take_snapshot call before inspecting coordinates',
+    );
+
+    const elementAtCalls = calls.filter(c => c.name === 'get_element_at');
+    assert.ok(
+      elementAtCalls.length >= 1,
+      'Expected at least one get_element_at call',
+    );
+
+    let withinTarget = 0;
+    for (const call of elementAtCalls) {
+      const x = call.args.x;
+      const y = call.args.y;
+      assert.strictEqual(
+        typeof x,
+        'number',
+        'get_element_at must receive a numeric x',
+      );
+      assert.strictEqual(
+        typeof y,
+        'number',
+        'get_element_at must receive a numeric y',
+      );
+      if (
+        typeof x === 'number' &&
+        typeof y === 'number' &&
+        x >= 100 &&
+        x <= 300 &&
+        y >= 100 &&
+        y <= 300
+      ) {
+        withinTarget++;
+      }
+    }
+    assert.ok(
+      withinTarget >= 1,
+      'Expected at least one get_element_at call with x in [100,300] and y in [100,300] (inside the blue square)',
+    );
+  },
+};
diff --git a/src/bin/chrome-devtools-cli-options.ts b/src/bin/chrome-devtools-cli-options.ts
index 695f63a80..dbdbae266 100644
--- a/src/bin/chrome-devtools-cli-options.ts
+++ b/src/bin/chrome-devtools-cli-options.ts
@@ -277,6 +277,57 @@ export const commands: Commands = {
       },
     },
   },
+  get_element_at: {
+    description:
+      'Returns the DOM element at viewport-relative CSS-pixel coordinates (x, y). Pairs with take_screenshot + a vision model that emits coordinates. Pierces open shadow roots by default. Limitations: cannot enter closed shadow roots; cannot enter cross-origin/OOPIF iframes (you\'ll get the `,
+        );
+        await page.waitForFunction(
+          () => {
+            const frame = document.querySelector('iframe');
+            return Boolean(frame?.contentDocument?.querySelector('#inner'));
+          },
+          {timeout: 5000},
+        );
+        await getElementAt.handler(
+          {
+            params: {
+              x: 50,
+              y: 50,
+              mode: 'auto',
+              css: 'none',
+            },
+            page: context.getSelectedMcpPage(),
+          },
+          response,
+          context,
+        );
+        const output = response.responseLines.join('\n');
+        assert.ok(
+          output.includes('`inner`'),
+          `output should identify the inner element id: ${output}`,
+        );
+        assert.ok(
+          output.includes('frameOrigin='),
+          `output should include a frameOrigin indicator: ${output}`,
+        );
+      });
+    });
+
+    it('writes the full descriptor to disk when filePath is provided', async () => {
+      await withMcpContext(async (response, context) => {
+        const page = context.getSelectedPptrPage();
+        await page.setContent(
+          html``,
+        );
+        const tmpDir = await fs.mkdtemp(
+          path.join(os.tmpdir(), 'get-element-at-test-'),
+        );
+        const filePath = path.join(tmpDir, 'desc.json');
+        try {
+          await getElementAt.handler(
+            {
+              params: {
+                x: 50,
+                y: 50,
+                mode: 'auto',
+                css: 'none',
+                filePath,
+              },
+              page: context.getSelectedMcpPage(),
+            },
+            response,
+            context,
+          );
+          const output = response.responseLines.join('\n');
+          assert.ok(
+            output.includes('Saved full element descriptor to'),
+            `output should reference the saved descriptor: ${output}`,
+          );
+          assert.ok(
+            output.includes('desc.json'),
+            `output should include the file name: ${output}`,
+          );
+          const written = await fs.readFile(filePath, 'utf8');
+          const parsed: unknown = JSON.parse(written);
+          assert.ok(
+            parsed !== null &&
+              typeof parsed === 'object' &&
+              'descriptor' in parsed,
+            'written file should contain a JSON object with a descriptor field',
+          );
+          const descriptorField = parsed.descriptor;
+          assert.ok(
+            descriptorField !== null && typeof descriptorField === 'object',
+            'descriptor field should be an object',
+          );
+        } finally {
+          await fs.rm(tmpDir, {recursive: true, force: true});
+        }
+      });
+    });
+  });
 });

From b5fc5bccf77a08052b5be3284600937aea6ddc40 Mon Sep 17 00:00:00 2001
From: yassine
Date: Fri, 8 May 2026 12:10:37 +0100
Subject: [PATCH 2/3] fix(get_element_at): address code review findings

Review surfaced two functional bugs and several smaller cleanups.

Blockers fixed:
- closedShadowEncountered was set to true when pierceShadow=false was used
  against an open shadow host. shadowRoot is non-null only for OPEN roots,
  so the flag reported a closed root precisely when the root was open.
  Drop the misleading branch: closed shadow roots cannot be detected from
  page JS at all.
- The matched-CSS path ran buildElementDescriptorInPage via
  Runtime.callFunctionOn against an iframe-resolved objectId. That executed
  document.elementFromPoint in the iframe's JS context, where the caller's
  main-frame coordinates resolved to the wrong element. Restructure
  fetchDescriptorViaCdp to build the descriptor in the main frame via
  pptrPage.evaluate and to use CDP only for CSS.getMatchedStylesForNode.

Smaller fixes:
- computed-full mode now requires filePath (validated up front) instead of
  silently discarding the data after computing it.
- Removed the dead crossOriginFrame field from ElementDescriptor (only the
  NotFoundResult path can ever produce it).
- Collapsed identical if/else branches in the elementFromPoint loop.
- Corrected the tool description: cross-origin iframes return a
  cross-origin-blocked NotFoundResult, not an iframe descriptor.
- All CDP error paths now log via logger instead of failing silently.

Tests: all 54 cases in tests/tools/input.test.ts pass. Auto-generated docs
and CLI options refreshed via npm run gen.
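
The hit-testing rules behind the two blockers can be sketched in
TypeScript. This is a minimal illustration under stated assumptions, not
the code in src/tools/input.ts. resolveElementAt, PointRoot,
PointElement, PointHit and contentOffset are hypothetical names. The
structural types stand in for Document, ShadowRoot and
HTMLIFrameElement so that the logic runs outside a browser. The sketch
shows two things:
- a null shadowRoot says nothing about whether a closed root exists;
- main-frame coordinates must be translated into the iframe's own
  viewport before its document is hit-tested.

```typescript
// Hypothetical structural stand-ins for Document/ShadowRoot/Element so the
// sketch runs without a browser. These are not the patch's real types.
interface PointRoot {
  elementFromPoint(x: number, y: number): PointElement | null;
}

interface PointElement {
  // Like Element.shadowRoot: null for closed roots AND for elements without
  // a shadow root, so page JS cannot tell the two cases apart.
  shadowRoot: PointRoot | null;
  // Defined only for iframes. Like HTMLIFrameElement.contentDocument, it is
  // null when the frame is cross-origin.
  contentDocument?: PointRoot | null;
  // Hypothetical: viewport offset of the iframe's content box.
  contentOffset?: {left: number; top: number};
}

interface PointHit {
  element: PointElement | null;
  crossOriginBlocked: boolean;
}

function resolveElementAt(
  root: PointRoot,
  x: number,
  y: number,
  pierceShadow = true,
): PointHit {
  // Coordinates relative to whichever document is currently being hit-tested.
  let px = x;
  let py = y;
  let el = root.elementFromPoint(px, py);
  while (el) {
    if (pierceShadow && el.shadowRoot) {
      // Open shadow root: repeat the hit test inside it. The coordinates stay
      // the same because a shadow tree shares its host's viewport.
      const shadowHit = el.shadowRoot.elementFromPoint(px, py);
      if (!shadowHit || shadowHit === el) break;
      el = shadowHit;
      continue;
    }
    // Not an iframe: el is the final hit.
    if (el.contentDocument === undefined) break;
    if (el.contentDocument === null) {
      // Cross-origin frame: report it instead of guessing at its contents.
      return {element: el, crossOriginBlocked: true};
    }
    // Same-origin iframe: translate into the frame's own viewport first;
    // otherwise main-frame coordinates hit the wrong element.
    const offset = el.contentOffset ?? {left: 0, top: 0};
    px -= offset.left;
    py -= offset.top;
    const frameHit = el.contentDocument.elementFromPoint(px, py);
    if (!frameHit) break;
    el = frameHit;
  }
  return {element: el, crossOriginBlocked: false};
}
```

In a real page the offset would presumably come from the iframe's
getBoundingClientRect(), adjusted for its border and padding. That step
is assumed here rather than taken from the patch.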
---
 docs/tool-reference.md                 |   2 +-
 src/bin/chrome-devtools-cli-options.ts |   2 +-
 src/tools/input.ts                     | 181 ++++++++++++++++++-------------------
 3 files changed, 92 insertions(+), 93 deletions(-)

diff --git a/docs/tool-reference.md b/docs/tool-reference.md
index 85465a60f..16400d810 100644
--- a/docs/tool-reference.md
+++ b/docs/tool-reference.md
@@ -178,7 +178,7 @@
 
 ### `get_element_at`
 
-**Description:** Returns the DOM element at viewport-relative CSS-pixel coordinates (x, y). Pairs with [`take_screenshot`](#take_screenshot) + a vision model that emits coordinates. Pierces open shadow roots by default. Limitations: cannot enter closed shadow roots; cannot enter cross-origin/OOPIF iframes (you'll get the <iframe> element with crossOriginFrame=true); css="matched" requires the experimentalVision flag and uses Chrome DevTools Protocol. For huge elements use mode="schema" (default) or pass filePath to write the full descriptor to disk. (requires flag: --experimentalVision=true)
+**Description:** Returns the DOM element at viewport-relative CSS-pixel coordinates (x, y). Pairs with [`take_screenshot`](#take_screenshot) + a vision model that emits coordinates. Pierces open shadow roots by default. Limitations: cannot enter closed shadow roots; cannot enter cross-origin/OOPIF iframes (the call returns a 'cross-origin-blocked' result with partial metadata about the iframe); css="matched" requires the experimentalVision flag and uses Chrome DevTools Protocol. For huge elements use mode="schema" (default) or pass filePath to write the full descriptor to disk. (requires flag: --experimentalVision=true)
 
 **Parameters:**
 
diff --git a/src/bin/chrome-devtools-cli-options.ts b/src/bin/chrome-devtools-cli-options.ts
index dbdbae266..edd64b77e 100644
--- a/src/bin/chrome-devtools-cli-options.ts
+++ b/src/bin/chrome-devtools-cli-options.ts
@@ -279,7 +279,7 @@ export const commands: Commands = {
   },
   get_element_at: {
     description:
-      'Returns the DOM element at viewport-relative CSS-pixel coordinates (x, y). Pairs with take_screenshot + a vision model that emits coordinates. Pierces open shadow roots by default. Limitations: cannot enter closed shadow roots; cannot enter cross-origin/OOPIF iframes (you\'ll get the