OpenAI web search: citation stripping not applied to GPT-5.x models #170566

@ragio579

The problem

When using GPT-5.x models (e.g. gpt-5.4) with web_search: true and inline_citations: false, citations like ([legaseriea.it](https://...)) are still included in responses.

This happens because in entity.py, GPT-5.x models are classified as reasoning models (line 525):

if model_args["model"].startswith(("o", "gpt-5")):
    model_args["reasoning"] = reasoning

The regex-based citation stripping (added in PR #154292) is then skipped (line 600):

if "reasoning" not in model_args:
    remove_citations = True  # Never reached for gpt-5.x

The assumption is that reasoning models respect the prompt instruction "When doing a web search, do not include source citations". GPT-5.x models do not: they consistently include inline citations regardless of the prompt.
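To make the control flow easier to follow, here is a minimal sketch of the gating described above. It is a reduction for illustration, not the code in entity.py: the function name `should_strip_citations`, the placeholder `reasoning` value, and the outer `inline_citations` check are assumptions.

```python
def should_strip_citations(model: str, inline_citations: bool) -> bool:
    """Hypothetical reduction of the gating quoted in this issue."""
    model_args = {"model": model}

    # Models whose name starts with "o" or "gpt-5" are treated as reasoning models.
    if model_args["model"].startswith(("o", "gpt-5")):
        model_args["reasoning"] = {"effort": "low"}  # placeholder value (assumption)

    remove_citations = False
    if not inline_citations:  # assumed outer condition, not shown in the issue
        # Regex stripping is only enabled when no "reasoning" entry was set.
        if "reasoning" not in model_args:
            remove_citations = True
    return remove_citations


print(should_strip_citations("gpt-4o", inline_citations=False))   # True
print(should_strip_citations("gpt-5.4", inline_citations=False))  # False
```

With `inline_citations` set to false, gpt-5.4 still ends up with `remove_citations = False`, so the only thing standing between the model and the user is the prompt instruction.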

Environment

  • HA version: 2026.5.0
  • Model: gpt-5.4
  • inline_citations: false
  • web_search: true

Expected behavior

The regex citation stripper should also be applied to GPT-5.x models, since the prompt instruction alone does not stop them from citing sources.

Example

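Prompt: "Cerca online il risultato di Lazio Inter di ieri sera" (English: "Search online for the result of last night's Lazio-Inter match")

Response (with inline_citations: false):

Ieri sera Lazio-Inter è finita 0 a 2. Hanno segnato l'autogol di Marusic e Lautaro Martínez. (legaseriea.it)

(English: "Last night Lazio-Inter finished 0-2. The goals were a Marusic own goal and Lautaro Martínez. (legaseriea.it)")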

The (legaseriea.it) citation should have been stripped.

Suggested fix

Remove the if "reasoning" not in model_args guard, or apply the regex stripper unconditionally when inline_citations is false. The regex approach is a safe fallback even for reasoning models.
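A rough sketch of what applying the stripper unconditionally could look like. The pattern below is my own guess at what a citation looks like in these responses, not the regex from PR #154292, and the names `CITATION_RE`, `strip_citations`, and `postprocess` are hypothetical.

```python
import re

# Hypothetical pattern (assumption, not the integration's regex). Matches either
#   " ([label](https://url))"  a parenthesised Markdown link, or
#   " (legaseriea.it)"         a parenthesised bare domain,
# together with any whitespace in front of it.
CITATION_RE = re.compile(
    r"\s*\((?:\[[^\]]+\]\([^)\s]+\)|(?:[\w-]+\.)+[a-z]{2,})\)",
    re.IGNORECASE,
)


def strip_citations(text: str) -> str:
    """Remove parenthesised source citations from a response."""
    return CITATION_RE.sub("", text)


def postprocess(text: str, inline_citations: bool) -> str:
    # Strip whenever citations are disabled, whether or not the model
    # is classified as a reasoning model.
    return text if inline_citations else strip_citations(text)


print(postprocess("Lazio-Inter è finita 0 a 2. (legaseriea.it)", inline_citations=False))
# Lazio-Inter è finita 0 a 2.
```

A pattern this loose will also remove parenthesised text that merely looks like a domain, such as (config.yaml), and it does not handle URLs that contain parentheses, so the regex that already exists in the integration is probably the better one to reuse.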
