
feat(skills): agent-skills (SKILL.md) format support for review/improve/describe #2385

Open
IsmaelMartinez wants to merge 6 commits into The-PR-Agent:main from IsmaelMartinez:feat/agent-skills-support

Conversation

@IsmaelMartinez

Summary

Implements agent-skills (SKILL.md) format support for /review, /improve, and /describe, addressing #2384. Skills are discovered from configured filesystem paths, parsed for YAML frontmatter (name + description), and injected into prompts alongside extra_instructions. Sibling *.md files in the skill directory tree (including references/ subdirectories) are inlined as part of the same skill; scripts/ and assets/ subdirectories are skipped because PR-Agent's single-shot prompt architecture cannot execute scripts or load binary assets on demand. Disabled by default; gated behind [skills] enabled = true.
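
The frontmatter step described above can be illustrated with a minimal sketch. It assumes a SKILL.md that opens with a `---`-delimited block of flat `key: value` pairs. `split_frontmatter` is a hypothetical helper written for this illustration, not the loader in this PR, and a real implementation would hand the block to a YAML parser instead of splitting lines by hand.

```python
from typing import Dict, Optional, Tuple


def split_frontmatter(text: str) -> Optional[Tuple[Dict[str, str], str]]:
    """Split a SKILL.md string into (metadata, body), or return None if malformed."""
    lines = text.splitlines()
    # The file must start with an opening '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return None
    # Find the closing '---' delimiter.
    closing = None
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            closing = index
            break
    if closing is None:
        return None
    # Simplification: only flat 'key: value' pairs are understood.
    meta: Dict[str, str] = {}
    for line in lines[1:closing]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")
    # Both fields named in the summary (name + description) are required here.
    if not meta.get("name") or not meta.get("description"):
        return None
    body = "\n".join(lines[closing + 1:]).strip()
    return meta, body
```

A file without frontmatter, without a closing delimiter, or without both fields yields `None`, so a caller can log a warning and skip it.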

The value proposition is org-wide curated skill libraries distributed at the PR-Agent host level — install one set of skills, reuse across many repos. This is distinct from per-repo guidance checked into version control, which is the focus of related work in flight (see "Relationship to in-flight work" below).

Scope

Parsing, discovery, formatting, and injection of text-only agent skills from filesystem paths. Every discovered skill's body and inlined *.md resources are added to every prompt when the feature is enabled. The description field is preserved and rendered as a section header in the injected context but does not currently drive selective loading.

Limitations

PR-Agent dispatches single-shot model calls — there is no tool-use loop in which the model could Read a file mid-turn — so the agent-skills standard's progressive-disclosure model (model sees name + description at startup, reads SKILL.md only on selection, reads references/*.md only on demand) is not implementable on the current architecture. Every text byte of every enabled skill is loaded on every PR. The max_skills_tokens budget (default 8000, user-configurable) caps the combined block; skills past the cap are dropped from the end with a warning. Skills that depend on script execution or binary assets will not work — those subdirectories are excluded from inlining and PR-Agent has no mechanism to surface them.
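
The "dropped from the end" behaviour of the budget can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `Skill`, `approx_tokens`, and `apply_budget` are hypothetical names, and the four-characters-per-token estimate stands in for a real tokenizer.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger(__name__)


@dataclass
class Skill:
    name: str
    description: str
    body: str


def approx_tokens(text: str) -> int:
    # Hypothetical stand-in for a tokenizer: roughly 4 characters per token.
    return (len(text) + 3) // 4


def apply_budget(
    skills: List[Skill],
    max_tokens: int,
    count: Callable[[str], int] = approx_tokens,
) -> List[Skill]:
    """Keep skills in order until the next one would exceed max_tokens."""
    kept: List[Skill] = []
    used = 0
    for index, skill in enumerate(skills):
        cost = count(f"{skill.name}\n{skill.description}\n{skill.body}")
        if used + cost > max_tokens:
            # Everything from this skill onward is dropped, with a warning.
            dropped = ", ".join(s.name for s in skills[index:])
            logger.warning("Skills budget of %d tokens exceeded; dropping: %s", max_tokens, dropped)
            break
        kept.append(skill)
        used += cost
    return kept
```

Note that once one skill does not fit, later skills are dropped too, even if a smaller one would still fit in the remaining budget. That keeps the result predictable from the configured order.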

A draft design for a two-pass selector — a cheap weak-model call that picks relevant skills before the main tool prompt — is in AGENT_SKILLS_DESIGN.md on this branch. That is the architecturally correct way to honor progressive disclosure within the current single-shot flow. Long-term, the in-flight MCP PRs (#2348, #2356) would offer a cleaner path: with MCP tool calling, the model could invoke a read_skill_body tool on demand, removing the need for a separate selector pass entirely. Either follow-up is out of scope for this PR.

Relationship to in-flight work

Several adjacent PRs and issues touch the broader "let OSS users supply richer custom guidance to PR-Agent" gap. This PR is intended to complement, not replace, them.

#2382 (load repo best_practices.md in OSS) addresses #2377 by reading a single file from the PR's own repository via a new GitProvider.get_pr_agent_repo_custom_file() method. Its scope is per-repo guidance, checked into version control, scoped to /improve. This PR's scope is host-level skill libraries, configured outside any one repo, available to /review, /improve, and /describe. The two are complementary: a team could use #2382 for repo-specific rules and this for shared org standards in the same deployment. There is some file-level overlap (pr_code_suggestions.py, pr_reviewer.py, configuration.toml, three prompt TOMLs add different blocks in different sections); whichever lands second will need a small rebase but not a redesign.

#2304 (asbach — add_repo_metadata for AGENTS.md / QODO.md / claude.md) takes a similar shape to #2382: read named files from the PR's repo and inject. If it lands, a follow-up to this PR could optionally use the same get_repo_file() plumbing to discover skills inside the PR's repo (for example, a .pr_agent_skills/ directory checked into version control), in addition to host-level paths. This branch deliberately did not depend on #2304 to keep review surface independent.

#2348 and #2356 (MCP tool orchestration / registry) introduce tool-use capability to PR-Agent. As noted under Limitations, this is the architecturally cleanest substrate for the progressive-disclosure half of the agent-skills standard, and is the natural home for any future "selectively load skill content" work.

Issue references for context: #2384 (this proposal), #2311 (the original "skills support" inquiry), #1766 (the long-running "custom rules" thread, where the workaround pattern is to stuff guidance into extra_instructions).

Configuration

[skills]
enabled = false
paths = []                  # supports ~ and $VAR
max_skills_tokens = 8000

Repos can override these in their own .pr_agent.toml per the existing Dynaconf merge logic.
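
What "supports ~ and $VAR" in the `paths` comment could mean in practice is sketched below. `expand_skill_paths` is a hypothetical helper name; the combination of `os.path.expandvars` and `os.path.expanduser` mirrors the call quoted from the loader later in this conversation.

```python
import os
from typing import Iterable, List


def expand_skill_paths(raw_paths: Iterable[object]) -> List[str]:
    """Expand $VAR and ~ in configured paths, skipping blank or non-string entries."""
    expanded: List[str] = []
    for raw in raw_paths or []:
        if not isinstance(raw, str) or not raw.strip():
            continue
        # Environment variables are expanded first, then a leading ~.
        expanded.append(os.path.expanduser(os.path.expandvars(raw.strip())))
    return expanded
```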

Test plan

  • PYTHONPATH=. ./.venv/bin/pytest tests/unittest -v (363 passed locally; 31 new tests in tests/unittest/test_skills_loader.py covering parsing, discovery, formatting, budget enforcement, env-var path expansion, Jinja-syntax safety, sibling-md gathering, references/ inlining, scripts/+assets/ exclusion, and nested-SKILL.md isolation)
  • Manual: enable [skills], point paths at a directory containing one or more SKILL.md files, run python -m pr_agent.cli --pr_url <url> review against a real PR, confirm skills_context reaches the rendered prompt
  • Disabled path: with enabled = false, prompts are byte-identical to pre-feature output (the {%- if skills_context %} block produces nothing)
  • Budget enforcement: oversized skill content triggers a [truncated] marker without raising

Refs: #2384, #2311, #1766. Complements: #2382, #2304, #2348, #2356.

Pointer file for Claude Code: high-level prompt-building hot path,
settings/Dynaconf flow, git-provider abstraction, and the unit-test command.
Defers to AGENTS.md for the full repo guidelines.
Discover SKILL.md files from configured filesystem paths, parse YAML
frontmatter, and inject the combined name/description/body block into the
review, improve, and describe prompts as a separate skills_context variable
(rendered above extra_instructions so user-supplied instructions still take
precedence). Activation is description-based: every discovered skill is
included with its "Use when..." description, and the model decides which
guidance to apply. A token budget (skills.max_skills_tokens, default 4000)
caps the injected block, dropping skills from the end if exceeded.

Disabled by default; enable via [skills] in configuration. No new
dependencies (frontmatter is parsed with the yaml module from PyYAML, an
existing dependency; it is not part of the stdlib).

Refs The-PR-Agent#2384.
Apply post-review fixes to the agent-skills loader:

* Inline non-SKILL.md resources. Every *.md file in the skill directory
  tree (including references/ subdirectories) is now gathered and
  appended to the skill body in the prompt. This is the closest analogue
  to the standard's progressive-disclosure model under PR-Agent's
  single-shot prompt architecture, where the model has no opportunity
  to Read files mid-turn. scripts/ and assets/ subdirectories remain
  excluded -- the implementation supports text-only skills, and skills
  that depend on script execution will not work here. Documented in
  the loader docstring and configuration.toml.
* Treat nested SKILL.md files as independent skills: their subtree is
  not absorbed into the outer skill's resources.
* Switch get_skills_context to attribute-style settings access for
  consistency with the rest of the codebase, narrow the catch to only
  the int() coercion (programmer errors now surface), and expand
  $ENV_VAR alongside ~ in configured paths.
* Bump default max_skills_tokens from 4000 to 8000 (a typical multi-skill
  setup needed lifting; still user-overridable per repo).
* Clean up format_skills_context truncation (drop dead variable, slice
  cleanly against the budget).
* Add tests for: Jinja2 syntax in skill bodies (confirms substitution is
  single-pass and {% raw %} wrapping is unnecessary), env-var and ~
  path expansion, sibling .md and references/ resource gathering,
  scripts/ + assets/ exclusion, nested SKILL.md isolation, resource
  rendering, and resource-aware budget enforcement.
@github-actions github-actions Bot added the feature 💡 label May 11, 2026
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Add agent-skills (SKILL.md) format support for review/improve/describe tools

✨ Enhancement


Walkthroughs

Description
• Implements agent-skills (SKILL.md) format support for /review, /improve, and /describe tools
• Discovers and parses SKILL.md files from configured filesystem paths with YAML frontmatter
• Inlines sibling *.md resources (including references/ subdirectories) into skill context
• Injects formatted skills into prompts via skills_context variable, positioned above
  extra_instructions
• Disabled by default; gated behind [skills] enabled = true configuration with token budget
  control
Diagram
flowchart LR
  A["Configured filesystem paths"] -->|discover_skills| B["SKILL.md files"]
  B -->|parse frontmatter| C["Skill metadata<br/>name + description"]
  B -->|gather resources| D["Sibling *.md files<br/>references/"]
  C --> E["Format skills context"]
  D --> E
  E -->|inject via skills_context| F["Review/Improve/Describe<br/>prompts"]
  F -->|render with Jinja2| G["Model receives<br/>formatted skills"]

File Changes

1. pr_agent/algo/skills_loader.py ✨ Enhancement +277/-0

Core skills discovery, parsing, and formatting logic

pr_agent/algo/skills_loader.py


2. pr_agent/tools/pr_reviewer.py ✨ Enhancement +2/-0

Inject skills_context into reviewer prompt variables

pr_agent/tools/pr_reviewer.py


3. pr_agent/tools/pr_description.py ✨ Enhancement +2/-0

Inject skills_context into description prompt variables

pr_agent/tools/pr_description.py


4. pr_agent/tools/pr_code_suggestions.py ✨ Enhancement +2/-0

Inject skills_context into code suggestions prompt variables

pr_agent/tools/pr_code_suggestions.py


5. pr_agent/settings/pr_reviewer_prompts.toml ⚙️ Configuration changes +9/-0

Add skills_context block to reviewer system prompt

pr_agent/settings/pr_reviewer_prompts.toml


6. pr_agent/settings/pr_description_prompts.toml ⚙️ Configuration changes +8/-0

Add skills_context block to description system prompt

pr_agent/settings/pr_description_prompts.toml


7. pr_agent/settings/code_suggestions/pr_code_suggestions_prompts.toml ⚙️ Configuration changes +9/-0

Add skills_context block to code suggestions system prompt

pr_agent/settings/code_suggestions/pr_code_suggestions_prompts.toml


8. pr_agent/settings/code_suggestions/pr_code_suggestions_prompts_not_decoupled.toml ⚙️ Configuration changes +9/-0

Add skills_context block to non-decoupled code suggestions prompt

pr_agent/settings/code_suggestions/pr_code_suggestions_prompts_not_decoupled.toml


9. pr_agent/settings/configuration.toml ⚙️ Configuration changes +13/-0

Add [skills] configuration section with defaults

pr_agent/settings/configuration.toml


10. tests/unittest/test_skills_loader.py 🧪 Tests +318/-0

Comprehensive unit tests for skills loader functionality

tests/unittest/test_skills_loader.py


11. CLAUDE.md 📝 Documentation +53/-0

Developer guide for Claude Code with architecture overview

CLAUDE.md



@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects Bot commented May 11, 2026

Code Review by Qodo

🐞 Bugs (5) 📘 Rule violations (3) 📎 Requirement gaps (2)



Action required

1. No skills.installed support 📎 Requirement gap ≡ Correctness
Description
Skills discovery only reads settings.skills.paths and the default config defines no
skills.installed list, so centrally installed skill directories cannot be configured/scanned as
required. This breaks the configured discovery contract for agent-skills and makes deployments
unable to separate repo-local vs installed paths.
Code

pr_agent/algo/skills_loader.py[R303-317]

+    settings = get_settings()
+    if not settings.skills.enabled:
+        _set_cached_context("")
+        return ""
+    paths = list(settings.skills.paths or [])
+    raw_max = settings.skills.max_skills_tokens
+    try:
+        max_tokens = int(raw_max)
+    except (TypeError, ValueError):
+        get_logger().warning(
+            f"Invalid skills.max_skills_tokens={raw_max!r}; falling back to {_DEFAULT_MAX_SKILLS_TOKENS}"
+        )
+        max_tokens = _DEFAULT_MAX_SKILLS_TOKENS
+    skills = discover_skills(paths)
+    out = format_skills_context(skills, max_tokens) if skills else ""
Evidence
PR Compliance IDs 6 and 10 require scanning both configured [skills].paths and
[skills].installed and honoring those configuration keys. The new implementation only reads
settings.skills.paths and configuration defaults add [skills] without any installed option, so
the requirement cannot be met.

Support agent skills discovery from configured repo-local and installed paths
Honor skills configuration: enabled flag, paths/installed lists, and max_skills_tokens cap
pr_agent/algo/skills_loader.py[303-317]
pr_agent/settings/configuration.toml[355-367]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The skills feature does not support the required `[skills].installed` configuration (and only scans `[skills].paths`). This prevents configuring centrally installed skill directories.
## Issue Context
Compliance requires scanning both repo-local paths and installed organization paths.
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[303-317]
- pr_agent/settings/configuration.toml[355-367]



2. Over-120 char lines added 📘 Rule violation ⚙ Maintainability
Description
New code introduces long lines that likely violate the repository's Ruff line-length = 120
constraint, which can fail pre-commit/CI linting. These should be wrapped/split to satisfy style
requirements.
Code

pr_agent/algo/skills_loader.py[R109-112]

+            if size > _MAX_RESOURCE_FILE_BYTES:
+                get_logger().warning(
+                    f"Skill resource skipped (exceeds {_MAX_RESOURCE_FILE_BYTES} bytes): {full} ({size} bytes)"
+                )
Evidence
PR Compliance ID 22 requires adhering to Ruff style constraints including a 120-character line
length. The added warning f-string line(s) in skills_loader.py are longer than 120 characters and
should be reformatted.

CLAUDE.md; AGENTS.md
pr_agent/algo/skills_loader.py[109-112]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Some newly added lines exceed the configured Ruff line length (120), which can cause lint failures.
## Issue Context
The repo enforces `line-length = 120` via Ruff configuration.
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[109-112]



3. UTF-8 decode crash ✓ Resolved 🐞 Bug ☼ Reliability
Description
_parse_skill_file() and _gather_resources() read files as UTF-8 but only catch OSError;
invalid UTF-8 will raise UnicodeDecodeError and crash get_skills_context(), breaking /review,
/describe, and /improve when skills are enabled.
Code

pr_agent/algo/skills_loader.py[R127-134]

+def _parse_skill_file(file_path: str) -> Optional[Skill]:
+    """Parse a single SKILL.md file. Returns None and logs a warning on malformed input."""
+    try:
+        with open(file_path, "r", encoding="utf-8") as f:
+            content = f.read()
+    except OSError as e:
+        get_logger().warning(f"Skill file unreadable: {file_path} ({e})")
+        return None
Evidence
Both SKILL.md and inlined resource files are opened with encoding='utf-8', but the exception
handling only catches OSError. UnicodeDecodeError is not an OSError, so it propagates and
aborts skills loading (and therefore prompt var construction in the tools that call
get_skills_context()).

pr_agent/algo/skills_loader.py[127-135]
pr_agent/algo/skills_loader.py[114-119]
pr_agent/tools/pr_reviewer.py[92-97]
pr_agent/tools/pr_description.py[67-72]
pr_agent/tools/pr_code_suggestions.py[66-71]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`skills_loader` opens SKILL.md and resource markdown files as UTF-8, but does not handle `UnicodeDecodeError`. A single non-UTF8 file will raise during `.read()` and crash `get_skills_context()`, breaking all tools that inject `skills_context`.
### Issue Context
This affects both `_parse_skill_file()` (SKILL.md) and `_gather_resources()` (inlined `*.md` siblings).
### Fix Focus Areas
- pr_agent/algo/skills_loader.py[127-176]
- pr_agent/algo/skills_loader.py[83-122]
### Expected fix
- Catch `UnicodeDecodeError` in addition to `OSError` when reading SKILL.md and resource files.
- Log a warning and skip the malformed file (return `None` for SKILL.md; continue for resource files).
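
A minimal sketch of the expected fix, assuming a standalone helper (`read_text_or_none` is a hypothetical name, not a function in this PR). The point it illustrates is that `UnicodeDecodeError` derives from `ValueError`, not `OSError`, so it has to be caught explicitly.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def read_text_or_none(path: str) -> Optional[str]:
    """Read a file as UTF-8; log and return None if it is unreadable or not valid UTF-8."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read()
    except (OSError, UnicodeDecodeError) as exc:
        # OSError covers missing files and permission problems;
        # UnicodeDecodeError covers bytes that are not valid UTF-8.
        logger.warning("Skill file unreadable: %s (%s)", path, exc)
        return None
```

A caller parsing SKILL.md would return `None` for the skill on a `None` result, while a resource-gathering loop would `continue` to the next file.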



4. All skills always injected 📎 Requirement gap ➹ Performance
Description
get_skills_context() always discovers and injects every skill from skills.paths (subject only to
a token cap), without selecting a relevant subset based on the PR diff/context. This violates the
requirement to activate only relevant skills using the description field as the activation signal,
and can bloat prompts and dilute guidance quality.
Code

pr_agent/algo/skills_loader.py[R254-277]

+def get_skills_context() -> str:
+    """Read settings, discover skills, and format them for prompt injection.
+
+    Returns ``''`` when skills are disabled, no paths are configured, or no
+    skills are found. The only swallowed error is a non-numeric override of
+    ``skills.max_skills_tokens``; everything else surfaces normally so genuine
+    bugs are not masked.
+    """
+    settings = get_settings()
+    if not settings.skills.enabled:
+        return ""
+    paths = list(settings.skills.paths or [])
+    raw_max = settings.skills.max_skills_tokens
+    try:
+        max_tokens = int(raw_max)
+    except (TypeError, ValueError):
+        get_logger().warning(
+            f"Invalid skills.max_skills_tokens={raw_max!r}; falling back to {_DEFAULT_MAX_SKILLS_TOKENS}"
+        )
+        max_tokens = _DEFAULT_MAX_SKILLS_TOKENS
+    skills = discover_skills(paths)
+    if not skills:
+        return ""
+    return format_skills_context(skills, max_tokens)
Evidence
PR Compliance ID 8 requires description-based activation that selects only relevant skills for a
given PR diff/context. The new implementation loads configured skills and formats them wholesale
(discover_skills(paths) → format_skills_context(skills, max_tokens)) with no PR-aware filtering
step.

Description-based skill activation against PR diff/context
pr_agent/algo/skills_loader.py[254-277]
pr_agent/algo/skills_loader.py[17-19]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Skills are injected unconditionally (all discovered skills), instead of activating only a relevant subset based on PR diff/context using the skill `description` field.
## Issue Context
Compliance requires description-based activation tied to PR context/diff (e.g., keyword/file-type prefiltering or a selection pass that uses the descriptions to choose a subset).
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[254-277]



5. Host file exfil via skills ✓ Resolved 🐞 Bug ⛨ Security
Description
discover_skills() expands and walks arbitrary filesystem paths from skills.paths and reads their
contents into the LLM prompt. Because repo settings are merged wholesale without a section
allowlist, a repo can configure [skills] to read sensitive host files (e.g., /etc/*,
$HOME/.ssh/*) and exfiltrate them to the model.
Code

pr_agent/algo/skills_loader.py[R174-188]

+    for raw_path in paths or []:
+        if not isinstance(raw_path, str) or not raw_path.strip():
+            continue
+        expanded = os.path.expanduser(os.path.expandvars(raw_path.strip()))
+        if not os.path.exists(expanded):
+            get_logger().warning(f"Skills path does not exist: {expanded}")
+            continue
+
+        if os.path.isfile(expanded):
+            candidates = [expanded] if os.path.basename(expanded) == "SKILL.md" else []
+        else:
+            candidates = []
+            for root, _dirs, files in os.walk(expanded):
+                if "SKILL.md" in files:
+                    candidates.append(os.path.join(root, "SKILL.md"))
Evidence
The new loader expands env/tilde and traverses the configured path, then reads SKILL.md/resources
from the host filesystem for inclusion in prompts. Repo-level .pr_agent.toml settings are merged
into runtime settings for all sections, so [skills] can be controlled via repo config and thereby
drive these host filesystem reads.

pr_agent/algo/skills_loader.py[174-188]
pr_agent/algo/skills_loader.py[110-160]
pr_agent/git_providers/utils.py[62-72]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`skills.paths` currently allows reading arbitrary host filesystem content into prompts. In multi-repo/self-hosted deployments, repo-controlled settings can set `[skills]` and cause local file exfiltration to the LLM.
## Issue Context
`apply_repo_settings()` merges repo `.pr_agent.toml` sections into runtime settings without an allowlist, so repo settings can affect `[skills]`. The skills loader then expands `$VARS`/`~`, walks the filesystem, and reads files.
## Fix Focus Areas
- pr_agent/git_providers/utils.py[62-72]
- pr_agent/algo/skills_loader.py[174-188]
- pr_agent/settings/configuration.toml[355-367]
## Suggested fix approach
- Add a server-side guard so repo settings cannot override `skills.enabled` and `skills.paths` by default (e.g., an allowlist of repo-overridable sections/keys, or a new `skills.allow_repo_override=false`).
- Optionally enforce that every configured skills path must be under an admin-configured allowlisted base directory (canonicalize with `realpath` and compare prefixes), and reject/ignore paths outside it.
- Consider disabling `expandvars()` for paths that come from repo settings (or entirely), since it increases the blast radius by enabling `$HOME`-style expansion.
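
The first suggestion (a guard on which keys a repo may override) could look roughly like the sketch below. It operates on a plain dictionary and is not tied to Dynaconf or to `apply_repo_settings()`; `PROTECTED_KEYS` and `filter_repo_overrides` are hypothetical names introduced only for this illustration.

```python
from typing import Any, Dict, Set

# Hypothetical denylist: keys a repo-level .pr_agent.toml may not override.
PROTECTED_KEYS: Dict[str, Set[str]] = {"skills": {"enabled", "paths"}}


def filter_repo_overrides(
    repo_settings: Dict[str, Any],
    protected: Dict[str, Set[str]] = PROTECTED_KEYS,
) -> Dict[str, Any]:
    """Return a copy of repo settings with protected keys removed."""
    filtered: Dict[str, Any] = {}
    for section, values in repo_settings.items():
        blocked = protected.get(section.lower(), set())
        if isinstance(values, dict):
            # Drop only the protected keys; other keys in the section survive.
            filtered[section] = {k: v for k, v in values.items() if k.lower() not in blocked}
        else:
            filtered[section] = values
    return filtered
```

With this shape, a repo could still tune `max_skills_tokens` but could not switch the feature on or point it at host paths.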



6. Unbounded resource file reads ✓ Resolved 🐞 Bug ☼ Reliability
Description
_gather_resources() reads the full content of every *.md file under each skill directory into
memory before any budget enforcement. Large or numerous markdown files can cause high memory/latency
and even OOM during tool initialization, even if later dropped by format_skills_context().
Code

pr_agent/algo/skills_loader.py[R84-105]

+    for root, dirs, files in os.walk(skill_dir):
+        # Prune executable / binary subtrees per the agent-skills convention.
+        dirs[:] = [d for d in dirs if d not in _EXCLUDED_RESOURCE_DIRS]
+        # A nested skill directory is independent; do not absorb its files.
+        if root != skill_dir and "SKILL.md" in files:
+            dirs[:] = []
+            continue
+        for filename in sorted(files):
+            if not filename.endswith(".md"):
+                continue
+            if root == skill_dir and filename == "SKILL.md":
+                continue
+            full = os.path.join(root, filename)
+            try:
+                with open(full, "r", encoding="utf-8") as fh:
+                    content = fh.read()
+            except OSError as e:
+                get_logger().warning(f"Skill resource unreadable: {full} ({e})")
+                continue
+            rel = os.path.relpath(full, skill_dir)
+            resources.append(SkillResource(relative_path=rel, content=content))
+
Evidence
Resources are eagerly loaded with fh.read() while building each Skill object, and only later
does format_skills_context() enforce a character/token budget when emitting the combined prompt
block. This means the budget does not protect memory/IO cost during discovery/parsing.

pr_agent/algo/skills_loader.py[74-107]
pr_agent/algo/skills_loader.py[154-160]
pr_agent/algo/skills_loader.py[218-251]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Skill resources are fully read into memory before any max token/char budget is applied, which can cause large memory/CPU spikes or crashes.
## Issue Context
`_parse_skill_file()` always calls `_gather_resources()` to read all sibling `*.md` resources. The token budget is only enforced later when formatting the final skills_context string.
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[74-107]
- pr_agent/algo/skills_loader.py[154-160]
- pr_agent/algo/skills_loader.py[218-251]
## Suggested fix approach
- Add a per-file size cap (e.g., skip any resource over N bytes; log a warning).
- Make resource loading lazy: store resource paths first, and only read contents while formatting until the remaining budget is exhausted.
- Alternatively, pass remaining character budget into `_gather_resources()` and stop reading/collecting once the budget is consumed.
- Consider also capping the number of resources per skill to avoid pathological directory trees.
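
A sketch combining the per-file size cap, the lazy approach (collect paths first, read later), and a cap on the number of resources. The names and the limit of 50 files are assumptions for illustration, and nested-skill isolation is deliberately left out to keep the example short.

```python
import os
from typing import List

MAX_RESOURCE_FILE_BYTES = 256 * 1024  # assumed per-file cap
MAX_RESOURCES_PER_SKILL = 50          # hypothetical cap on files per skill
EXCLUDED_DIRS = {"scripts", "assets"}


def select_resource_paths(
    skill_dir: str,
    max_bytes: int = MAX_RESOURCE_FILE_BYTES,
    max_files: int = MAX_RESOURCES_PER_SKILL,
) -> List[str]:
    """Collect paths of *.md resources without reading their contents."""
    selected: List[str] = []
    for root, dirs, files in os.walk(skill_dir):
        # Prune excluded subtrees in place and walk the rest in a stable order.
        dirs[:] = sorted(d for d in dirs if d not in EXCLUDED_DIRS)
        for filename in sorted(files):
            if not filename.endswith(".md") or filename == "SKILL.md":
                continue
            full = os.path.join(root, filename)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue
            if size > max_bytes:
                continue  # oversized file: skipped before any read happens
            selected.append(full)
            if len(selected) >= max_files:
                return selected
    return selected
```

Because only paths are returned, the formatter can read files one at a time and stop as soon as the token budget is spent.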




Remediation recommended

7. Unbounded SKILL.md read 🐞 Bug ☼ Reliability ⭐ New
Description
_parse_skill_file() reads the entire SKILL.md into memory with no size cap, while only non-SKILL
resources are capped at 256KB. A huge or accidentally generated SKILL.md can cause excessive
memory/CPU usage before any token budgeting/clipping occurs.
Code

pr_agent/algo/skills_loader.py[R127-134]

+def _parse_skill_file(file_path: str) -> Optional[Skill]:
+    """Parse a single SKILL.md file. Returns None and logs a warning on malformed input."""
+    try:
+        with open(file_path, "r", encoding="utf-8") as f:
+            content = f.read()
+    except (OSError, UnicodeDecodeError) as e:
+        get_logger().warning(f"Skill file unreadable: {file_path} ({e})")
+        return None
Evidence
Resources are protected by _MAX_RESOURCE_FILE_BYTES and a getsize() check, but SKILL.md has no
analogous guard and is always fully read via f.read().

pr_agent/algo/skills_loader.py[58-61]
pr_agent/algo/skills_loader.py[104-113]
pr_agent/algo/skills_loader.py[127-134]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`SKILL.md` is read unbounded into memory, which can DoS the process for very large files.

### Issue Context
You already cap resource markdown files at 256KB via `_MAX_RESOURCE_FILE_BYTES`, but `SKILL.md` lacks this protection.

### Fix Focus Areas
- pr_agent/algo/skills_loader.py[127-134]
- pr_agent/algo/skills_loader.py[58-61]

### Implementation notes
- Add a `_MAX_SKILL_FILE_BYTES` (could reuse `_MAX_RESOURCE_FILE_BYTES` or be slightly higher) and check `os.path.getsize(file_path)` before `read()`.
- If oversized: either skip the skill (log warning) or read only up to the cap and mark it truncated.
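
The second option (read up to the cap and mark the result truncated) might be sketched like this. `read_capped` is a hypothetical helper; it reads bytes so the cap is exact, and decoding with `errors="ignore"` is a simplification that drops a multi-byte character cut at the boundary but would also hide invalid bytes elsewhere in the file.

```python
from typing import Tuple


def read_capped(path: str, max_bytes: int) -> Tuple[str, bool]:
    """Read at most max_bytes from a file; return (text, was_truncated)."""
    with open(path, "rb") as fh:
        # Read one extra byte so truncation can be detected without a stat call.
        data = fh.read(max_bytes + 1)
    truncated = len(data) > max_bytes
    text = data[:max_bytes].decode("utf-8", errors="ignore")
    return text, truncated
```

Memory use is bounded by the cap regardless of file size. `OSError` handling is left to the caller in this sketch.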



8. Tokenizer encode can raise 🐞 Bug ☼ Reliability ⭐ New
Description
_count_tokens() calls TokenEncoder.get_token_encoder().encode(text) without the project's usual
disallowed_special=() safeguard, so certain skill contents can raise and crash skills context
formatting when skills are enabled. This can break tool initialization for /review, /improve,
and /describe because get_skills_context() is called during tool construction.
Code

pr_agent/algo/skills_loader.py[R79-80]

+def _count_tokens(text: str) -> int:
+    return len(TokenEncoder.get_token_encoder().encode(text))
Evidence
Elsewhere, the codebase explicitly passes disallowed_special=() when tokenizing arbitrary text,
but the new skills token counting does not. _count_tokens() is used multiple times during
format_skills_context(), so any exception will abort skills formatting.

pr_agent/algo/skills_loader.py[79-81]
pr_agent/algo/skills_loader.py[247-256]
pr_agent/algo/token_handler.py[154-170]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`_count_tokens()` may throw during tokenization because it calls `encode()` without `disallowed_special=()`.

### Issue Context
The codebase already uses `encode(..., disallowed_special=())` in `TokenHandler.count_tokens()`, indicating this is the preferred safe mode for arbitrary text.

### Fix Focus Areas
- pr_agent/algo/skills_loader.py[79-81]
- pr_agent/algo/token_handler.py[154-170]

### Implementation notes
- Change to `TokenEncoder.get_token_encoder().encode(text, disallowed_special=())`.
- Wrap tokenization in `try/except Exception` and on failure fall back to a conservative estimate (e.g., return a very large number so the skill gets dropped) and log a warning including the skill name/path.
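
A sketch of the fallback described in the second note, kept independent of any tokenizer library so it runs on its own. `count_tokens_safely` and the `encode` parameter are hypothetical; with a tiktoken-style encoder, the callable passed in would be something like `lambda t: encoder.encode(t, disallowed_special=())`.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)

# Large sentinel so a skill that cannot be tokenized never fits the budget.
UNCOUNTABLE = 10**9


def count_tokens_safely(
    text: str,
    encode: Callable[[str], Sequence[int]],
    label: str = "<unknown skill>",
) -> int:
    """Count tokens via encode(); on any failure, log and return a huge number."""
    try:
        return len(encode(text))
    except Exception as exc:  # deliberately broad: tokenizers raise varied errors
        logger.warning("Token counting failed for %s (%s); treating as over budget", label, exc)
        return UNCOUNTABLE
```

Returning a very large count means the budget check drops the offending skill instead of the whole tool failing during construction.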



9. Out-of-tree resource read 🐞 Bug ⛨ Security
Description
_gather_resources() reads any discovered *.md file without verifying it is a regular file inside
the skill directory after symlink resolution; a symlinked resource can point outside the skill tree
and be inlined into the LLM prompt.
Code

pr_agent/algo/skills_loader.py[R93-121]

+    for root, dirs, files in os.walk(skill_dir):
+        dirs[:] = [d for d in dirs if d not in _EXCLUDED_RESOURCE_DIRS]
+        if root != skill_dir and "SKILL.md" in files:
+            dirs[:] = []
+            continue
+        for filename in files:
+            if not filename.endswith(".md"):
+                continue
+            if root == skill_dir and filename == "SKILL.md":
+                continue
+            full = os.path.join(root, filename)
+            try:
+                size = os.path.getsize(full)
+            except OSError as e:
+                get_logger().warning(f"Skill resource unreadable: {full} ({e})")
+                continue
+            if size > _MAX_RESOURCE_FILE_BYTES:
+                get_logger().warning(
+                    f"Skill resource skipped (exceeds {_MAX_RESOURCE_FILE_BYTES} bytes): {full} ({size} bytes)"
+                )
+                continue
+            try:
+                with open(full, "r", encoding="utf-8") as fh:
+                    content = fh.read()
+            except OSError as e:
+                get_logger().warning(f"Skill resource unreadable: {full} ({e})")
+                continue
+            rel = os.path.relpath(full, skill_dir)
+            resources.append(SkillResource(relative_path=rel, content=content))
Evidence
Resources are collected via os.walk() and opened by path without checking realpath containment
(or file type). While SKILL.md discovery deduplicates by realpath, resource gathering has no
equivalent containment check, so symlinked markdown resources can escape the skill root.

pr_agent/algo/skills_loader.py[83-124]
pr_agent/algo/skills_loader.py[179-214]
pr_agent/git_providers/utils.py[13-18]
pr_agent/git_providers/utils.py[68-83]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`_gather_resources()` inlines `*.md` resources by opening paths returned from `os.walk()` without validating that the resolved path stays within the skill directory and is a regular file. This allows symlinked resources (or non-regular files) to be read and injected into prompts.
### Issue Context
Even though `[skills]` is host-only, defense-in-depth should prevent accidental or supply-chain skill bundles from exfiltrating host files via symlinks.
### Fix Focus Areas
- pr_agent/algo/skills_loader.py[83-124]
### Expected fix
- Before reading a resource file:
- Compute `skill_dir_real = os.path.realpath(skill_dir)` once.
- Compute `full_real = os.path.realpath(full)`.
- Skip if `not full_real.startswith(skill_dir_real + os.sep)`.
- Skip if not a regular file (e.g., `stat.S_ISREG(os.stat(full_real, follow_symlinks=True).st_mode)`).
- Optionally, also avoid descending into symlinked directories by filtering `dirs` accordingly.
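
The containment check above could look roughly like the following. This is a sketch under stated assumptions, not the PR's code: `is_safe_resource` is a hypothetical helper, and it uses `os.path.commonpath` in place of the `startswith` test; both aim at the same check.

```python
import os
import stat


def is_safe_resource(path: str, skill_dir: str) -> bool:
    """Return True only if `path` resolves to a regular file inside `skill_dir`."""
    root_real = os.path.realpath(skill_dir)
    path_real = os.path.realpath(path)  # follows symlinks to the real target
    try:
        # commonpath compares whole path components, so "/skills-evil" is not
        # mistaken for a child of "/skills" as a bare prefix test could be.
        inside = os.path.commonpath([root_real, path_real]) == root_real
    except ValueError:  # e.g. paths on different drives on Windows
        return False
    if not inside or path_real == root_real:
        return False
    try:
        return stat.S_ISREG(os.stat(path_real).st_mode)
    except OSError:
        return False
```

Calling this just before `open(full, ...)` would skip symlinks that resolve outside the skill tree, as well as directories, FIFOs, and other non-regular files.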



10. skills.paths lacks normalization 📘 Rule violation ☼ Reliability
Description
settings.skills.paths is converted via list(settings.skills.paths or []) without type
normalization/validation, so a common misconfiguration (string instead of list) becomes a list of
characters and silently breaks skills discovery. This violates the requirement to validate/normalize
configuration inputs and to defensively handle variable data structures.
Code

pr_agent/algo/skills_loader.py[R262-266]

+    settings = get_settings()
+    if not settings.skills.enabled:
+        return ""
+    paths = list(settings.skills.paths or [])
+    raw_max = settings.skills.max_skills_tokens
Evidence
PR Compliance IDs 26 and 27 require configuration inputs to be validated/normalized and consumed
defensively. The new code assumes settings.skills.paths is list-like and applies list(...)
directly, which misbehaves for string inputs (iterates characters) and produces misleading warnings
rather than a targeted config warning/coercion.

pr_agent/algo/skills_loader.py[262-266]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`skills.paths` is not validated/normalized; a string value is treated as an iterable of characters, breaking discovery.
## Issue Context
Config should be normalized at the boundary (e.g., coerce a single string to a one-item list, drop non-strings, and emit a targeted warning when the type is wrong).
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[262-266]
- pr_agent/algo/skills_loader.py[163-200]
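
One way to normalize the value at the boundary is sketched below. `normalize_paths` is a hypothetical helper name, and the exact warning wording is an assumption.

```python
import logging
from typing import Any, List

logger = logging.getLogger(__name__)


def normalize_paths(value: Any) -> List[str]:
    """Coerce a `skills.paths`-style setting into a clean list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        # list("abc") would yield ['a', 'b', 'c']; wrap the string instead.
        logger.warning("skills.paths should be a list; treating %r as a single path", value)
        value = [value]
    if not isinstance(value, (list, tuple)):
        logger.warning("Ignoring skills.paths of unsupported type %s", type(value).__name__)
        return []
    cleaned = []
    for item in value:
        if isinstance(item, str) and item.strip():
            cleaned.append(item.strip())
        else:
            logger.warning("Dropping invalid skills.paths entry: %r", item)
    return cleaned
```

With this in place, `paths = "/opt/skills"` in the config behaves like `paths = ["/opt/skills"]` and produces one targeted warning instead of a series of "path does not exist" messages for single characters.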



11. Tests leak global settings 📘 Rule violation ☼ Reliability
Description
Unit tests mutate the process-global Dynaconf settings via get_settings().set(...) without
restoring prior values, which can leak state across tests and cause order-dependent failures. This
violates the requirement that tests must explicitly control and reset process-global state.
Code

tests/unittest/test_skills_loader.py[R165-176]

+class TestGetSkillsContext:
+    def test_disabled_returns_empty(self, tmp_path, monkeypatch):
+        from pr_agent.config_loader import get_settings
+        get_settings().set("skills", {"enabled": False, "paths": [str(tmp_path)],
+                                       "max_skills_tokens": 4000})
+        assert get_skills_context() == ""
+
+    def test_enabled_with_no_paths_returns_empty(self, monkeypatch):
+        from pr_agent.config_loader import get_settings
+        get_settings().set("skills", {"enabled": True, "paths": [],
+                                       "max_skills_tokens": 4000})
+        assert get_skills_context() == ""
Evidence
PR Compliance ID 24 requires tests to explicitly control and reset process-global state. The new
tests directly call get_settings().set(...) (global mutation) without teardown/reset, risking
cross-test contamination outside this file.

tests/unittest/test_skills_loader.py[165-194]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Tests modify global Dynaconf settings and do not restore them, creating risk of flaky, order-dependent failures.
## Issue Context
Use monkeypatch/fixtures to restore the previous `skills` config after each test (or isolate settings via a fixture that snapshots and resets global settings).
## Fix Focus Areas
- tests/unittest/test_skills_loader.py[165-194]
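
A snapshot-and-restore helper along these lines could be wrapped in a fixture. It is a sketch that assumes only that the settings object exposes `get(key)` and `set(key, value)`. `temporary_setting` and `FakeSettings` are hypothetical names used for illustration.

```python
import copy
from contextlib import contextmanager


@contextmanager
def temporary_setting(settings, key, value):
    """Set `key` for the duration of the block, then restore the prior value."""
    # Deep-copy so later in-place mutation cannot alter the snapshot.
    previous = copy.deepcopy(settings.get(key))
    settings.set(key, value)
    try:
        yield settings
    finally:
        # Note: a key that was absent before is restored as None, not deleted.
        settings.set(key, previous)


class FakeSettings:
    """Stand-in exposing only get/set, used here to demonstrate the helper."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value
```

A test would then run its assertions inside `with temporary_setting(get_settings(), "skills", {...}):`, so the previous `skills` value is put back even when an assertion fails.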



12. os.walk PermissionError aborts tools 🐞 Bug ☼ Reliability
Description
discover_skills() and _gather_resources() call os.walk() without onerror or a surrounding
try/except, so an unreadable directory can raise PermissionError and fail tool initialization.
This can break /review, /describe, and /improve whenever skills.enabled=true and a
configured path has restricted permissions.
Code

pr_agent/algo/skills_loader.py[R84-90]

+    for root, dirs, files in os.walk(skill_dir):
+        # Prune executable / binary subtrees per the agent-skills convention.
+        dirs[:] = [d for d in dirs if d not in _EXCLUDED_RESOURCE_DIRS]
+        # A nested skill directory is independent; do not absorb its files.
+        if root != skill_dir and "SKILL.md" in files:
+            dirs[:] = []
+            continue
Evidence
The code guards individual file reads, but directory traversal itself is unguarded. If os.walk()
hits a directory it cannot list, the exception can bubble up through get_skills_context() (called
in tool __init__) and prevent the tool from running.

pr_agent/algo/skills_loader.py[84-90]
pr_agent/algo/skills_loader.py[182-189]
pr_agent/algo/skills_loader.py[254-277]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`os.walk()` can raise (e.g., PermissionError) on unreadable directories. Currently those exceptions are not handled, so skills loading can crash tool initialization.
## Issue Context
File reads are protected with `try/except OSError`, but the directory traversal is not.
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[74-107]
- pr_agent/algo/skills_loader.py[163-200]
## Suggested fix approach
- Add an `onerror` callback to both `os.walk()` calls that logs and continues.
- Optionally wrap the `os.walk()` loops in `try/except OSError` as an additional safety net.
- Consider logging which configured `skills.paths` entry triggered the traversal failure to aid debugging.
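
The `onerror` approach can be sketched as a thin wrapper. `walk_logged` is a hypothetical name, not a function in the PR.

```python
import logging
import os
from typing import Iterator, List, Tuple

logger = logging.getLogger(__name__)


def walk_logged(top: str) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Like os.walk, but log directories that cannot be listed."""

    def _on_error(err: OSError) -> None:
        # os.walk passes the OSError; its filename attribute is the directory.
        logger.warning("Skipping unreadable skills directory %s: %s", err.filename, err)

    yield from os.walk(top, onerror=_on_error)
```

The standard library documents that `os.walk` ignores directory-listing errors when `onerror` is omitted, so the practical gain from the callback is that skipped directories show up in the logs rather than disappearing silently.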




Advisory comments

13. Budget marker overflow 🐞 Bug ≡ Correctness ⭐ New
Description
format_skills_context() can return a skills_context that exceeds max_tokens when max_tokens
is smaller than the tokenized truncate marker, because it always appends "[truncated]" even if it
doesn't fit. This violates the function's own token-budget contract and can reduce available diff
budget (since prompt token counting happens after Jinja rendering).
Code

pr_agent/algo/skills_loader.py[R247-262]

+    truncate_marker = "\n\n[truncated]"
+    separator = "\n\n---\n\n"
+    sep_tokens = _count_tokens(separator)
+    marker_tokens = _count_tokens(truncate_marker)
+    pieces: List[str] = []
+    used = 0
+    for skill in skills:
+        formatted = _format_skill(skill)
+        tokens = _count_tokens(formatted)
+        addition = (sep_tokens if pieces else 0) + tokens
+        if used + addition > max_tokens:
+            if not pieces:
+                budget = max(1, max_tokens - marker_tokens)
+                truncated = clip_tokens(formatted, budget, add_three_dots=False)
+                pieces.append(truncated + truncate_marker)
+                if len(skills) > 1:
Evidence
The truncation branch forces budget = max(1, max_tokens - marker_tokens) and then unconditionally
appends the marker, so when marker_tokens > max_tokens the output necessarily exceeds
max_tokens. Prompt token counting is based on fully rendered templates, so extra tokens here
reduce space elsewhere (e.g., diff clipping).

pr_agent/algo/skills_loader.py[247-262]
pr_agent/algo/token_handler.py[74-97]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`format_skills_context()` may exceed its `max_tokens` budget in the 'first skill too large' truncation path.

### Issue Context
This is most visible when `max_tokens` is very small, but it's still a contract violation.

### Fix Focus Areas
- pr_agent/algo/skills_loader.py[247-262]

### Implementation notes
- If `max_tokens <= marker_tokens`, return a clipped marker (or empty string) that fits `max_tokens`.
- Otherwise, clip the skill to `max_tokens - marker_tokens` (allowing 0) and then append the marker.
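
The two implementation notes can be illustrated with a simplified stand-in. This sketch assumes a whitespace tokenizer in place of the real token counter, and `truncate_to_budget` is a hypothetical name; it chooses the "empty string" option when the marker itself does not fit.

```python
from typing import List

MARKER = "[truncated]"


def tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (whitespace split); the real code counts model tokens."""
    return text.split()


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Return `text` cut down so the result never exceeds `max_tokens` tokens."""
    tokens = tokenize(text)
    if len(tokens) <= max_tokens:
        return text
    marker_tokens = len(tokenize(MARKER))
    if max_tokens < marker_tokens:
        # Not even the marker fits: emit nothing rather than overshoot.
        return ""
    kept = tokens[: max_tokens - marker_tokens]  # may be empty
    # Joining with single spaces normalizes whitespace in the clipped text.
    return " ".join(kept + [MARKER])
```

The key difference from `max(1, max_tokens - marker_tokens)` is that the clipped portion is allowed to shrink to zero tokens, so the marker never pushes the output past the budget.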




Previous review results

Review updated until commit 33f6e27

Results up to commit N/A


🐞 Bugs (2) 📘 Rule violations (3) 📎 Requirement gaps (2)


Action required
1. No skills.installed support 📎 Requirement gap ≡ Correctness
Description
Skills discovery only reads settings.skills.paths and the default config defines no
skills.installed list, so centrally installed skill directories cannot be configured/scanned as
required. This breaks the configured discovery contract for agent-skills and makes deployments
unable to separate repo-local vs installed paths.
Code

pr_agent/algo/skills_loader.py[R303-317]

+    settings = get_settings()
+    if not settings.skills.enabled:
+        _set_cached_context("")
+        return ""
+    paths = list(settings.skills.paths or [])
+    raw_max = settings.skills.max_skills_tokens
+    try:
+        max_tokens = int(raw_max)
+    except (TypeError, ValueError):
+        get_logger().warning(
+            f"Invalid skills.max_skills_tokens={raw_max!r}; falling back to {_DEFAULT_MAX_SKILLS_TOKENS}"
+        )
+        max_tokens = _DEFAULT_MAX_SKILLS_TOKENS
+    skills = discover_skills(paths)
+    out = format_skills_context(skills, max_tokens) if skills else ""
Evidence
PR Compliance IDs 6 and 10 require scanning both configured [skills].paths and
[skills].installed and honoring those configuration keys. The new implementation only reads
settings.skills.paths and configuration defaults add [skills] without any installed option, so
the requirement cannot be met.

Support agent skills discovery from configured repo-local and installed paths
Honor skills configuration: enabled flag, paths/installed lists, and max_skills_tokens cap
pr_agent/algo/skills_loader.py[303-317]
pr_agent/settings/configuration.toml[355-367]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The skills feature does not support the required `[skills].installed` configuration (and only scans `[skills].paths`). This prevents configuring centrally installed skill directories.
## Issue Context
Compliance requires scanning both repo-local paths and installed organization paths.
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[303-317]
- pr_agent/settings/configuration.toml[355-367]



2. Over-120 char lines added 📘 Rule violation ⚙ Maintainability
Description
New code introduces long lines that likely violate the repository's Ruff line-length = 120
constraint, which can fail pre-commit/CI linting. These should be wrapped/split to satisfy style
requirements.
Code

pr_agent/algo/skills_loader.py[R109-112]

+            if size > _MAX_RESOURCE_FILE_BYTES:
+                get_logger().warning(
+                    f"Skill resource skipped (exceeds {_MAX_RESOURCE_FILE_BYTES} bytes): {full} ({size} bytes)"
+                )
Evidence
PR Compliance ID 22 requires adhering to Ruff style constraints including a 120-character line
length. The added warning f-string line(s) in skills_loader.py are longer than 120 characters and
should be reformatted.

CLAUDE.md; AGENTS.md
pr_agent/algo/skills_loader.py[109-112]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Some newly added lines exceed the configured Ruff line length (120), which can cause lint failures.
## Issue Context
The repo enforces `line-length = 120` via Ruff configuration.
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[109-112]



3. UTF-8 decode crash ✓ Resolved 🐞 Bug ☼ Reliability
Description
_parse_skill_file() and _gather_resources() read files as UTF-8 but only catch OSError;
invalid UTF-8 will raise UnicodeDecodeError and crash get_skills_context(), breaking /review,
/describe, and /improve when skills are enabled.
Code

pr_agent/algo/skills_loader.py[R127-134]

+def _parse_skill_file(file_path: str) -> Optional[Skill]:
+    """Parse a single SKILL.md file. Returns None and logs a warning on malformed input."""
+    try:
+        with open(file_path, "r", encoding="utf-8") as f:
+            content = f.read()
+    except OSError as e:
+        get_logger().warning(f"Skill file unreadable: {file_path} ({e})")
+        return None
Evidence
Both SKILL.md and inlined resource files are opened with encoding='utf-8', but the exception
handling only catches OSError. UnicodeDecodeError is not an OSError, so it propagates and
aborts skills loading (and therefore prompt var construction in the tools that call
get_skills_context()).

pr_agent/algo/skills_loader.py[127-135]
pr_agent/algo/skills_loader.py[114-119]
pr_agent/tools/pr_reviewer.py[92-97]
pr_agent/tools/pr_description.py[67-72]
pr_agent/tools/pr_code_suggestions.py[66-71]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`skills_loader` opens SKILL.md and resource markdown files as UTF-8, but does not handle `UnicodeDecodeError`. A single non-UTF8 file will raise during `.read()` and crash `get_skills_context()`, breaking all tools that inject `skills_context`.
### Issue Context
This affects both `_parse_skill_file()` (SKILL.md) and `_gather_resources()` (inlined `*.md` siblings).
### Fix Focus Areas
- pr_agent/algo/skills_loader.py[127-176]
- pr_agent/algo/skills_loader.py[83-122]
### Expected fix
- Catch `UnicodeDecodeError` in addition to `OSError` when reading SKILL.md and resource files.
- Log a warning and skip the malformed file (return `None` for SKILL.md; continue for resource files).
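
The expected fix amounts to widening the `except` clause. A self-contained sketch follows; `read_text_or_none` is a hypothetical helper name.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def read_text_or_none(path: str) -> Optional[str]:
    """Read a UTF-8 text file, returning None (and logging) if it cannot be read."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return fh.read()
    except (OSError, UnicodeDecodeError) as exc:
        # UnicodeDecodeError derives from ValueError, not OSError, so it must be listed.
        logger.warning("Skill file unreadable: %s (%s)", path, exc)
        return None
```

A caller parsing SKILL.md would return `None` when this helper does, and a caller gathering resources would `continue` to the next file.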



4. All skills always injected 📎 Requirement gap ➹ Performance
Description
get_skills_context() always discovers and injects every skill from skills.paths (subject only to
a token cap), without selecting a relevant subset based on the PR diff/context. This violates the
requirement to activate only relevant skills using the description field as the activation signal,
and can bloat prompts and dilute guidance quality.
Code

pr_agent/algo/skills_loader.py[R254-277]

+def get_skills_context() -> str:
+    """Read settings, discover skills, and format them for prompt injection.
+
+    Returns ``''`` when skills are disabled, no paths are configured, or no
+    skills are found. The only swallowed error is a non-numeric override of
+    ``skills.max_skills_tokens``; everything else surfaces normally so genuine
+    bugs are not masked.
+    """
+    settings = get_settings()
+    if not settings.skills.enabled:
+        return ""
+    paths = list(settings.skills.paths or [])
+    raw_max = settings.skills.max_skills_tokens
+    try:
+        max_tokens = int(raw_max)
+    except (TypeError, ValueError):
+        get_logger().warning(
+            f"Invalid skills.max_skills_tokens={raw_max!r}; falling back to {_DEFAULT_MAX_SKILLS_TOKENS}"
+        )
+        max_tokens = _DEFAULT_MAX_SKILLS_TOKENS
+    skills = discover_skills(paths)
+    if not skills:
+        return ""
+    return format_skills_context(skills, max_tokens)
Evidence
PR Compliance ID 8 requires description-based activation that selects only relevant skills for a
given PR diff/context. The new implementation loads configured skills and formats them wholesale
(discover_skills(paths) → format_skills_context(skills, max_tokens)) with no PR-aware filtering
step.

Description-based skill activation against PR diff/context
pr_agent/algo/skills_loader.py[254-277]
pr_agent/algo/skills_loader.py[17-19]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Skills are injected unconditionally (all discovered skills), instead of activating only a relevant subset based on PR diff/context using the skill `description` field.
## Issue Context
Compliance requires description-based activation tied to PR context/diff (e.g., keyword/file-type prefiltering or a selection pass that uses the descriptions to choose a subset).
## Fix Focus Areas
- pr_agent/algo/skills_loader.py[254-277]
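
For illustration only, the keyword prefiltering mentioned above could be as crude as a word-overlap test between each skill's `description` and the diff. `select_skills` and its `min_overlap` parameter are hypothetical; nothing like this exists in the PR.

```python
import re
from typing import Dict, List

_WORD = re.compile(r"[a-z0-9_]{3,}")


def select_skills(descriptions: Dict[str, str], diff_text: str, min_overlap: int = 1) -> List[str]:
    """Return names of skills whose description shares words with the diff."""
    diff_words = set(_WORD.findall(diff_text.lower()))
    selected = []
    for name, description in descriptions.items():
        desc_words = set(_WORD.findall(description.lower()))
        # Naive overlap: common words such as "for" or "and" can match too.
        if len(desc_words & diff_words) >= min_overlap:
            selected.append(name)
    return selected
```

A real selector would need stop-word handling or a model-based routing pass; the sketch only shows where the `description` field would enter the decision.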



5. Host file exfil via skills ✓ Resolved 🐞 Bug ⛨ Security
Description
discover_skills() expands and walks arbitrary filesystem paths from skills.paths and reads their
contents into the LLM prompt. Because repo settings are merged wholesale without a section
allowlist, a repo can configure [skills] to read sensitive host files (e.g., /etc/*,
$HOME/.ssh/*) and exfiltrate them to the model.
Code

pr_agent/algo/skills_loader.py[R174-188]

+    for raw_path in paths or []:
+        if not isinstance(raw_path, str) or not raw_path.strip():
+            continue
+        expanded = os.path.expanduser(os.path.expandvars(raw_path.strip()))
+        if not os.path.exists(expanded):
+            get_logger().warning(f"Skills path does not exist: {expanded}")
+            continue
+
+        if os.path.isfile(expanded):
+            candidates = [expanded] if os.path.basename(expanded) == "SKILL.md" else []
+        else:
+            candidates = []
+            for root, _dirs, files in os.walk(expanded):
+                if "SKILL.md" in files:
+                    candidates.append(os.path.join(root, "SKILL.md"))
Evidence
The new loader expands env/tilde and traverses the configured path, then reads SKILL.md/resources
from the host filesystem for inclusion in prompts. Repo-level .pr_agent.toml settings are merged
into runtime settings for all sections, so [skills] can be controlled via repo config and thereby
drive these host filesystem reads.

pr_agent/algo/skills_loader.py[174-188]
pr_agent/algo/skills_loader.py[110-160]
pr_agent/git_providers/utils.py[62-72]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`skills.paths` currently allows reading arbitrary host filesystem content into prompts. In multi-repo/self-hosted deployments, repo-controlled settings can set `[skills]` and cause local file exfiltration to the LLM.
## Issue Context
`apply_repo_settings()` merges repo `.pr_agent.toml` sections into runtime settings without an allowlist, so repo settings can affect `[skills]`. The skills loader then expands `$VARS`/`~`, walks the filesystem, and reads files.
## Fix Focus Areas
- pr_agent/git_providers/utils.py[62-72]
- pr_agent/algo/skills_loader.py[174-188]
- pr_agent/settings/configuration.toml[355-367]
## Suggested fix approach
- Add a server-side guard so repo settings cannot override `skills.enabled` and `skills.paths` by default (e.g., an allowlist of repo-overridable sections/keys, or a new `skills.allow_repo_override=false`).
- Optionally enforce that every configured skills path must be under an admin-configured allowlisted base directory (canonicalize with `realpath` and compare prefixes), and reject/ignore paths outside it.
- Consider disabling `expandvars()` for paths that come from repo settings (or entirely), since it increases the blast radius by enabling `$HOME`-style expansion.
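
The first suggestion, a guard that keeps repo settings from touching `[skills]`, might be sketched as a filter over the parsed repo config. `HOST_ONLY_SECTIONS` and `filter_repo_settings` are hypothetical names, and the shape of the parsed settings (a dict of section name to values) is an assumption.

```python
from typing import Any, Dict

# Hypothetical constant: sections that only host-level configuration may set.
HOST_ONLY_SECTIONS = frozenset({"skills"})


def filter_repo_settings(repo_settings: Dict[str, Any]) -> Dict[str, Any]:
    """Drop host-only sections from settings parsed out of a repo's config file."""
    return {
        section: values
        for section, values in repo_settings.items()
        # Compare case-insensitively so "SKILLS" cannot slip past the guard.
        if section.lower() not in HOST_ONLY_SECTIONS
    }
```

Applying the filter before the merge step means a repo-supplied `paths = ["/etc"]` never reaches the loader, while other repo-level sections continue to apply.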



6. Unbounded resource file reads ✓ Resolved 🐞 Bug ☼ Reliability
Description
_gather_resources() reads the full content of every *.md file under each skill directory into
memory before any budget enforcement. Large or numerous markdown files can cause high memory/latency
and even OOM during tool initialization, even if later dropped by format_skills_context().
Code

pr_agent/algo/skills_loader.py[R84-105]

+    for root, dirs, files in os.walk(skill_dir):
+        # Prune executable / binary subtrees per the agent-skills convention.
+        dirs[:] = [d for d in dirs if d not in _EXCLUDED_RESOURCE_DIRS]
+        # A nested skill directory is independent; do not absorb its files.
+        if root != skill_dir and "SKILL.md" in files:
+            dirs[:] = []
+            continue
+        for filename in sorted(files):
+            if not filename.endswith(".md"):
+                continue
+            if root == skill_dir and filename == "SKILL.md":
+                continue
+            full = os.path.join(root, filename)
+            try:
+                with open(full, "r", encoding="utf-8") as fh:
+                    content = fh.read()
+            except OSError as e:
+                get_logger().warning(f"Skill resource unreadable: {full} ({e})")
+                continue
+            rel = os.path.relpath(full, skill_dir)
+            resources.append(SkillResource(relative_path=rel, content=content))
+
Evidence
Resources are eagerly loaded with fh.read() while building each Skill object, and only later
does format_skills_context() enforce a character/token budget when emitting the combined prompt
block. This means the budget does not protect memory/IO cost during discovery/parsing.

pr_agent/algo/skills_loader.py[74-107]
pr_agent/algo/skills_loader.py[154-160]
[pr_age...

Comment thread pr_agent/algo/skills_loader.py Outdated
Comment on lines +254 to +277
def get_skills_context() -> str:
    """Read settings, discover skills, and format them for prompt injection.

    Returns ``''`` when skills are disabled, no paths are configured, or no
    skills are found. The only swallowed error is a non-numeric override of
    ``skills.max_skills_tokens``; everything else surfaces normally so genuine
    bugs are not masked.
    """
    settings = get_settings()
    if not settings.skills.enabled:
        return ""
    paths = list(settings.skills.paths or [])
    raw_max = settings.skills.max_skills_tokens
    try:
        max_tokens = int(raw_max)
    except (TypeError, ValueError):
        get_logger().warning(
            f"Invalid skills.max_skills_tokens={raw_max!r}; falling back to {_DEFAULT_MAX_SKILLS_TOKENS}"
        )
        max_tokens = _DEFAULT_MAX_SKILLS_TOKENS
    skills = discover_skills(paths)
    if not skills:
        return ""
    return format_skills_context(skills, max_tokens)
Contributor

Action required

1. All skills always injected 📎 Requirement gap ➹ Performance

get_skills_context() always discovers and injects every skill from skills.paths (subject only to
a token cap), without selecting a relevant subset based on the PR diff/context. This violates the
requirement to activate only relevant skills using the description field as the activation signal,
and can bloat prompts and dilute guidance quality.
Agent Prompt
## Issue description
Skills are injected unconditionally (all discovered skills), instead of activating only a relevant subset based on PR diff/context using the skill `description` field.

## Issue Context
Compliance requires description-based activation tied to PR context/diff (e.g., keyword/file-type prefiltering or a selection pass that uses the descriptions to choose a subset).

## Fix Focus Areas
- pr_agent/algo/skills_loader.py[254-277]


Author

Not applied — this is a documented architectural limitation of the current PR. PR-Agent dispatches single-shot model calls; there is no tool-use loop in which the model could selectively read SKILL.md based on the description field, so progressive disclosure is not implementable on the current architecture. The PR description's Limitations section calls this out, and the out-of-band design note sketches a two-pass selector (cheap weak-model call routing the diff to relevant skills, then main call with only those bodies) as the natural follow-up — intentionally out of scope here. The token budget (max_skills_tokens, default 8000) caps the inlined block and skills past the cap are dropped with a warning.

Comment thread pr_agent/algo/skills_loader.py
Comment thread pr_agent/algo/skills_loader.py
* Cache get_skills_context() via starlette_context so the three tools
  (review/improve/describe) share one discovery + parse + format per
  request, mirroring the apply_repo_settings cache pattern.
* Replace the 4-chars-per-token heuristic with TokenEncoder and
  clip_tokens (already in pr_agent.algo) so the budget reflects actual
  model tokens.
* Log when remaining skills are dropped after the first-skill
  truncation path (operational visibility).
* Remove unused Skill.path field; pass file_path directly to
  _gather_resources at construction.
* Drop dead inner sorted(files) in _gather_resources -- the final
  resources.sort() is authoritative.
* Fix asymmetric Jinja whitespace strip in pr_reviewer_prompts.toml
  and pr_description_prompts.toml ({% endif %} -> {%- endif %}).
* Trim WHAT-narrating comments per code-review pass.
@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects Bot commented May 11, 2026

Code Review by Qodo

New Review Started

This review has been superseded by a new analysis

Address Qodo review findings on PR The-PR-Agent#2385:

* Security: apply_repo_settings now refuses to apply the [skills] section
  from a repo's .pr_agent.toml. skills.paths walks the host filesystem
  and inlines file contents into the LLM prompt; without this gate, a
  malicious repo could set paths = ["/etc"] or ["~/.ssh"] in its repo
  settings and exfiltrate sensitive host files. The skills section is
  now host-level configuration only (global settings, env vars, CLI).

* Reliability: per-resource file size cap (256 KB). _gather_resources
  now stats each candidate before opening; oversized files are skipped
  with a warning log rather than read into memory. Defends against
  pathological skill directories or accidental inclusion of generated
  docs that would spike memory during tool init.

* Tests for both behaviours.

The third Qodo finding — that get_skills_context injects every skill
unconditionally rather than selecting a relevant subset based on PR
context — is the documented progressive-disclosure limitation called
out in the PR description's Limitations section and detailed in the
out-of-band design note. A two-pass selector is the architecturally
correct follow-up and is intentionally out of scope here.
@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects Bot commented May 11, 2026

Persistent review updated to latest commit 10f3591

Comment on lines +303 to +317
    settings = get_settings()
    if not settings.skills.enabled:
        _set_cached_context("")
        return ""
    paths = list(settings.skills.paths or [])
    raw_max = settings.skills.max_skills_tokens
    try:
        max_tokens = int(raw_max)
    except (TypeError, ValueError):
        get_logger().warning(
            f"Invalid skills.max_skills_tokens={raw_max!r}; falling back to {_DEFAULT_MAX_SKILLS_TOKENS}"
        )
        max_tokens = _DEFAULT_MAX_SKILLS_TOKENS
    skills = discover_skills(paths)
    out = format_skills_context(skills, max_tokens) if skills else ""
Contributor

Action required

1. No skills.installed support 📎 Requirement gap ≡ Correctness

Skills discovery only reads settings.skills.paths and the default config defines no
skills.installed list, so centrally installed skill directories cannot be configured/scanned as
required. This breaks the configured discovery contract for agent-skills and makes deployments
unable to separate repo-local vs installed paths.
Agent Prompt
## Issue description
The skills feature does not support the required `[skills].installed` configuration (and only scans `[skills].paths`). This prevents configuring centrally installed skill directories.

## Issue Context
Compliance requires scanning both repo-local paths and installed organization paths.

## Fix Focus Areas
- pr_agent/algo/skills_loader.py[303-317]
- pr_agent/settings/configuration.toml[355-367]


Author

Not applied — after the host-only gating landed in 10f35914 ([skills] is now refused from repo .pr_agent.toml), the rationale for splitting paths and installed collapses: both lists would come from the same host-level configuration source, so the two-list distinction adds config surface without enabling anything new. Multiple host-level skill directories can still be configured by adding them to paths (it is a list). The original issue #2384 sketched installed as a separate key on the assumption that repo settings could opt-in to skills; that assumption is no longer compatible with the security model on this branch. Happy to revisit if a maintainer signals the two-list shape is required.

Comment on lines +109 to +112
            if size > _MAX_RESOURCE_FILE_BYTES:
                get_logger().warning(
                    f"Skill resource skipped (exceeds {_MAX_RESOURCE_FILE_BYTES} bytes): {full} ({size} bytes)"
                )
Contributor

Action required

2. Over-120 char lines added 📘 Rule violation ⚙ Maintainability

New code introduces long lines that likely violate the repository's Ruff line-length = 120
constraint, which can fail pre-commit/CI linting. These should be wrapped/split to satisfy style
requirements.
Agent Prompt
## Issue description
Some newly added lines exceed the configured Ruff line length (120), which can cause lint failures.

## Issue Context
The repo enforces `line-length = 120` via Ruff configuration.

## Fix Focus Areas
- pr_agent/algo/skills_loader.py[109-112]


Author

Not applied — false positive. Verified with awk 'length > 120' pr_agent/algo/skills_loader.py (empty) and git diff main...HEAD | grep ^+ | awk 'length > 121' (empty). All lines added by this PR are within the Ruff line-length = 120 constraint. The long lines reported by other tools elsewhere in the repo are pre-existing in files not touched by this diff.

Comment thread pr_agent/algo/skills_loader.py
UnicodeDecodeError is not a subclass of OSError, so a single non-UTF-8
SKILL.md or sibling resource would propagate out of get_skills_context()
and break /review, /improve, /describe whenever skills were enabled.
Both reads now catch (OSError, UnicodeDecodeError); the offending file
is logged and skipped.

Tests cover both SKILL.md and resource paths.

Addresses Qodo finding #3 on PR The-PR-Agent#2385.
@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects Bot commented May 11, 2026

Persistent review updated to latest commit 33f6e27
