Fix extended thinking and allow per-repo context files (AGENTS.md, CLAUDE.md, etc.) to be included as context #2387
Open
avidspartan1 wants to merge 9 commits into The-PR-Agent:main from avidspartan1:feat/improvements
9 commits
8470f18 feat: allow overriding Claude extended thinking models (avidspartan1)
9d41932 feat: include configured repo context in AI prompts (avidspartan1)
9c8366a refactor: simplify GitLab suggestion file lookup (avidspartan1)
e6d96b8 chore: address agent comments (avidspartan1)
fdba0af fix: support Gitea repo context files (avidspartan1)
f66e2b8 chore: add test for confirming instruction files do not exceed line b… (avidspartan1)
eadf5c3 fix: use target pr ref for gitea (avidspartan1)
ebe0dd7 fix: bound repo context cache (avidspartan1)
817fa77 refactor: split repo context responsibilities (avidspartan1)
@@ -0,0 +1,265 @@

```python
import time
from collections import OrderedDict
from html import escape

from pr_agent.config_loader import get_settings
from pr_agent.git_providers.git_provider import GitProvider
from pr_agent.log import get_logger

TRUNCATION_MARKER = "...(truncated)..."
INSTRUCTION_FILES_INTRO = (
    "You are being given instruction files. Follow them as project-specific guidance when reviewing code."
)
MARKDOWN_FENCE = "`````"
REPO_CONTEXT_CACHE_ATTRIBUTE = "_repo_context_cache"
REPO_CONTEXT_CACHE_MAX_SIZE = 256
REPO_CONTEXT_CACHE_TTL_SECONDS = 15 * 60
_REPO_CONTEXT_CACHE_MISS = object()
_unsupported_repo_context_provider_classes = set()


class _RepoContextCache:
    def __init__(self, max_size: int = REPO_CONTEXT_CACHE_MAX_SIZE, ttl_seconds: int = REPO_CONTEXT_CACHE_TTL_SECONDS):
        self._max_size = max(1, int(max_size))
        self._ttl_seconds = max(0, int(ttl_seconds))
        self._entries = OrderedDict()

    def copy(self):
        cache = type(self)(max_size=self._max_size, ttl_seconds=self._ttl_seconds)
        cache._entries = self._entries.copy()
        return cache

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default

        value, expires_at = entry
        if expires_at <= time.monotonic():
            del self._entries[key]
            return default

        self._entries.move_to_end(key)
        return value

    def __setitem__(self, key, value):
        self._entries[key] = (value, time.monotonic() + self._ttl_seconds)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)


_repo_context_process_cache = _RepoContextCache()


def _get_markdown_fence(content: str) -> str:
    fence = MARKDOWN_FENCE
    while fence in content:
        fence += "`"
    return fence


def _get_repo_context_cache_key(context_files: list, max_lines: int) -> tuple[tuple[tuple[str, str], ...], int]:
    return tuple((type(file_path).__name__, str(file_path)) for file_path in context_files), max_lines


def _get_repo_context_process_cache_key(git_provider, context_files: list, max_lines: int) -> tuple | None:
    try:
        pr_url = git_provider.get_pr_url()
    except Exception:
        pr_url = getattr(git_provider, "pr_url", None)

    if not pr_url:
        return None

    return type(git_provider).__name__, pr_url, _get_repo_context_cache_key(context_files, max_lines)


def _get_repo_context_config() -> tuple[list, int] | None:
    context_files = get_settings().config.get("repo_context_files", [])
    if not context_files:
        return None

    if isinstance(context_files, str):
        get_logger().warning(
            "repo_context_files should be a list of file paths; treating string value as one file path",
            artifact={"repo_context_files": context_files},
        )
        context_files = [context_files]
    elif not isinstance(context_files, list):
        get_logger().warning(
            "repo_context_files should be a list of file paths; skipping repo context",
            artifact={"repo_context_files": context_files},
        )
        return None

    max_lines = get_settings().config.get("repo_context_max_lines", 500)
    try:
        max_lines = max(0, int(max_lines))
    except (TypeError, ValueError):
        max_lines = 500

    return context_files, max_lines


def _provider_supports_repo_context(git_provider) -> bool:
    provider_class = type(git_provider)
    if provider_class.get_repo_file_content is not GitProvider.get_repo_file_content:
        return True

    if provider_class not in _unsupported_repo_context_provider_classes:
        _unsupported_repo_context_provider_classes.add(provider_class)
        get_logger().warning(
            f"repo_context_files is configured, but {provider_class.__name__} does not support repository "
            "file fetching; skipping repo context"
        )
    return False


def _get_provider_repo_context_cache(git_provider) -> _RepoContextCache:
    repo_context_cache = getattr(git_provider, REPO_CONTEXT_CACHE_ATTRIBUTE, None)
    if repo_context_cache is None or not isinstance(repo_context_cache, _RepoContextCache):
        repo_context_cache = _RepoContextCache()
        setattr(git_provider, REPO_CONTEXT_CACHE_ATTRIBUTE, repo_context_cache)
    return repo_context_cache


def _get_cached_repo_context(git_provider, context_files: list, max_lines: int):
    process_cache_key = _get_repo_context_process_cache_key(git_provider, context_files, max_lines)
    if process_cache_key is not None:
        cached_repo_context = _repo_context_process_cache.get(process_cache_key, _REPO_CONTEXT_CACHE_MISS)
        if cached_repo_context is not _REPO_CONTEXT_CACHE_MISS:
            return cached_repo_context

    cache_key = _get_repo_context_cache_key(context_files, max_lines)
    cached_repo_context = _get_provider_repo_context_cache(git_provider).get(cache_key, _REPO_CONTEXT_CACHE_MISS)
    if cached_repo_context is not _REPO_CONTEXT_CACHE_MISS:
        return cached_repo_context

    return _REPO_CONTEXT_CACHE_MISS


def _store_repo_context(git_provider, context_files: list, max_lines: int, repo_context: str) -> None:
    cache_key = _get_repo_context_cache_key(context_files, max_lines)
    _get_provider_repo_context_cache(git_provider)[cache_key] = repo_context

    process_cache_key = _get_repo_context_process_cache_key(git_provider, context_files, max_lines)
    if process_cache_key:
        _repo_context_process_cache[process_cache_key] = repo_context


def _load_repo_context_files(git_provider, context_files: list) -> tuple[dict[str, str], bool]:
    files = {}
    had_fetch_error = False
    for file_path in context_files:
        if not isinstance(file_path, str) or not file_path.strip():
            get_logger().warning("Skipping invalid repo context file path", artifact={"file_path": file_path})
            continue

        file_path = file_path.strip()
        try:
            content = git_provider.get_repo_file_content(file_path)
        except Exception as e:
            had_fetch_error = True
            get_logger().warning(f"Failed to load repo context file: {file_path}", artifact={"error": str(e)})
            continue

        if not content:
            get_logger().debug(f"Repo context file is empty or missing: {file_path}")
            continue

        if isinstance(content, bytes):
            content = content.decode("utf-8", errors="replace")

        files[file_path] = str(content).rstrip()

    return files, had_fetch_error


def render_instruction_files(files: dict[str, str]) -> str:
    parts = [
        INSTRUCTION_FILES_INTRO,
        "<instruction_files>",
    ]

    for path, content in files.items():
        scope = path.rsplit("/", 1)[0] if "/" in path else "repo-root"
        fence = _get_markdown_fence(content)
        parts.append(f'<file path="{escape(path, quote=True)}" scope="{escape(scope, quote=True)}">')
        parts.append(f"{fence}markdown")
        parts.append(content.rstrip())
        parts.append(fence)
        parts.append("</file>")
        parts.append("")

    parts.append("</instruction_files>")
    return "\n".join(parts)


def render_instruction_files_with_line_budget(files: dict[str, str], max_lines: int) -> str:
    parts = [
        INSTRUCTION_FILES_INTRO,
        "<instruction_files>",
    ]
    closing_tag = "</instruction_files>"
    if max_lines < len(parts) + 1:
        return ""

    for path, content in files.items():
        scope = path.rsplit("/", 1)[0] if "/" in path else "repo-root"
        fence = _get_markdown_fence(content)
        file_header = [
            f'<file path="{escape(path, quote=True)}" scope="{escape(scope, quote=True)}">',
            f"{fence}markdown",
        ]
        file_footer = [
            fence,
            "</file>",
            "",
        ]
        content_lines = content.rstrip().splitlines()
        reserved_file_and_closing_lines = len(file_header) + len(file_footer) + 1
        available_content_lines = max_lines - len(parts) - reserved_file_and_closing_lines
        if available_content_lines < 0 or (content_lines and available_content_lines < 1):
            break

        parts.extend(file_header)
        if available_content_lines >= len(content_lines):
            parts.extend(content_lines)
        else:
            if available_content_lines > 1:
                parts.extend(content_lines[: available_content_lines - 1])
            parts.append(TRUNCATION_MARKER)
            parts.extend(file_footer)
            break

        parts.extend(file_footer)

    parts.append(closing_tag)
    return "\n".join(parts).strip()


def build_repo_context(git_provider) -> str:
    repo_context_config = _get_repo_context_config()
    if repo_context_config is None:
        return ""

    context_files, max_lines = repo_context_config
    if not _provider_supports_repo_context(git_provider):
        return ""

    cached_repo_context = _get_cached_repo_context(git_provider, context_files, max_lines)
    if cached_repo_context is not _REPO_CONTEXT_CACHE_MISS:
        return cached_repo_context

    files, had_fetch_error = _load_repo_context_files(git_provider, context_files)
    if not files and had_fetch_error:
        return ""

    if not files:
        _store_repo_context(git_provider, context_files, max_lines, "")
        return ""

    repo_context = render_instruction_files_with_line_budget(files, max_lines)
    _store_repo_context(git_provider, context_files, max_lines, repo_context)
    return repo_context
```
1. Long type-hinted signatures
📘 Rule violation ⚙ Maintainability
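The body of this finding is cut off in the capture, so the exact remediation it asks for is not visible. One common way to shorten a signature such as that of `_get_repo_context_cache_key` is a module-level type alias plus one parameter per line. The sketch below assumes that approach; the alias name `RepoContextCacheKey` and the public function name are hypothetical and do not appear in the PR.

```python
# Hypothetical alias; the name is not from the PR. Requires Python 3.9+.
RepoContextCacheKey = tuple[tuple[tuple[str, str], ...], int]


def get_repo_context_cache_key(
    context_files: list,
    max_lines: int,
) -> RepoContextCacheKey:
    # Same expression as the diff's helper: (type name, string path) pairs plus the line budget.
    files_key = tuple((type(file_path).__name__, str(file_path)) for file_path in context_files)
    return files_key, max_lines


key = get_repo_context_cache_key(["AGENTS.md", "CLAUDE.md"], 500)
```

The returned key is unchanged by the refactor; only the annotation moves out of the `def` line.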