Skip to content

lsdefine/GenericAgent

Repository files navigation

GenericAgent Banner

GenericAgent

A Minimal, Self-Evolving Autonomous Agent Framework

~3K lines of seed code · 9 atomic tools · ~100-line Agent Loop

Technical Report Reproduction Repo Tutorial Sophub

Trendshift

English · 中文

📌 Official Channel — This GitHub repository is the only official source of GenericAgent. We have no affiliation with any third-party website using the GenericAgent name.


🌟 Overview

GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3K lines of code. Through 9 atomic tools + a ~100-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).

Design philosophy — don't preload skills, evolve them.

Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a reusable Skill. The longer you use it, the more skills accumulate — forming a personal skill tree grown entirely from 3K lines of seed code.

🤖 Self-Bootstrap Proof — Everything in this repository, from installing Git and running git init to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.

📑 Table of Contents


📋 Key Features

Feature Description
🧬 Self-Evolving Automatically crystallizes each task into a Skill. Capabilities grow with every use, forming your personal skill tree.
🪶 Minimal Architecture ~3K lines of core code. Agent Loop is ~100 lines. No complex dependencies, zero deployment overhead.
Strong Execution Injects into a real browser (preserving login sessions). 9 atomic tools take direct control of the system.
🔌 High Compatibility Supports Claude / Gemini / Kimi / MiniMax and other major models. Cross-platform.
💰 Token Efficient <30K context window — a fraction of the 200K–1M other agents consume. Less noise, fewer hallucinations, higher success rate, lower cost.

🎯 Demo Showcase

🧋 Food Delivery Order 📈 Quantitative Stock Screening
Order Tea Stock Selection
"Order me a milk tea" — navigates the delivery app, selects items, completes checkout. "Find GEM stocks with EXPMA golden cross, turnover > 5%" — quantitative screening.
🌐 Autonomous Web Exploration 💰 Expense Tracking
Web Exploration Alipay Expense
Autonomously browses and periodically summarizes web content. "Find expenses over ¥2K in the last 3 months" — drives Alipay via ADB.
💬 Batch Messaging
WeChat Batch
Sends bulk WeChat messages, fully driving the WeChat client.

🚀 Quick Start

⚠️ Python version: use Python 3.11 or 3.12. Do not use Python 3.14 — it is incompatible with pywebview and a few other GA dependencies.

📖 Detailed installation guide: installation.md · installation_zh.md(中文)

For LLM Agents

Fetch the installation guide and follow it:

curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/refs/heads/main/docs/installation.md

For Humans

Method 1 — One-line install (recommended)

This installs GenericAgent with an isolated Python environment and Git, then downloads a ready-to-run package.

Windows PowerShell

powershell -ExecutionPolicy Bypass -c "$env:GLOBAL=1; irm http://fudankw.cn:9000/files/ga_install.ps1 | iex"

Linux / macOS

GLOBAL=1 bash -c "$(curl -fsSL http://fudankw.cn:9000/files/ga_install.sh)"

After installation, launch the desktop app from:

frontends/GenericAgent.exe

Method 2 — Python install (for developers)

git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
uv venv
uv pip install -e ".[ui]"          # Core + UI dependencies
cp mykey_template.py mykey.py      # Fill in your LLM API key
python launch.pyw

💡 GenericAgent is meant to grow its environment through the Agent itself, not by pre-installing every possible package.

📖 Full guide: docs/GETTING_STARTED.md


💻 Usage

Frontends

Desktop App

For one-line installs on Windows, double-click:

frontends/GenericAgent.exe

Terminal UI

A lightweight, keyboard-driven interface built on Textual. Supports multiple concurrent sessions and real-time streaming.

python frontends/tuiapp_v2.py
⚠️ Windows TUI Troubleshooting

TUI rendering on Windows can be flaky depending on terminal + font. Common causes:

  1. textual is not on the latest version — pip install -U textual first.
  2. PowerShell / cmd ship with terminals that have rough Unicode + key-binding support. Prefer Git Bash on Windows, which is much better behaved.
  3. If it still looks broken, ask GA itself to fix it:

    "My experience using frontends/tuiapp_v2.py in PowerShell / cmd / Git Bash on Windows is very poor — lots of incompatibility. Please refer to Claude Code's best practices for the Windows terminal and fix all font and rendering incompatibilities."

Streamlit UI

python launch.pyw

Bot Interface (IM)

GenericAgent also supports IM frontends such as Telegram, WeChat, QQ, Feishu / Lark, WeCom, and DingTalk.

Platform Command
Telegram python frontends/tgapp.py
WeChat python frontends/wechatapp.py
QQ python frontends/qqapp.py
Feishu / Lark python frontends/fsapp.py
WeCom python frontends/wecomapp.py
DingTalk python frontends/dingtalkapp.py

For detailed setup, ask GenericAgent itself.

Common Chat Commands

Command Description
/new Start a fresh conversation and clear the current context
/continue List recoverable conversation snapshots
/continue N Restore the N-th recoverable conversation

🧠 Architecture

GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.

1️⃣ Layered Memory System

Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.

Layer Name Description
L0 Meta Rules Core behavioral rules and system constraints
L1 Insight Index Minimal memory index for fast routing and recall
L2 Global Facts Stable knowledge accumulated over long-term operation
L3 Task Skills / SOPs Reusable workflows for completing specific task types
L4 Session Archive Archived task records distilled from finished sessions for long-horizon recall

2️⃣ Autonomous Execution Loop

Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop

The entire core loop is just ~100 lines of code (agent_loop.py).

3️⃣ Minimal Toolset

GenericAgent provides only 9 atomic tools, forming the foundational capabilities for interacting with the outside world.

Tool Function
code_run Execute arbitrary code (Python / PowerShell)
file_read Read files
file_write Write / create / overwrite files
file_patch Patch / modify files
web_scan Perceive web content
web_execute_js Control browser behavior
ask_user Human-in-the-loop confirmation
update_working_checkpoint (memory) Short-term working notepad
start_long_term_update (memory) Distill long-term memory

4️⃣ Capability Extension

Capable of dynamically creating new tools.

Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.

GenericAgent Workflow
GenericAgent Workflow Diagram

🧬 Self-Evolution Mechanism

This is what fundamentally distinguishes GenericAgent from every other agent framework.

[New Task]
   │
   ▼
[Autonomous Exploration]   ─►  install deps · write scripts · debug · verify
   │
   ▼
[Crystallize into Skill]   ─►  write to memory layer
   │
   ▼
[Direct Recall on Next Similar Task]
What you say First time Every time after
"Read my WeChat messages" Install deps → reverse DB → write read script → save Skill one-line invoke
"Monitor stocks and alert me" Install mootdx → build selection flow → configure cron → save Skill one-line start
"Send this file via Gmail" Configure OAuth → write send script → save Skill ready to use

After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code.


📊 Comparison

Feature GenericAgent OpenClaw Claude Code
Codebase ~3K lines ~530,000 lines Open-sourced (large)
Deployment pip install + API Key Multi-service orchestration CLI + subscription
Browser Control Real browser (session preserved) Sandbox / headless browser Via MCP plugin
OS Control Mouse/kbd, vision, ADB Multi-agent delegation File + terminal
Self-Evolution Autonomous skill growth Plugin ecosystem Stateless between sessions
Out of the Box Few core files + starter skills Hundreds of modules Rich CLI toolset

📈 Evaluation

📂 Full evaluation datasets and results: JinyiHan99/GA-Technical-Report

We evaluate GenericAgent across five dimensions:

# Dimension Question Benchmarks
1 Task Completion & Token Efficiency Can GA complete hard tasks more cheaply than leading agents? SOP-Bench, Lifelong AgentBench, RealFin-Benchmark
2 Tool-Use Efficiency Can a minimal atomic toolset solve what specialized toolsets solve, with less overhead? Tool Efficiency Benchmark (11 simple + 5 long-horizon)
3 Memory System Effectiveness Does condensed hierarchical memory beat full/redundant memory and embedding-based retrievers? SOP-Bench (dangerous goods), LoCoMo, 20-skill stress test
4 Self-Evolution Capability Can the agent distill experience into reusable SOPs and code, without intervention? 9-round LangChain longitudinal study, 8-task cross-task web benchmark
5 Web Browsing Capability Does density-driven design survive the open web? WebCanvas, BrowseComp-ZH, Custom Tasks (22)

Baselines across these dimensions include Claude Code, OpenAI CodeX, and OpenClaw, evaluated under Claude Sonnet 4.6, Claude Opus 4.6, GPT-5.4, and MiniMax M2.7 backbones.

Tool-use efficiency radar
Tool-use efficiency radar. GA dominates token, request, and tool-call axes while preserving quality across four task dimensions.
Cross-task self-evolution convergence
Cross-task self-evolution. Second- and third-run GA executions converge to a stable low-cost regime across eight web tasks, while OpenClaw shows no such convergence.

📅 Roadmap & News

  • 2026-05-15 — 🖥️ Desktop GUI released. One-line installs ship a ready-to-run desktop app (frontends/GenericAgent.exe). Developers launch via python launch.pyw.
  • 2026-05-14 — 🆕 Conductor sub-agent orchestration. Spawn, supervise, and auto-clean parallel sub-agents; first-class delegation primitives complementing /btw side-questions.
  • 2026-05-12 — 🆕 TUI v2 released (frontends/tuiapp_v2.py). Refined Textual frontend with image-paste folding, file paste, block-delete, Ctrl+C copy, history navigation, and /llm / /export / /continue pickers.
  • 2026-04-21 — 📄 Technical Report on arXivGenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization.
  • 2026-04-11 — Introduced L4 session archive memory and scheduler cron integration.
  • 2026-03-23 — Personal WeChat supported as a bot frontend.
  • 2026-03-10Released million-scale Skill Library.
  • 2026-03-08Released "Dintal Claw" — a GenericAgent-powered government-affairs bot.
  • 2026-03-01Featured by Jiqizhixin (机器之心).
  • 2026-01-16 — GenericAgent V1.0 public release.

⭐ Community & Support

If this project helped you, please consider leaving a Star! 🙏

You're also welcome to join the GenericAgent Community Group for discussion, feedback, and co-building 👏

WeChat Group 18
WeChat Group 18 QR

🚩 Friendly Links

Thanks to the LinuxDo community for the support!

LinuxDo

Community GUIs (independent open-source projects):


📄 License

Distributed under the MIT License. See LICENSE for full text.

Disclaimer: This project does not build or operate any commercial website. Apart from DintalClaw, no institution, organization, or individual is currently officially authorized to conduct commercial activities under the GenericAgent name.


🌟 项目简介

GenericAgent 是一个极简、可自我进化的自主 Agent 框架。核心仅 ~3K 行代码,通过 9 个原子工具 + ~100 行 Agent Loop,赋予任意 LLM 对本地计算机的系统级控制能力,覆盖浏览器、终端、文件系统、键鼠输入、屏幕视觉及移动设备(ADB)。

设计哲学 —— 不预设技能,靠进化获得能力。

每解决一个新任务,GenericAgent 就将执行路径自动固化为 Skill,供后续直接调用。使用时间越长,沉淀的技能越多,形成一棵完全属于你、从 3K 行种子代码生长出来的专属技能树。

🤖 自举实证 — 本仓库的一切,从安装 Git、git init 到每一条 commit message,均由 GenericAgent 自主完成。作者全程未打开过一次终端。

📑 目录


📋 核心特性

特性 说明
🧬 自我进化 每次任务自动沉淀 Skill,能力随使用持续增长,形成专属技能树
🪶 极简架构 ~3K 行核心代码,Agent Loop 约百行,无复杂依赖,部署零负担
强执行力 注入真实浏览器(保留登录态),9 个原子工具直接接管系统
🔌 高兼容性 支持 Claude / Gemini / Kimi / MiniMax 等主流模型,跨平台运行
💰 极致省 Token 上下文窗口不到 30K,是其他 Agent(200K–1M)的零头;噪声更少、幻觉更低、成功率更高,成本低一个数量级

🎯 实例展示

🧋 外卖下单 📈 量化选股
外卖下单 量化选股
"Order me a milk tea" — 自动导航外卖 App,选品并完成结账 "Find GEM stocks with EXPMA golden cross, turnover > 5%" — 量化条件筛股
🌐 自主网页探索 💰 支出追踪
网页探索 支付宝支出
自主浏览并定时汇总网页信息 "查找近 3 个月超 ¥2K 的支出" — 通过 ADB 驱动支付宝
💬 批量消息
微信批量
批量发送微信消息,完整驱动微信客户端

🚀 快速开始

⚠️ Python 版本: 推荐使用 Python 3.11 或 3.12请不要使用 Python 3.14,与 pywebview 及部分依赖不兼容。

📖 详细安装指南:installation_zh.md(中文) · installation.md (English)

给 LLM Agent 看的

获取安装指南并照做:

curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/refs/heads/main/docs/installation_zh.md

给人类用户看的

方法一 — 一键安装 (推荐)

一键安装会自动准备独立 Python 环境、Git、项目文件和桌面端,不污染系统环境。

Windows PowerShell

powershell -ExecutionPolicy Bypass -c "irm http://fudankw.cn:9000/files/ga_install.ps1 | iex"

Linux / macOS

curl -fsSL http://fudankw.cn:9000/files/ga_install.sh | bash

安装完成后,双击启动:

frontends/GenericAgent.exe

方法二 — Python 安装 (开发者)

git clone https://github.com/lsdefine/GenericAgent.git
cd GenericAgent
uv venv
uv pip install -e ".[ui]"          # 核心 + UI 依赖
cp mykey_template.py mykey.py      # 填入你的 LLM API Key
python launch.pyw

💡 GenericAgent 更推荐由 Agent 在使用中自举环境,而不是预先手动装完整依赖。

📖 完整引导流程见 docs/GETTING_STARTED.md 📖 新手图文版:飞书文档 📘 完整入门教程(Datawhale 出品):Hello GenericAgent · GitHub


💻 使用方式

前端启动

桌面端

一键安装自带桌面端,双击:

frontends/GenericAgent.exe

终端 UI

基于 Textual 的轻量键盘驱动界面。支持多会话并发、实时流式输出,有终端就能跑。

python frontends/tuiapp_v2.py
⚠️ Windows 上 TUI 显示异常的排查思路
  1. textual 版本太旧,先 pip install -U textual
  2. PowerShell / cmd 自带终端对 Unicode 和键位的支持比较糟糕,Windows 上推荐用 Git Bash,体验明显更稳;
  3. 仍然显示异常时,可以让 GA 自己修一遍,参考 Prompt:

    "我在 Windows 的 PowerShell / cmd / Git Bash 中使用 frontends/tuiapp_v2.py 体验非常差,出现了一堆不兼容问题。请参考 Claude Code 在 Windows 终端的最佳配置,把所有字体和显示不兼容的问题修一遍。"

Streamlit UI

python launch.pyw

Bot 接口(IM)

GenericAgent 支持 Telegram、微信、QQ、飞书 / Lark、企业微信、钉钉等 IM 前端。

平台 启动命令
Telegram python frontends/tgapp.py
微信 python frontends/wechatapp.py
QQ python frontends/qqapp.py
飞书 / Lark python frontends/fsapp.py
企业微信 python frontends/wecomapp.py
钉钉 python frontends/dingtalkapp.py

详细配置直接问 GenericAgent。

通用聊天命令

命令 说明
/new 开启新对话并清空当前上下文
/continue 列出可恢复会话快照
/continue N 恢复第 N 个可恢复会话

🧠 架构设计

GenericAgent 通过 分层记忆 × 最小工具集 × 自主执行循环 完成复杂任务,并在执行过程中持续积累经验。

1️⃣ 分层记忆系统

记忆在任务执行过程中持续沉淀,使 Agent 逐步形成稳定且高效的工作方式。

层级 名称 说明
L0 元规则(Meta Rules) Agent 的基础行为规则和系统约束
L1 记忆索引(Insight Index) 极简索引层,用于快速路由与召回
L2 全局事实(Global Facts) 在长期运行过程中积累的稳定知识
L3 任务 Skills / SOPs 完成特定任务类型的可复用流程
L4 会话归档(Session Archive) 从已完成任务中提炼出的归档记录,用于长程召回

2️⃣ 自主执行循环

感知环境状态 → 任务推理 → 调用工具执行 → 经验写入记忆 → 循环

整个核心循环仅 约百行代码agent_loop.py)。

3️⃣ 最小工具集

GenericAgent 仅提供 9 个原子工具,构成与外部世界交互的基础能力。

工具 功能
code_run 执行任意代码(Python / PowerShell)
file_read 读取文件
file_write 写入 / 创建 / 覆盖文件
file_patch 修改文件
web_scan 感知网页内容
web_execute_js 控制浏览器行为
ask_user 人机协作确认
update_working_checkpoint (记忆) 短期工作记事板
start_long_term_update (记忆) 提炼长期记忆

4️⃣ 能力扩展机制

具备动态创建新工具的能力。

通过 code_run,GenericAgent 可在运行时动态安装 Python 包、编写新脚本、调用外部 API 或控制硬件,将临时能力固化为永久工具。

GenericAgent 工作流程
GenericAgent 工作流程图

🧬 自我进化机制

这是 GenericAgent 区别于其他 Agent 框架的根本所在。

[遇到新任务]
    │
    ▼
[自主摸索]   ─►  安装依赖 · 编写脚本 · 调试验证
    │
    ▼
[执行路径固化为 Skill]   ─►  写入记忆层
    │
    ▼
[下次同类任务直接调用]
你说的一句话 第一次做了什么 之后每次
"监控股票并提醒我" 安装 mootdx → 构建选股流程 → 配置定时任务 → 保存 Skill 一句话启动
"用 Gmail 发这个文件" 配置 OAuth → 编写发送脚本 → 保存 Skill 直接可用

用几周后,你的 Agent 实例将拥有一套任何人都没有的专属技能树,全部从 3K 行种子代码中生长而来。


📊 与同类产品对比

特性 GenericAgent OpenClaw Claude Code
代码量 ~3K 行 ~530,000 行 已开源(体量大)
部署方式 pip install + API Key 多服务编排 CLI + 订阅
浏览器控制 注入真实浏览器(保留登录态) 沙箱 / 无头浏览器 通过 MCP 插件
OS 控制 键鼠、视觉、ADB 多 Agent 委派 文件 + 终端
自我进化 自主生长 Skill 和工具 插件生态 会话间无状态
出厂配置 几个核心文件 + 少量初始 Skills 数百模块 丰富 CLI 工具集

📈 评测

📂 完整的评测数据集以及评测结果见:JinyiHan99/GA-Technical-Report

我们从 五大维度 评测 GenericAgent:

# 维度 核心问题 使用的基准
1 任务完成度与 Token 效率 GA 能否以更低成本完成高难度任务? SOP-Bench、Lifelong AgentBench、RealFin-Benchmark
2 工具使用效率 最小原子工具集能否以更低开销替代专用工具集? Tool Efficiency Benchmark
3 记忆系统有效性 精简分层记忆能否超越冗余记忆和基于 Embedding 的检索器? SOP-Bench、LoCoMo、20-skill 压力测试
4 自我进化能力 Agent 能否在无人干预下将经验提炼为可复用的 SOP 与代码? 9 轮 LangChain 纵向研究、8 任务跨任务 Web 基准
5 网页浏览能力 信息密度驱动设计能否适应开放网页? WebCanvas、BrowseComp-ZH、自定义任务

以上维度的基线包括 Claude CodeOpenAI CodeXOpenClaw,分别在 Claude Sonnet 4.6Claude Opus 4.6GPT-5.4MiniMax M2.7 底座上进行评测。

工具使用效率雷达图
工具使用效率雷达图。GA 在 Token、请求数和工具调用轴上全面领先,同时在四个任务维度上保持质量。
跨任务自我进化收敛曲线
跨任务自我进化。GA 的第二轮和第三轮执行在 8 个 Web 任务上收敛至稳定的低成本区间。

📅 路线图与最新动态

  • 2026-05-15 — 🖥️ 桌面 GUI 发布。一键安装会自带可直接运行的桌面端(frontends/GenericAgent.exe),开发者也可用 python launch.pyw 启动。
  • 2026-05-14 — 🆕 Conductor 子 Agent 编排。派发、监督、自动清理并行子 Agent;与 /btw 旁路子 Agent 互补,提供一等公民级的任务委派原语。
  • 2026-05-12 — 🆕 TUI v2 正式发布frontends/tuiapp_v2.py)。重做视觉风格的 Textual 前端,支持图片粘贴折叠、文件粘贴、块删除、Ctrl+C 复制、历史导航,以及 /llm / /export / /continue 选择器。
  • 2026-04-21 — 📄 技术报告已发布至 arXivGenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization
  • 2026-04-11 — 引入 L4 会话归档记忆,并接入 scheduler cron 调度。
  • 2026-03-23 — 支持个人微信接入作为 Bot 前端。
  • 2026-03-10发布百万级 Skill 库
  • 2026-03-08发布以 GenericAgent 为核心的"政务龙虾" Dintal Claw
  • 2026-03-01被机器之心报道
  • 2026-01-16 — GenericAgent V1.0 公开版本发布。

⭐ 社区与支持

如果这个项目对你有帮助,欢迎点一个 Star! 🙏

也欢迎加入 GenericAgent 体验交流群,一起交流、反馈、共建 👏

微信群 18
微信群 18 二维码

🚩 友情链接

感谢 LinuxDo 社区的支持!

LinuxDo

社区 GUI 客户端 (独立开源项目)


📄 许可

基于 MIT License 发布,详见 LICENSE

声明:本项目未构建任何商业站点;除 DintalClaw 外,目前未官方授权任何机构、组织或个人以 GenericAgent 名义从事商业活动。


📈 Star History