A powerful command-line tool that uses Claude AI to translate markdown and MDX files from English to any specified language while preserving formatting and structure.
This code and most of the README are from the team at PlayCanvas. The only changes are:
- StarRocks specific prompt
- StarRocks specific dictionary
- StarRocks specific words that should always be in English
- the `-s, --source` option, which lets you specify the source language, since we translate from both English and Chinese.
-i, --input <pattern> Input file path or glob pattern (e.g., "*.md",
"docs/**/*.md")
-l, --language <lang> Target language (e.g., Spanish, French, German)
-s, --source <lang> Source language (default: English)
-o, --output <file> Output file path (for single file translation)
-d, --output-dir <dir> Output directory (for batch translation or single
file)
-k, --key <apikey> Anthropic API key (or set ANTHROPIC_API_KEY env var)
--flat Use flat structure in output directory (default:
preserve structure)
--suffix <suffix> Custom suffix for output files (default: language
name)
--log-chunk-metadata Log API metadata for each chunk
--trace Log per-ID source text sent and translated text
received (full content, no truncation)
-h, --help display help for command
The translator now uses the AST pipeline by default.
When --trace is enabled, the tool logs one JSON trace record per ID and includes the full sourceText and translatedText values. The only masking applied is replacing occurrences of the actual API key value with ***.
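The masking step can be sketched roughly as follows. This is a minimal illustration, assuming a trace record is a plain object with `id`, `sourceText`, and `translatedText` fields; the helper name `maskApiKey` is hypothetical and not taken from the tool's source.

```javascript
// Hypothetical helper: replace every occurrence of the API key with "***".
function maskApiKey(record, apiKey) {
  // Nothing to mask if no key is configured.
  if (!apiKey) return { ...record };
  const masked = {};
  for (const [field, value] of Object.entries(record)) {
    // split/join replaces all literal occurrences without regex escaping.
    masked[field] = typeof value === 'string' ? value.split(apiKey).join('***') : value;
  }
  return masked;
}

// Example trace record with a made-up key value.
const trace = maskApiKey(
  { id: 'n1', sourceText: 'key=sk-ant-demo', translatedText: 'clé=sk-ant-demo' },
  'sk-ant-demo'
);
console.log(JSON.stringify(trace));
```

Only the key value is replaced; the rest of the source and translated text passes through unchanged, which matches the "full content, no truncation" behavior described above.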
In AST mode, each chunk asks the model to return a strict JSON array of { id, text } items.
- Parse errors such as `Expected ',' or '}'` or `Expected ':' after property name` usually mean the model returned malformed JSON for that chunk.
- These are response-format failures, not semantic translation failures.
- `finishReason: STOP` with parse errors means the output completed, but the JSON structure was invalid.
- When you see `json repair retry`, the tool requested a strict JSON retry and recovered automatically.
- When you see `split fallback recovered X/Y missing ids`, the tool retried unresolved IDs in smaller sub-batches and merged recovered results back into the chunk.
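To make the distinction between response-format failures and translation failures concrete, here is a minimal sketch of validating one chunk response. It assumes the raw model output is a string and that IDs are strings; `parseChunkResponse` is a hypothetical name, not the tool's actual function.

```javascript
// Hypothetical parser for one chunk response in AST mode.
// Returns { ok, items, error } instead of throwing, so a caller could
// decide whether to request a JSON repair retry.
function parseChunkResponse(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    // Malformed JSON: a response-format failure, not a translation failure.
    return { ok: false, items: [], error: err.message };
  }
  if (!Array.isArray(parsed)) {
    return { ok: false, items: [], error: 'response is not a JSON array' };
  }
  // Keep only well-formed { id, text } entries.
  const items = parsed.filter(
    (item) =>
      item !== null &&
      typeof item === 'object' &&
      typeof item.id === 'string' &&
      typeof item.text === 'string'
  );
  const ok = items.length === parsed.length;
  return { ok, items, error: ok ? null : 'some items are not { id, text }' };
}

const good = parseChunkResponse('[{"id":"n1","text":"こんにちは"}]');
const bad = parseChunkResponse('[{"id":"n1","text":"こんにちは"');
console.log(good.ok, bad.ok, bad.error);
```

The second call is truncated JSON, so it reports a parse error even though the translated text itself may have been fine.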
How to read the outcome:
- `AST completeness check: Translated IDs N/N - ✅ PASS` means the chunk is fully recovered, even if repair notes are present.
- Missing IDs after all retries are the only case that indicates unresolved chunk-level translation for those specific items.
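The completeness check and the split fallback can be sketched as below. This is an illustration only: `findMissingIds` and `splitIntoBatches` are hypothetical names, and the sub-batch size of 2 is an arbitrary assumption.

```javascript
// Hypothetical completeness check: which requested IDs have no translation?
function findMissingIds(requestedIds, translatedItems) {
  const translated = new Set(translatedItems.map((item) => item.id));
  return requestedIds.filter((id) => !translated.has(id));
}

// Split unresolved IDs into smaller sub-batches for a retry pass.
function splitIntoBatches(ids, batchSize) {
  const batches = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

const requested = ['n1', 'n2', 'n3', 'n4', 'n5'];
const received = [{ id: 'n1', text: 'a' }, { id: 'n4', text: 'd' }];
const missing = findMissingIds(requested, received);
const retryBatches = splitIntoBatches(missing, 2);
const status = missing.length === 0 ? 'PASS' : 'FAIL';
console.log(`Translated IDs ${requested.length - missing.length}/${requested.length} - ${status}`);
```

A chunk passes only when `missing` is empty after every retry; how many repair notes were logged along the way does not affect the result.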
- cd into the root of this repo
- Get an Anthropic API Key
- Export your Anthropic API Key like so:
export ANTHROPIC_API_KEY="<your key here>"
- Install the prerequisites:
npm install
- Translate an example file:
npm run demo
- Check the source and destination example files (names are in the output from `npm run demo`). Look for the key phrases that are in our dictionaries and the terms that should always be left in English.
- List the options:
node bin/cli.js translate -h
# Export your Anthropic API key
export ANTHROPIC_API_KEY="sk-ant-xxxxxxxxxxxxxxxx"
# in the doc-translator repo directory install the translator globally on your system:
npm install
npm link
# now in the starrocks/starrocks repo dir
# view the options:
doc-translate translate -h
# Example, translate the English architecture doc to Japanese:
doc-translate translate -s en -i docs/en/introduction/Architecture.md -l ja -o docs/ja/introduction/Architecture.md
- 🌍 Multi-language support - Translate to 40+ languages
- 📝 Markdown-aware - Preserves all markdown formatting (headers, links, code blocks, tables, etc.)
- 🔄 Smart chunking - Handles large files by splitting content intelligently
- 🎯 Selective translation - Only translates text content, keeps code and URLs intact
- 📂 Batch processing - Translate multiple files using glob patterns (e.g., `docs/**/*.md`)
- 🏗️ Structure preservation - Maintain directory structure or flatten output as needed
- 📊 Progress tracking - Real-time progress indication with spinners for single files and batches
- 🎨 Beautiful CLI - Colorful, user-friendly command-line interface
- ⚡ Fast processing - Optimized for speed with a high-performance Claude model
- Node.js 16.0.0 or higher
- Anthropic API key (Get one here)
Note: This tool uses ES modules (ESM) and requires Node.js 16+ for full compatibility.
npm install
npm link
Or run directly with Node:
node bin/cli.js
- Visit Anthropic Console
- Create a new API key
- Copy the generated key
Option A: Environment Variable (Recommended)
export ANTHROPIC_API_KEY="your-api-key-here"
Option B: Command Line Argument
doc-translate translate -i file.md -l Spanish --key your-api-key-here
# Translate README.md to Spanish
doc-translate translate -i README.md -l Spanish
# Translate with custom output file
doc-translate translate -i docs/guide.md -l French -o docs/guide_fr.md
# Translate using API key argument
doc-translate translate -i file.md -l German --key your-api-key
# Translate with AST mode (default)
doc-translate translate -i examples/External_table.md -l Japanese
The tool supports batch processing of multiple markdown files using glob patterns:
# Translate all .md files in current directory
doc-translate translate -i "*.md" -l Spanish -d ./spanish/
# Translate all markdown files in docs folder and subfolders
doc-translate translate -i "docs/**/*.md" -l French -d ./translations/
# Batch translate with flat structure (no subdirectories)
doc-translate translate -i "content/**/*.md" -l German -d ./output/ --flat
# Batch translate with custom suffix
doc-translate translate -i "*.md" -l ja -d ./translated/ --suffix "ja"
doc-translate translate [options]
Options:
-i, --input <pattern> Input file path or glob pattern (required)
Examples: "file.md", "*.md", "docs/**/*.md"
-l, --language <lang> Target language (required)
-o, --output <file> Output file path (for single file translation)
-d, --output-dir <dir> Output directory (for batch translation or single file)
-k, --key <apikey> Anthropic API key (optional)
--flat Use flat structure in output directory (default: preserve structure)
--suffix <suffix> Custom suffix for output files (default: language name)
--log-chunk-metadata Log API metadata for each chunk
--trace Log per-ID source text sent and translated text received
doc-translate languages
doc-translate setup
doc-translate --help
The tool supports 40+ languages including:
- European: Spanish, French, German, Italian, Portuguese, Dutch, Russian, Polish, Swedish, Norwegian, Danish, Finnish, Greek, Ukrainian, Czech, Hungarian, Romanian, Bulgarian, Croatian, Serbian, Slovak, Slovenian, Estonian, Latvian, Lithuanian, Catalan, Basque, Welsh, Irish
- Asian: Chinese, Japanese, Korean, Hindi, Thai, Vietnamese, Indonesian, Malay
- Middle Eastern: Arabic, Hebrew, Turkish
Tip
Use the two-letter short code for the language if you like. For example, `zh` instead of "Simplified Chinese".
doc-translate translate -i README.md -l es
Output: Creates README_spanish.md with Spanish translation
doc-translate translate -i docs/api.md -l fr -o docs/fr/api.md
Output: Creates docs/fr/api.md with French translation
doc-translate translate -i guide.md -l German --key sk-ant-...
The tool automatically handles large files by splitting them into chunks:
doc-translate translate -i large-document.md -l ja
doc-translate translate -i "*.md" -l Spanish -d ./spanish/
Output: Translates all .md files in current directory to ./spanish/ folder
doc-translate translate -i "docs/**/*.md" -l French -d ./translations/
Output: Translates all markdown files in docs/ and preserves directory structure in ./translations/
docs/
├── guide.md
├── api/
│   └── reference.md
└── tutorials/
    └── getting-started.md
# Becomes:
translations/
├── guide_french.md
├── api/
│   └── reference_french.md
└── tutorials/
    └── getting-started_french.md
doc-translate translate -i "content/**/*.md" -l German -d ./output/ --flat
Output: Translates all files but places them in a flat structure (no subdirectories)
content/
├── intro.md
├── chapters/
│   ├── chapter1.md
│   └── chapter2.md
└── appendix/
    └── notes.md
# Becomes:
output/
├── intro_german.md
├── chapter1_german.md
├── chapter2_german.md
└── notes_german.md
doc-translate translate -i "*.md" -l ja -d ./translated/ --suffix "ja"
Output: Uses "ja" instead of "japanese" as the file suffix
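The output naming shown in the batch examples (preserved structure, `--flat`, and `--suffix`) can be approximated with a small path-mapping function. This is a sketch under stated assumptions: `outputPathFor` and its parameters are hypothetical, it handles only `/` separators, and the real tool may well use `node:path` instead.

```javascript
// Hypothetical sketch of output path mapping for batch translation.
// Uses "/" separators only; real code would likely use node:path.
function outputPathFor(inputPath, baseDir, outputDir, suffix, flat) {
  // Path of the input relative to the glob's base directory.
  const prefix = baseDir.endsWith('/') ? baseDir : baseDir + '/';
  const relative = inputPath.startsWith(prefix) ? inputPath.slice(prefix.length) : inputPath;
  const slash = relative.lastIndexOf('/');
  const dir = slash === -1 ? '' : relative.slice(0, slash);
  const file = relative.slice(slash + 1);
  // Insert the suffix before the extension: guide.md -> guide_french.md
  const dot = file.lastIndexOf('.');
  const named =
    dot <= 0 ? `${file}_${suffix}` : `${file.slice(0, dot)}_${suffix}${file.slice(dot)}`;
  // Drop trailing slashes from the output directory before joining.
  const out = outputDir.replace(/\/+$/, '');
  return flat || dir === '' ? `${out}/${named}` : `${out}/${dir}/${named}`;
}

console.log(outputPathFor('docs/api/reference.md', 'docs', './translations/', 'french', false));
console.log(outputPathFor('content/chapters/chapter1.md', 'content', './output/', 'german', true));
```

One consequence of flattening in this sketch is that two inputs with the same file name in different subdirectories would map to the same output path, so a flat layout suits trees with unique file names.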
✅ Translated:
- Heading text
- Paragraph text
- List items
- Table content
- Link text
- Image alt text
- Quote text
❌ Preserved:
- Code blocks and inline code
- URLs and file paths
- Markdown syntax characters
- HTML tags
- Mathematical expressions
- Technical terms and proper nouns (when appropriate)
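One common way to keep code and URLs intact is to swap them for placeholders before translation and restore them afterwards. The sketch below illustrates that idea only. The `__MTX_` prefix appears elsewhere in this README, but the exact placeholder format, the function names, and the regular expression here are assumptions, and the tool's AST pipeline may work differently.

```javascript
// Hypothetical placeholder scheme; only the "__MTX_" prefix appears in this README.
function protectSpans(text) {
  const saved = [];
  // Match inline code spans first, then bare http(s) URLs.
  const pattern = /`[^`\n]+`|https?:\/\/[^\s)]+/g;
  const masked = text.replace(pattern, (match) => {
    saved.push(match);
    return `__MTX_${saved.length - 1}__`;
  });
  return { masked, saved };
}

// Put the saved spans back; unknown placeholders are left untouched.
function restoreSpans(text, saved) {
  return text.replace(/__MTX_(\d+)__/g, (whole, index) => saved[Number(index)] ?? whole);
}

const { masked, saved } = protectSpans(
  'Run `npm install` and see https://example.com/docs for details.'
);
// Stand-in for the model's translation of the masked text.
const translated = 'Ejecute __MTX_0__ y consulte __MTX_1__ para más detalles.';
const restored = restoreSpans(translated, saved);
console.log(masked);
console.log(restored);
```

If a placeholder survives into the final output, restoration failed for that span, which is what a placeholder-leak check would detect.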
The tool provides detailed progress feedback for both single file and batch processing:
╔═══════════════════════════════════════╗
║ Markdown Translator ║
║ Powered by Claude AI ║
╚═══════════════════════════════════════╝
📋 Translation Details:
Input: /path/to/README.md
Output: /path/to/README_spanish.md
Language: Spanish
⠋ Translating chunk 2/3...
✅ Translation completed successfully!
📊 Summary:
Original length: 2,845 characters
Translated length: 3,120 characters
Language: Spanish
Output file: /path/to/README_spanish.md
╔═══════════════════════════════════════╗
║ Markdown Translator ║
║ Powered by Claude AI ║
╚═══════════════════════════════════════╝
📋 Batch Translation Details:
Pattern: docs/**/*.md
Output: /path/to/translations/
Language: Spanish
Structure: Preserved
⠋ [2/5] reference.md - chunk 1/2...
✅ All translations completed successfully!
📊 Summary:
Files processed: 5
Successful: 5
Failed: 0
Output directory: /path/to/translations/
The tool provides clear error messages for common issues:
- Missing or invalid API key
- File not found
- Invalid file format
- Network connectivity issues
- API rate limiting
The examples/ directory contains a test corpus and an automated checker.
A curated set of patterns drawn from real StarRocks documentation that have caused translation problems in the past:
| Pattern | Why it matters |
|---|---|
| YAML frontmatter | Must be preserved exactly |
| HTML in Markdown table cells (`<ul><li>`, `<br />`, `<code class="...">`) | Tags must not be translated or restructured |
| Tilde fence code blocks (`~~~SQL`) | Must be converted to backtick fences cleanly |
| MDX import statements and `<Tabs>`/`<TabItem>` JSX | Must be preserved unchanged |
| Template variables in code (`{{ data_interval_start }}`) | Airflow/dbt syntax must not be touched |
| HTML comparison tables with `colspan` | Full HTML blocks must pass through untranslated |
| Admonitions indented inside numbered lists | Indentation must survive translation |
| `<details>` collapsible blocks | Content indentation must be preserved |
| Cross-references with relative paths and anchors | Only the display text is translated; the URL is not |
After translation, the checker runs 13 static checks against the source/output pair and reports PASS/FAIL for each:
- No `__MTX_` placeholder leaks
- Heading count
- Code block count and non-comment content
- Link URL preservation
- HTML tags in table cells
- Frontmatter preserved exactly
- Import statements preserved
- Admonition marker count
- Admonition indentation (catches the "indented :::note gets unindented" bug)
- Never-translate term spot-check
- Unordered list item count
- Table column counts
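As an illustration of what such static checks might look like, here is a minimal sketch of two of them: heading count and link URL preservation. These are hypothetical versions written for this README, not the code in `examples/check_translation.js`, and the regular expressions cover only simple ATX headings and inline links.

```javascript
// Hypothetical versions of two checks; not the code in check_translation.js.
function countHeadings(markdown) {
  let inFence = false;
  let count = 0;
  for (const line of markdown.split('\n')) {
    // Toggle on fence lines so "# comment" inside code is not counted.
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (!inFence && /^#{1,6}\s/.test(line)) count += 1;
  }
  return count;
}

function linkUrls(markdown) {
  // Collect the URL part of [text](url) links, in document order.
  return [...markdown.matchAll(/\[[^\]]*\]\(([^)\s]+)[^)]*\)/g)].map((m) => m[1]);
}

function runChecks(source, output) {
  const sameUrls = JSON.stringify(linkUrls(source)) === JSON.stringify(linkUrls(output));
  return [
    { name: 'Heading count', pass: countHeadings(source) === countHeadings(output) },
    { name: 'Link URL preservation', pass: sameUrls },
  ];
}

const source = ['# Title', '', 'See [the guide](../guide.md#setup).', '~~~', '# not a heading', '~~~'].join('\n');
const output = ['# タイトル', '', '[ガイド](../guide.md#setup)を参照してください。', '~~~', '# not a heading', '~~~'].join('\n');
for (const check of runChecks(source, output)) {
  console.log(`${check.pass ? 'PASS' : 'FAIL'} ${check.name}`);
}
```

Because both checks compare only structure, they can run against an already-translated file without calling the API.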
npm test # Translate StarRocksTest.md → zh, then run all checks
npm run test:ja # Translate StarRocksTest.md → ja, then run all checks
npm run check:zh # Re-run checks on an already-translated StarRocksTest_zh.md
npm run check:ja # Re-run checks on an already-translated StarRocksTest_ja.md
`check:zh` and `check:ja` are useful for iterating on the system prompt or dictionaries without calling the API again.
doc-translator/
├── bin/
│   └── cli.js                   # CLI entry point
├── src/
│   ├── translator.js            # Base class and shared utilities
│   ├── translator_ast_mvp.js    # AST-based translator (default)
│   └── configs/
│       ├── system_prompt.txt    # Translation instructions for the model
│       ├── never_translate.yaml # Terms that must never be translated
│       └── language_dicts/      # Per-language translation dictionaries
├── examples/
│   ├── StarRocksTest.md         # Test corpus
│   └── check_translation.js     # Automated output checker
├── package.json
└── README.md
This project uses ES modules (ESM) for modern JavaScript development:
- All files use `import`/`export` syntax instead of `require`/`module.exports`
- `package.json` includes `"type": "module"` for ESM support
- Compatible with the latest versions of dependencies (chalk 5.x, ora 8.x)
- Requires Node.js 16+ for full ESM compatibility
- `@anthropic-ai/sdk` - Anthropic Claude AI SDK
- `commander` - Command-line interface framework
- `chalk` - Terminal styling
- `ora` - Progress spinners
- `fs-extra` - Enhanced file system operations
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ensure your API key is valid and active
- Check that you have sufficient quota in your Anthropic account
- Verify the API key is active in the Anthropic Console
- The tool automatically chunks large files
- Each chunk is processed with a small delay to avoid rate limiting
- Very large files may take several minutes to process
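For intuition about why large files take longer, the chunk-and-pause behavior can be sketched as follows. This is not the tool's actual chunker: the function names, the 4000-character limit, and the 500 ms delay are all assumptions chosen for illustration, and `translateChunk` is a caller-supplied function.

```javascript
// Hypothetical chunker: group paragraphs so each chunk stays under maxChars.
function chunkMarkdown(markdown, maxChars) {
  const paragraphs = markdown.split(/\n{2,}/);
  const chunks = [];
  let current = '';
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length > maxChars && current) {
      // Adding this paragraph would overflow, so close the current chunk.
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Resolve after ms milliseconds; used to pause between chunk requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// translateChunk is a caller-supplied async function (an assumption here).
async function translateInChunks(markdown, translateChunk, maxChars = 4000, delayMs = 500) {
  const results = [];
  const chunks = chunkMarkdown(markdown, maxChars);
  for (let i = 0; i < chunks.length; i += 1) {
    results.push(await translateChunk(chunks[i], i, chunks.length));
    // Pause between requests, but not after the last one.
    if (i < chunks.length - 1) await sleep(delayMs);
  }
  return results.join('\n\n');
}

console.log(chunkMarkdown('aaaa\n\nbbbb\n\ncccc', 10));
```

Splitting on blank lines is the simplest possible boundary rule; it can cut through a fenced code block that contains blank lines, which is one reason an AST-based approach is more robust.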
- Use quotes around glob patterns to prevent shell expansion: `"*.md"` not `*.md`
- The `--output-dir` option is required for batch translation
- Large batches may take considerable time; use progress indicators to monitor
- Failed files in a batch are reported individually without stopping the process
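Reporting failures per file without stopping the batch can be sketched like this. The summary shape loosely mirrors the "Files processed / Successful / Failed" output shown earlier, but `translateBatch` and `translateFile` are hypothetical names, not the tool's API.

```javascript
// Hypothetical batch loop: one failing file does not stop the rest.
async function translateBatch(files, translateFile) {
  const summary = { processed: 0, successful: 0, failed: [] };
  for (const file of files) {
    summary.processed += 1;
    try {
      await translateFile(file);
      summary.successful += 1;
    } catch (err) {
      // Record the failure and continue with the next file.
      summary.failed.push({ file, reason: err.message });
    }
  }
  return summary;
}

// Demo with a stand-in translator that rejects one file.
const demoTranslate = async (file) => {
  if (file === 'broken.md') throw new Error('File not found');
};
const batchResult = translateBatch(['a.md', 'broken.md', 'b.md'], demoTranslate);
batchResult.then((summary) => console.log(JSON.stringify(summary)));
```

Files are processed sequentially here, which keeps the request rate predictable; running them in parallel would be faster but more likely to hit rate limits.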
- Ensure you have a stable internet connection
- The tool will retry failed requests automatically
- Check firewall settings if you encounter connection issues
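The README does not say how retries are scheduled, so the following is only one plausible shape: a retry wrapper with exponential backoff. The name `withRetry`, the default of three retries, and the 500 ms base delay are assumptions for illustration.

```javascript
// Hypothetical retry wrapper with exponential backoff.
async function withRetry(request, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await request(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // With the defaults, wait 500 ms, 1000 ms, 2000 ms before the next attempt.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Demo: a stand-in request that fails twice, then succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error('network error');
  return 'ok';
};
const retryResult = withRetry(flaky, { retries: 3, baseDelayMs: 1 });
retryResult.then((value) => console.log(value, calls));
```

Doubling the delay on each attempt gives a briefly unavailable network or a rate-limited API time to recover, while still failing within a bounded number of attempts.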
If you encounter any issues or have questions:
- Check the troubleshooting section above
- Run `doc-translate setup` for configuration help
- Create an issue on the project repository
Happy translating! 🌍✨

