Skip to content

[RISCV] fix: endianness inconsistency between XXX_disassembled.elf and XXX.json#3020

Open
OlivierBBB wants to merge 3 commits into
mainfrom
3019-riscv-endianness-inconsistency
Open

[RISCV] fix: endianness inconsistency between XXX_disassembled.elf and XXX.json#3020
OlivierBBB wants to merge 3 commits into
mainfrom
3019-riscv-endianness-inconsistency

Conversation

@OlivierBBB
Copy link
Copy Markdown
Contributor

@OlivierBBB OlivierBBB commented May 5, 2026

We process the output of objdump with sed to reverse the 4 byte instructions to comply with endianness of the risc5 binary. This aligns it with the endianness found in the json.

Ex:

   11158: 08800137         lui    sp, 0x8800

becomes

   11158: 08800137 (37018008)         lui    sp, 0x8800

The first 08800137 is the original mnemonic produced by objdump. It can be fed into an online tool that decomposes risc5 instructions and be successfully decoded. (37018008) on the other hand is obtained by reversing the bytes of the previous string. One can find this 4 byte string in the json file.


Note

Low Risk
Low risk: changes are confined to the test/examples build tooling and documentation, affecting only how disassembly output is generated.

Overview
Fixes an endianness mismatch between <name>_disassembled.elf and the generated <name>.json by post-processing objdump output to also display instruction bytes in memory order (including compressed 16-bit instructions).

The examples Makefile now auto-detects a suitable disassembler via a new OBJDUMP variable (preferring llvm-objdump/rust-objdump, with macOS Homebrew and GNU fallbacks) and documents the new requirement/option in README.md.

Reviewed by Cursor Bugbot for commit 03c9e1a. Bugbot is set up for automated code reviews on this repo. Configure here.

we process the output of objdump with sed to reverse the 4 byte
instrutions to comply with endianness of the risc5 binary. This aligns
it with the endianness found in the json.

Ex:

   11158: 08800137 (37018008)         lui    sp, 0x8800

The first 08800137 is the original mnemonic produced by objdump. It can
be fed into an online tool that decomposes risc5 instructions and be
successfully decoded. (37018008) on the other hand is obtained by
reversing the bytes of the previous string. One can find this 4 byte
string in the json file.
@OlivierBBB OlivierBBB linked an issue May 5, 2026 that may be closed by this pull request
@OlivierBBB OlivierBBB self-assigned this May 5, 2026
Copy link
Copy Markdown
Contributor

@DavePearce DavePearce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The json looks correct btw. If you use xxd -e -g1 you can see that the order of bytes is 37 first then 01, etc.

@OlivierBBB OlivierBBB removed the request for review from letypequividelespoubelles May 5, 2026 23:54
| `ZIG_STRIP` | `true` | Strip when compiling Zig (reduces binary size), ignored for `.s` and `.rs` |
| `IN_BYTES` | `""` | Input bytes written to memory at `IN_BYTES_OFFSET` before execution |
| `PROGRAM_OFFSET` | `0` | Memory offset where the program is loaded (up to 128 MB) |
| `IN_BYTES_OFFSET` | `0x8000000` | Memory offset where input bytes are written (up to 1 GB) |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make clear we did not forget a 0, we may write 0x08000000.

# Disassembler: prefer llvm-objdump (shows bytes in memory order, matching the JSON format).
# Falls back to the Homebrew keg-only path on macOS, then to riscv64-unknown-elf-objdump.
# Override with: make OBJDUMP=/path/to/objdump
OBJDUMP ?= $(or $(shell command -v llvm-objdump 2>/dev/null),$(shell command -v rust-objdump 2>/dev/null),$(shell test -x /opt/homebrew/opt/llvm/bin/llvm-objdump && echo /opt/homebrew/opt/llvm/bin/llvm-objdump),$(shell command -v riscv64-unknown-elf-objdump 2>/dev/null))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OBJDUMP := can be used if this never changes.

# <hex-pc>: <byte_1><byte_2><byte_3><byte_4> (<byte_4><byte_3><byte_2><byte_1>)<non hex><etc>
#
# Claude also included the same type of operation for COMPRESSED instructions ... the 2nd -e expression
@$(OBJDUMP) -d --line-numbers -S $(BIN) \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so here the 3 commands above are tried until one of them is succeed? Also, I would add a comment saying that the generated file is modified with sed command for debugging purposes or so, so it is very clear it is not the standard one.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I do not understand the comment about Claude. We do not expect COMPRESSED instructions to appear.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the description of this PR may be a commented in the Makefile, as it helps understanding what happens.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-d --line-numbers -S those flags I believe were specific for riscv64-unknown-elf-objdump, now it seems they are applied in all cases. Does it work with assembly, rust and zig?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could riscv64-unknown-elf-objdump be used in all cases, with some adjustement using sed to make them consistent otherwise?

| sed -E \
-e 's/^([[:space:]]*[0-9a-f]+:[[:space:]]*)([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})([^0-9a-f])/\1\2\3\4\5 (\5\4\3\2)\6/' \
-e 's/^([[:space:]]*[0-9a-f]+:[[:space:]]*)([0-9a-f]{2})([0-9a-f]{2})([^0-9a-f])/\1\2\3 (\3\2)\4/' \
> $(BIN)_disassembled.elf
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Byte reversal breaks with llvm-objdump output format

High Severity

The sed byte-reversal is designed for GNU objdump's big-endian display format (e.g., 08800137(37018008)), but OBJDUMP now prefers llvm-objdump, which the comment at line 30 explicitly says "shows bytes in memory order." If llvm-objdump outputs 37018008, the sed reversal produces (08800137) — the opposite of what's in the JSON, defeating the stated purpose of aligning endianness with the JSON file. The transformation is tool-dependent but applied unconditionally regardless of which objdump was selected.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 5c047a0. Configure here.

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 03c9e1a. Configure here.

#
# <hex-pc>: <byte_1><byte_2><byte_3><byte_4> (<byte_4><byte_3><byte_2><byte_1>)<non hex><etc>
#
# Claude also included the same type of operation for COMPRESSED instructions ... the 2nd -e expression
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidentally committed AI tool reference in comment

Low Severity

The comment # Claude also included the same type of operation for COMPRESSED instructions ... the 2nd -e expression is an informal development note referencing the AI tool used to author the code. It's not meaningful documentation for maintainers and has already caused reviewer confusion (per the PR discussion). It should be replaced with a proper explanation of why the second -e expression exists.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 03c9e1a. Configure here.

# Disassembler: prefer llvm-objdump (shows bytes in memory order, matching the JSON format).
# Falls back to the Homebrew keg-only path on macOS, then to riscv64-unknown-elf-objdump.
# Override with: make OBJDUMP=/path/to/objdump
OBJDUMP ?= $(or $(shell command -v llvm-objdump 2>/dev/null),$(shell command -v rust-objdump 2>/dev/null),$(shell test -x /opt/homebrew/opt/llvm/bin/llvm-objdump && echo /opt/homebrew/opt/llvm/bin/llvm-objdump),$(shell command -v riscv64-unknown-elf-objdump 2>/dev/null))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty OBJDUMP causes cryptic build failure

Low Severity

If none of the auto-detected objdump tools (llvm-objdump, rust-objdump, Homebrew path, riscv64-unknown-elf-objdump) are installed, $(or ...) returns an empty string and OBJDUMP is set to empty. The subsequent @$(OBJDUMP) -d --line-numbers -S $(BIN) then expands to @ -d --line-numbers -S <bin>, causing the shell to try to execute -d as a command, producing a confusing error with no indication that the real problem is a missing disassembler.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 03c9e1a. Configure here.

Copy link
Copy Markdown
Contributor

@lorenzogentile404 lorenzogentile404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this adds complexity to the script for a little advantage in terms of debugging. Unless really needed, I would not merge this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RISCV] endianness inconsistency

4 participants