[RISCV] fix: endianness inconsistency between XXX_disassembled.elf and XXX.json#3020
[RISCV] fix: endianness inconsistency between XXX_disassembled.elf and XXX.json#3020OlivierBBB wants to merge 3 commits into
XXX_disassembled.elf and XXX.json#3020Conversation
we process the output of objdump with sed to reverse the 4 byte instrutions to comply with endianness of the risc5 binary. This aligns it with the endianness found in the json. Ex: 11158: 08800137 (37018008) lui sp, 0x8800 The first 08800137 is the original mnemonic produced by objdump. It can be fed into an online tool that decomposes risc5 instructions and be successfully decoded. (37018008) on the other hand is obtained by reversing the bytes of the previous string. One can find this 4 byte string in the json file.
DavePearce
left a comment
There was a problem hiding this comment.
LGTM. The json looks correct btw. If you use xxd -e -g1 you can see that the order of bytes is 37 first then 01, etc.
| | `ZIG_STRIP` | `true` | Strip when compiling Zig (reduces binary size), ignored for `.s` and `.rs` | | ||
| | `IN_BYTES` | `""` | Input bytes written to memory at `IN_BYTES_OFFSET` before execution | | ||
| | `PROGRAM_OFFSET` | `0` | Memory offset where the program is loaded (up to 128 MB) | | ||
| | `IN_BYTES_OFFSET` | `0x8000000` | Memory offset where input bytes are written (up to 1 GB) | |
There was a problem hiding this comment.
To make clear we did not forget a 0, we may write 0x08000000.
| # Disassembler: prefer llvm-objdump (shows bytes in memory order, matching the JSON format). | ||
| # Falls back to the Homebrew keg-only path on macOS, then to riscv64-unknown-elf-objdump. | ||
| # Override with: make OBJDUMP=/path/to/objdump | ||
| OBJDUMP ?= $(or $(shell command -v llvm-objdump 2>/dev/null),$(shell command -v rust-objdump 2>/dev/null),$(shell test -x /opt/homebrew/opt/llvm/bin/llvm-objdump && echo /opt/homebrew/opt/llvm/bin/llvm-objdump),$(shell command -v riscv64-unknown-elf-objdump 2>/dev/null)) |
There was a problem hiding this comment.
OBJDUMP := can be used if this never changes.
| # <hex-pc>: <byte_1><byte_2><byte_3><byte_4> (<byte_4><byte_3><byte_2><byte_1>)<non hex><etc> | ||
| # | ||
| # Claude also included the same type of operation for COMPRESSED instructions ... the 2nd -e expression | ||
| @$(OBJDUMP) -d --line-numbers -S $(BIN) \ |
There was a problem hiding this comment.
so here the 3 commands above are tried until one of them is succeed? Also, I would add a comment saying that the generated file is modified with sed command for debugging purposes or so, so it is very clear it is not the standard one.
There was a problem hiding this comment.
Also, I do not understand the comment about Claude. We do not expect COMPRESSED instructions to appear.
There was a problem hiding this comment.
Actually the description of this PR may be a commented in the Makefile, as it helps understanding what happens.
There was a problem hiding this comment.
-d --line-numbers -S those flags I believe were specific for riscv64-unknown-elf-objdump, now it seems they are applied in all cases. Does it work with assembly, rust and zig?
There was a problem hiding this comment.
Could riscv64-unknown-elf-objdump be used in all cases, with some adjustement using sed to make them consistent otherwise?
| | sed -E \ | ||
| -e 's/^([[:space:]]*[0-9a-f]+:[[:space:]]*)([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})([^0-9a-f])/\1\2\3\4\5 (\5\4\3\2)\6/' \ | ||
| -e 's/^([[:space:]]*[0-9a-f]+:[[:space:]]*)([0-9a-f]{2})([0-9a-f]{2})([^0-9a-f])/\1\2\3 (\3\2)\4/' \ | ||
| > $(BIN)_disassembled.elf |
There was a problem hiding this comment.
Byte reversal breaks with llvm-objdump output format
High Severity
The sed byte-reversal is designed for GNU objdump's big-endian display format (e.g., 08800137 → (37018008)), but OBJDUMP now prefers llvm-objdump, which the comment at line 30 explicitly says "shows bytes in memory order." If llvm-objdump outputs 37018008, the sed reversal produces (08800137) — the opposite of what's in the JSON, defeating the stated purpose of aligning endianness with the JSON file. The transformation is tool-dependent but applied unconditionally regardless of which objdump was selected.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 5c047a0. Configure here.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 03c9e1a. Configure here.
| # | ||
| # <hex-pc>: <byte_1><byte_2><byte_3><byte_4> (<byte_4><byte_3><byte_2><byte_1>)<non hex><etc> | ||
| # | ||
| # Claude also included the same type of operation for COMPRESSED instructions ... the 2nd -e expression |
There was a problem hiding this comment.
Accidentally committed AI tool reference in comment
Low Severity
The comment # Claude also included the same type of operation for COMPRESSED instructions ... the 2nd -e expression is an informal development note referencing the AI tool used to author the code. It's not meaningful documentation for maintainers and has already caused reviewer confusion (per the PR discussion). It should be replaced with a proper explanation of why the second -e expression exists.
Reviewed by Cursor Bugbot for commit 03c9e1a. Configure here.
| # Disassembler: prefer llvm-objdump (shows bytes in memory order, matching the JSON format). | ||
| # Falls back to the Homebrew keg-only path on macOS, then to riscv64-unknown-elf-objdump. | ||
| # Override with: make OBJDUMP=/path/to/objdump | ||
| OBJDUMP ?= $(or $(shell command -v llvm-objdump 2>/dev/null),$(shell command -v rust-objdump 2>/dev/null),$(shell test -x /opt/homebrew/opt/llvm/bin/llvm-objdump && echo /opt/homebrew/opt/llvm/bin/llvm-objdump),$(shell command -v riscv64-unknown-elf-objdump 2>/dev/null)) |
There was a problem hiding this comment.
Empty OBJDUMP causes cryptic build failure
Low Severity
If none of the auto-detected objdump tools (llvm-objdump, rust-objdump, Homebrew path, riscv64-unknown-elf-objdump) are installed, $(or ...) returns an empty string and OBJDUMP is set to empty. The subsequent @$(OBJDUMP) -d --line-numbers -S $(BIN) then expands to @ -d --line-numbers -S <bin>, causing the shell to try to execute -d as a command, producing a confusing error with no indication that the real problem is a missing disassembler.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 03c9e1a. Configure here.
lorenzogentile404
left a comment
There was a problem hiding this comment.
I believe this adds complexity to the script for a little advantage in terms of debugging. Unless really needed, I would not merge this one.


We process the output of
objdumpwithsedto reverse the 4 byte instructions to comply with endianness of the risc5 binary. This aligns it with the endianness found in the json.Ex:
becomes
The first
08800137is the original mnemonic produced by objdump. It can be fed into an online tool that decomposes risc5 instructions and be successfully decoded. (37018008) on the other hand is obtained by reversing the bytes of the previous string. One can find this 4 byte string in the json file.Note
Low Risk
Low risk: changes are confined to the test/examples build tooling and documentation, affecting only how disassembly output is generated.
Overview
Fixes an endianness mismatch between
<name>_disassembled.elfand the generated<name>.jsonby post-processingobjdumpoutput to also display instruction bytes in memory order (including compressed 16-bit instructions).The examples
Makefilenow auto-detects a suitable disassembler via a newOBJDUMPvariable (preferringllvm-objdump/rust-objdump, with macOS Homebrew and GNU fallbacks) and documents the new requirement/option inREADME.md.Reviewed by Cursor Bugbot for commit 03c9e1a. Bugbot is set up for automated code reviews on this repo. Configure here.