[Bug] SHOW PROC '/statistic' on the FE leader node shows stale abnormal replicas, causing "scan info is invalid" on single-replica tables #73342

@JS-WangZhu

Description

In StarRocks 4.0.3-fbcf73d, the SHOW PROC '/statistic' output on the FE leader node differs from the output on the other FE nodes.

Problem Details

  • The FE leader node reports two additional abnormal replica entries that do not appear on the other FE nodes.

  • Both affected tables are single-replica tables.

  • The abnormal tablets show:

    • state = NORMAL
    • One replica includes a LastFailedTime
  • These abnormal entries are not automatically cleaned up.

  • Manually marking the replicas as bad with ADMIN SET REPLICA STATUS does not remove them.
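
For anyone comparing the outputs by hand, the check described above can be sketched in Python. This is only an illustration: it assumes the (tablet ID, backend ID) pairs were already collected from each FE by whatever per-replica listing is available, and the `Replica` tuple, the `leader_only_replicas` function, and the sample IDs are all hypothetical, not values from the affected cluster.

```python
from typing import Iterable, List, NamedTuple


class Replica(NamedTuple):
    """One replica entry as reported by a single FE (hypothetical shape)."""
    tablet_id: int
    backend_id: int


def leader_only_replicas(
    leader_rows: Iterable[Replica], follower_rows: Iterable[Replica]
) -> List[Replica]:
    """Return replicas the leader FE reports that the follower FE does not."""
    return sorted(set(leader_rows) - set(follower_rows))


# Hypothetical sample data: the leader lists one extra replica for tablet 10002.
leader = [Replica(10001, 20001), Replica(10002, 20001), Replica(10002, 20002)]
follower = [Replica(10001, 20001), Replica(10002, 20001)]

for ghost in leader_only_replicas(leader, follower):
    print(f"only on leader: tablet={ghost.tablet_id} backend={ghost.backend_id}")
```

Any entry printed here is a candidate ghost replica. For a single-replica table, a tablet with more than one entry on the leader is already suspicious.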

Impact

When the affected table is queried through the FE leader node, the query fails with:

SQL Error [1064] [42000]: Build Exec OlapScanNode fail, scan info is invalid

Observed Behavior

  • The invalid replica metadata persists only on the leader node.

  • Other nodes do not report these abnormal replicas.

  • After restarting the leader node:

    • The stale replicas are identified as redundant replicas
    • They are automatically cleaned up

Expected Behavior

  • Replica metadata should remain consistent across FE nodes
  • Stale or invalid abnormal replicas should be cleaned up automatically, without requiring a leader restart
  • SHOW PROC '/statistic' should not retain ghost replica entries
  • Manually marking a replica as bad should remove the invalid replica

Reproduction Summary

  1. Single-replica table enters abnormal state
  2. SHOW PROC '/statistic' on leader node shows extra abnormal replica entries
  3. Replica state remains NORMAL but contains failure timestamp
  4. Querying through the leader FE fails with: Build Exec OlapScanNode fail, scan info is invalid
  5. Restart leader node
  6. Redundant replicas are cleaned

Questions / Investigation Request

Please help investigate:

  • Why does the leader node retain stale replica metadata while other nodes do not?
  • Why are these ghost replicas not cleaned automatically?
  • Why does ADMIN SET REPLICA STATUS PROPERTIES("status"="bad") fail to remove them?
  • Is there a metadata cache or replay inconsistency issue related to leader state?
  • Could this be caused by delayed tablet report reconciliation or replica version state handling?

Version

StarRocks 4.0.3-fbcf73d

Labels: type/bug