Skip to content

[Bug] CTAS does not preserve VARCHAR(n) length and always creates VARCHAR(1048576) #73374

@vitalyDE

Description

@vitalyDE

Steps to reproduce the behavior

Case 1: CTAS with explicit CAST(... AS VARCHAR(n))

Create a table with CTAS:

CREATE TABLE sandbox.test_varchar_type AS
SELECT CAST('abc' AS VARCHAR(64)) AS c;

Check the resulting column type:

SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'test_varchar_type'
  AND table_schema = 'sandbox';

Case 2: CTAS from a source table that already has VARCHAR(n)

Create a source table:

CREATE TABLE sandbox.src_varchar (
    c VARCHAR(64)
);

Create another table from it via CTAS:

CREATE TABLE sandbox.test_varchar_from_table AS
SELECT c
FROM sandbox.src_varchar;

Check the resulting column type:

SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'test_varchar_from_table'
  AND table_schema = 'sandbox';

Additional check
The following parameters do not change the behavior in any combination:

enable_reduce_cast_varchar_expr_sync_type
enable_reduce_cast_varchar_length_inheritance

Expected behavior

CTAS should preserve the declared VARCHAR(n) length from the output schema of the SELECT.

Expected examples:

  • CAST('abc' AS VARCHAR(64)) should create a column with type VARCHAR(64)
  • selecting a column defined as VARCHAR(64) from an existing table should create a column with type VARCHAR(64)

In general, if the projected column type is VARCHAR(n), then the created table column should also be VARCHAR(n).

Real behavior

In both scenarios, CTAS creates the resulting column as:

VARCHAR(1048576)

Instead of preserving the original varchar length, StarRocks widens the type to the maximum varchar length.

Example values from INFORMATION_SCHEMA.COLUMNS:

  • DATA_TYPE = varchar
  • CHARACTER_MAXIMUM_LENGTH = 1048576
  • COLUMN_TYPE = varchar(1048576)

This is a major issue because it effectively blocks normal usage of dbt with StarRocks in common workflows.

dbt relies heavily on CTAS to create temporary and intermediate tables. If every varchar column is expanded to VARCHAR(1048576), then generated schemas no longer match intended model definitions, making dbt-based development and production usage problematic.

StarRocks version

Reproduced on:
3.5.17
3.5.15

Metadata

Metadata

Assignees

No one assigned

    Labels

    type/bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions