HIVE-29514: Optimize UDF Unhex and improve its test coverage #6471

Open
tanishq-chugh wants to merge 3 commits into apache:master from tanishq-chugh:HIVE-29514

@tanishq-chugh
Contributor

What changes were proposed in this pull request?

Improve the UDF Unhex and its test coverage

Why are the changes needed?

Better performance

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manual testing + Java test

Comment on lines +68 to +70
if (val == -1) {
  return null;
}
Contributor

Please add a test for an invalid character in an odd-length input.
Example: G.
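
A minimal sketch of the case being asked for, written as a standalone class rather than against the real UDF. The class name OddLengthUnhexSketch and its unhex(String) signature are hypothetical, and the odd-length branch assumes the first character is decoded on its own, as if the input had a leading '0'; the PR's actual code may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the UDF, not the code in this PR.
final class OddLengthUnhexSketch {

  // Returns the value 0..15 of an ASCII hex digit, or -1 if the byte is not one.
  static int decodeHexChar(byte b) {
    if (b >= '0' && b <= '9') return b - '0';
    if (b >= 'A' && b <= 'F') return b - 'A' + 10;
    if (b >= 'a' && b <= 'f') return b - 'a' + 10;
    return -1;
  }

  // Assumption: an odd-length input is treated as if it had a leading '0',
  // so its first character becomes a byte on its own.
  static byte[] unhex(String s) {
    byte[] textBytes = s.getBytes(StandardCharsets.US_ASCII);
    int len = textBytes.length;
    byte[] result = new byte[(len + 1) / 2];
    int i = 0;
    int resIdx = 0;
    if ((len & 1) == 1) {
      int val = decodeHexChar(textBytes[i++]);
      if (val == -1) {
        return null; // invalid leading character, e.g. "G"
      }
      result[resIdx++] = (byte) val;
    }
    while (i < len) {
      int high = decodeHexChar(textBytes[i++]);
      int low = decodeHexChar(textBytes[i++]);
      if (high == -1 || low == -1) {
        return null;
      }
      result[resIdx++] = (byte) ((high << 4) | low);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(unhex("G") == null);          // invalid, odd length
    System.out.println(unhex("GAB") == null);        // invalid leading digit, valid pair after it
    System.out.println(Arrays.toString(unhex("F"))); // valid, odd length
  }
}
```

In the real test class the same inputs would go through udf.evaluate(new Text("G")) with an assertNull, assuming the UDF returns null for invalid input as the quoted lines suggest.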

Comment on lines +74 to 83
while (i < len) {
  int high = decodeHexChar(textBytes[i++]);
  int low = decodeHexChar(textBytes[i++]);

  if (high == -1 || low == -1) {
    return null;
  }

  result[resIdx++] = (byte) ((high << 4) | low);
}
Contributor

How about:

while (i < len) {
  int high, low;
  if ((high = decodeHexChar(textBytes[i++])) == -1 ||
      (low  = decodeHexChar(textBytes[i++])) == -1) {
    return null;
  }
  result[resIdx++] = (byte) ((high << 4) | low);
}

so that we don't compute low if high == -1. Consider adding tests for when high == -1 and low == -1.
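
To make the short-circuit observable, here is a rough standalone sketch that counts calls to the decoder. ShortCircuitDecodeSketch, decodePairs and the decodeCalls counter are illustrative names only and are not part of the PR; the sketch also restricts itself to even-length input.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: counts decoder calls so the short-circuit can be checked.
final class ShortCircuitDecodeSketch {
  static int decodeCalls = 0;

  // Returns the value 0..15 of an ASCII hex digit, or -1 if the byte is not one.
  static int decodeHexChar(byte b) {
    decodeCalls++;
    if (b >= '0' && b <= '9') return b - '0';
    if (b >= 'A' && b <= 'F') return b - 'A' + 10;
    if (b >= 'a' && b <= 'f') return b - 'a' + 10;
    return -1;
  }

  // Decodes an even-length hex string; returns null at the first invalid digit.
  static byte[] decodePairs(String s) {
    byte[] textBytes = s.getBytes(StandardCharsets.US_ASCII);
    int len = textBytes.length;
    if ((len & 1) != 0) {
      throw new IllegalArgumentException("this sketch only handles even-length input");
    }
    byte[] result = new byte[len / 2];
    int i = 0;
    int resIdx = 0;
    while (i < len) {
      int high, low;
      // '||' evaluates the right operand only if the left one is false,
      // so low is not decoded when high is already invalid.
      if ((high = decodeHexChar(textBytes[i++])) == -1 ||
          (low  = decodeHexChar(textBytes[i++])) == -1) {
        return null;
      }
      result[resIdx++] = (byte) ((high << 4) | low);
    }
    return result;
  }

  public static void main(String[] args) {
    decodeCalls = 0;
    System.out.println(decodePairs("G1") == null && decodeCalls == 1); // high invalid
    decodeCalls = 0;
    System.out.println(decodePairs("1G") == null && decodeCalls == 2); // low invalid
  }
}
```

Both high and low are definitely assigned by the time the result byte is written, because that line is only reached when both comparisons were evaluated and came out false.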

  Text hexEmpty = new Text("");
  byte[] expectedEmpty = new byte[0];
  assertArrayEquals(expectedEmpty, udf.evaluate(hexEmpty));
}
Contributor

It would be nice to have tests for

  • mixed case: aABb9
  • boundary values
  • lower case, as all tests are for upper case right now

Also, maybe not in this file, but are there any tests for hex(unhex(...)) and unhex(hex(...))?
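
A sketch of what such checks could look like, using standalone stand-ins for hex() and unhex() instead of the real UDFs. HexRoundTripSketch and its helpers are hypothetical; the sketch assumes hex() emits upper-case digits and leaves odd-length handling out, so it uses even-length inputs only.

```java
import java.util.Arrays;

// Hypothetical stand-ins for hex() and unhex(), used only to show the test cases.
final class HexRoundTripSketch {
  private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

  // Assumption: hex() emits two upper-case digits per byte.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(DIGITS[(b >> 4) & 0x0F]).append(DIGITS[b & 0x0F]);
    }
    return sb.toString();
  }

  // Even-length input only; returns null on any invalid digit.
  static byte[] unhex(String s) {
    if ((s.length() & 1) != 0) {
      return null; // odd-length handling is left out of this sketch
    }
    byte[] out = new byte[s.length() / 2];
    for (int i = 0; i < out.length; i++) {
      int high = Character.digit(s.charAt(2 * i), 16);
      int low = Character.digit(s.charAt(2 * i + 1), 16);
      if (high == -1 || low == -1) {
        return null;
      }
      out[i] = (byte) ((high << 4) | low);
    }
    return out;
  }

  static void check(boolean ok, String what) {
    if (!ok) {
      throw new AssertionError(what);
    }
  }

  public static void main(String[] args) {
    // Upper, lower and mixed case should decode to the same bytes.
    byte[] expected = {(byte) 0xAA, (byte) 0xBB};
    check(Arrays.equals(expected, unhex("AABB")), "upper case");
    check(Arrays.equals(expected, unhex("aabb")), "lower case");
    check(Arrays.equals(expected, unhex("aAbB")), "mixed case");

    // Boundary byte values.
    byte[] bounds = {0x00, 0x7F, (byte) 0x80, (byte) 0xFF};
    check(Arrays.equals(bounds, unhex("007F80FF")), "boundary values");

    // Characters just outside the valid digit ranges.
    for (String bad : new String[] {"/0", ":0", "@0", "G0", "`0", "g0"}) {
      check(unhex(bad) == null, "should reject " + bad);
    }

    // Round trips over every byte value, and case normalization.
    byte[] all = new byte[256];
    for (int i = 0; i < all.length; i++) {
      all[i] = (byte) i;
    }
    check(Arrays.equals(all, unhex(hex(all))), "unhex(hex(x))");
    check("DEADBEEF".equals(hex(unhex("deadbeef"))), "hex(unhex(s))");
  }
}
```

Note that hex(unhex(s)) can only be expected to equal s up to case, and for odd-length input only up to whatever padding unhex applies, so the round-trip assertion in that direction needs a normalized expected value.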


3 participants