Skip to content

MDEV-38740 Implement JSON as a pluggable data type#5263

Draft
hadeer-r wants to merge 2 commits into
MariaDB:mainfrom
hadeer-r:MDEV-38740
Draft

MDEV-38740 Implement JSON as a pluggable data type#5263
hadeer-r wants to merge 2 commits into
MariaDB:mainfrom
hadeer-r:MDEV-38740

Conversation

@hadeer-r

Copy link
Copy Markdown
Contributor
  • This PR implements JSON as a distinct pluggable data type in MariaDB using the MariaDB_DATA_TYPE_PLUGIN framework.
  • Related issue: MDEV-38740
  • This work is part of GSoC '26 and represents the initial progress and implementation milestones.

Implementation Progress

  • Implemented Type_handler_json, Field_json, and Type_collection_json, following XMLType structure while taking into account existing JSON type behavior.
  • SHOW CREATE TABLE and DESCRIBE now explicitly display the column type as json instead of longtext.
  • JSON columns natively accept CHARACTER SET and COLLATE attributes (e.g., JSON CHARACTER SET utf8) without throwing parse errors.
  • send format=json metadata to clients over the wire protocol.

Testing
Migrated the built-in JSON tests (mysql-test/main/type_json.test) into the plugin test suite, splitting them into focused, modular files:

  • type_json_basic.test (create, select, drop, CTAS)
  • type_json_validation.test
  • type_json_cast.test
  • type_json_protocol.test

TODO
Implement implicit JSON validation through Field_json::store(). (Currently, inserting invalid JSON like 'a' succeeds unless an explicit CHECK constraint is added).

image

Starting by intailizeing and loading json type plugin
Put basic definition of Field_json

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a pluggable JSON data type plugin (type_json) for MariaDB, migrating the built-in JSON type handling to a modular plugin structure. The changes include adding the plugin implementation (plugin.cc, sql_type_json.cc, sql_type_json.h), updating parser rules in sql_yacc.yy, and adding comprehensive test suites. Feedback on the implementation highlights several critical issues: a potential null pointer dereference in Field_json::report_wrong_value when table is null, an incorrect charset mapping in make_conversion_table_field which should strictly use utf8mb4_bin, and a regression in aggregate_for_comparison that prevents comparisons between JSON and numeric or temporal types. Additionally, minor cleanups are recommended, such as removing commented-out method declarations and fixing a comment typo in plugin.cc.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +103 to +110
int Field_json::report_wrong_value(const ErrConv &val)
{
get_thd()->push_warning_truncated_value_for_field(
Sql_condition::WARN_LEVEL_WARN, "json", val.ptr(), table->s->db.str,
table->s->table_name.str, field_name.str);
reset();
return 1;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In Field_json::report_wrong_value, the code accesses table->s->db.str and table->s->table_name.str without checking if table is NULL. In MariaDB, Field objects can sometimes have a NULL table pointer (e.g., when used as temporary or conversion fields). Accessing table->s directly in these cases will cause a server crash (segmentation fault).

int Field_json::report_wrong_value(const ErrConv &val)
{
  get_thd()->push_warning_truncated_value_for_field(
      Sql_condition::WARN_LEVEL_WARN, "json", val.ptr(),
      table ? table->s->db.str : "",
      table ? table->s->table_name.str : "",
      field_name.str);
  reset();
  return 1;
}

Comment on lines +46 to +56
Field *Type_handler_json::make_conversion_table_field(
MEM_ROOT *root, TABLE *table, uint metadata, const Field *target) const
{
/* A JSON field uses 4 bytes for length as json file is a long blob*/
uint pack_length= metadata & 0x00ff;
if (pack_length != 4)
return NULL;

return new (root) Field_json(NULL, (uchar *) "", 1, Field::NONE,
&empty_clex_str, table->s, target->charset());
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In Type_handler_json::make_conversion_table_field, the charset of the conversion field is set to target->charset(). However, a Field_json must always use the utf8mb4_bin character set to ensure correct JSON representation and comparison behavior. If the target field has a different character set (e.g., latin1), creating a Field_json with that character set violates this invariant and can lead to encoding issues or assertion failures.

Field *Type_handler_json::make_conversion_table_field(
    MEM_ROOT *root, TABLE *table, uint metadata, const Field *target) const
{
  /* A JSON field uses 4 bytes for length as json file is a long blob*/
  uint pack_length= metadata & 0x00ff;
  if (pack_length != 4)
    return NULL;

  return new (root) Field_json(NULL, (uchar *) "", 1, Field::NONE,
                               &empty_clex_str, table->s, &my_charset_utf8mb4_bin);
}

Comment on lines +36 to +46
// const Type_handler *type_handler_for_comparison() const override;
// const Type_handler *type_handler_for_tmp_table(const Item *item) const
// override;
// bool Item_hybrid_func_fix_attributes(THD *thd, const LEX_CSTRING &func_name,
// Type_handler_hybrid_field_type *handler, Type_all_attributes *func,
// Item **items, uint nitems) const override;

// virtual Item *create_typecast_item(THD *thd, Item *item,
// const Type_cast_attributes &attr) const override;

// Item *make_constructor_item(THD *thd, List<Item> *args) const override;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are several commented-out method declarations in the Type_handler_json class. To maintain code cleanliness and readability, these unused, commented-out lines should be removed.

References
  1. Keep the API of custom wrapper classes minimal by avoiding or removing unused methods.

Comment on lines +60 to +73
const Type_handler *
Type_collection_json::aggregate_for_comparison(const Type_handler *a,
const Type_handler *b) const
{
if (a->type_collection() == this)
swap_variables(const Type_handler *, a, b);
if (a == &type_handler_json || a == &type_handler_hex_hybrid ||
a == &type_handler_tiny_blob || a == &type_handler_blob ||
a == &type_handler_medium_blob || a == &type_handler_long_blob ||
a == &type_handler_varchar || a == &type_handler_string ||
a == &type_handler_null)
return b;
return NULL;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In Type_collection_json::aggregate_for_comparison, the aggregation only allows comparison between JSON and other string/blob/null types. Any comparison between JSON and numeric types (like INT, DOUBLE, DECIMAL) or temporal types (like DATETIME, DATE) will return NULL. This prevents implicit comparison between JSON columns and numbers/dates, which is a regression compared to the previous LONGTEXT-based JSON implementation. Consider allowing aggregation with other basic types or delegating to the standard string type collection.

NULL, // Status variables
NULL, // System variables
"1.0", // String version representation
MariaDB_PLUGIN_MATURITY_GAMMA // Maturity(see include/mysql/plugin.h)*/

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a typo/formatting issue at the end of the comment on line 28: */ is left over at the end of a single-line // comment. This should be cleaned up.

  MariaDB_PLUGIN_MATURITY_GAMMA // Maturity (see include/mysql/plugin.h)

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Jun 23, 2026
- Implemented the JSON data type via 'MariaDB_DATA_TYPE_PLUGIN1 API
- `SHOW CREATE TABLE` and `DESCRIBE` now explicitly display the column
type as `json` instead of `longtext`.
- Move json tests to the plugin and split it into smaller focused
files inside the plugin test suite.

- follow structure xmltype plugin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

2 participants