Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support the full WordPress PHPUnit test suite #55

Open
adamziel opened this issue Feb 5, 2023 · 9 comments
Open

Support the full WordPress PHPUnit test suite #55

adamziel opened this issue Feb 5, 2023 · 9 comments

Comments

@adamziel
Copy link

adamziel commented Feb 5, 2023

This issue has been solved in the WordPress/sqlite-database-integration plugin, which is available in WordPress Plugin Directory. I'll leave it open here for posterity.


The problem

A lot of WordPress unit tests fail with the SQLite plugin is installed.

The root cause is the method of rewriting MySQL queries to SQLite queries: regular expressions and string lookups. It's hard to support the variety of SQL queries used in WordPress using these tools. I explored patching this plugin in WordPress Playground, but there's too many cases to cover – I included a few examples of failing queries at the bottom of this description.

Proposed solution

Parsing MySQL queries and using contextual information would make this plugin much more bulletproof.

Some queries are hard to reason about with tools like preg_replace or strpos:

-- Valid SQL inside quotes
INSERT INTO wp_options VALUES ('ON DUPLICATE KEY UPDATE') 

-- Four "datetime" occurrences, each with a different meaning:
CREATE TABLE test ( `datetime` datetime NOT NULL, KEY datetime ( datetime ) )

-- Comma inside a string value derails one of the explode(",") calls
UPDATE wp_options SET option_value=','

In contrast, applying the SQL grammar would make rewriting these queries much easier:

// 'ON DUPLICATE KEY' can't be mistaken for SQL keyword:
[
    'INSERT INTO' => [
        'table' => 'wp_options',
        'values' => ['ON DUPLICATE KEY UPDATE']
    ]
]

// It's clear how to treat each part of the query:
[
    'CREATE TABLE' => [
        'name' => 'test',
        'fields' => [
            'datetime' => [
                'type' => 'datetime',
                'notnull' => true
            ]
        ],
        'keys' => [
            'datetime' => ['datetime']
        ]
    ]
]

// There's no risk of mistaking the inserted "," for a field separator:
[
    'UPDATE' => [
        'table' => 'wp_options',
        'set' => [
            'option_value' => ','
        ]
    ]
]

How to parse SQL queries?

Full parsing may not be needed. Even the array representation from the examples above may be too much. Creating many objects, arrays, and function calls for each SQL query would make WordPress extremely slow. There's a faster alternative.

Consider WP_HTML_Tag_Processor, a new WordPress API that enables adding, removing, and updating HTML attributes. It never turns the input HTML into a tree. Instead, it consumes the input HTML string one tag and one attribute at a time, and uses the contextual information to build the output HTML:

$w = new WP_HTML_Tag_Processor( '<a href="#" class="button"></a>' );
$w->next_tag( 'a' );
$w->remove_attribute( 'href' );
echo $w->get_updated_html();
// Outputs: <a class="button"></a>
// No tree was constructed. Instead, the processor used the HTML parsing rules
// to find the start and end of href="#", and then excluded that attribute from
// the output HTML

Perhaps the same technique could be used to rewrite MySQL queries?

For example:

$mysql = "CREATE TABLE test ( `datetime` datetime NOT NULL, KEY datetime ( datetime ) )";
$sqlite = '';

$at = 0;
$max = strlen($sql);
while($at < $max) {
    $last_at = $at;

    // Find the first character that's not " ", "\t", or "("
    $at += strcspn($mysql, "\t (", $at);

    // Find out what token we're looking at now:
    $token = consume_token($mysql, $at);
    if ( $token === 'CREATE TABLE' ) {
        // If it's a CREATE TABLE query, append the correct SQLite
        // command to the output string:
        $sqlite .= 'CREATE TABLE ';

        // Expect the table name to at the first non-whitespace character
        $at += strcspn($mysql, " \t", $at);
        $table_name_starts = $at;

        // Skip backticks
        if($query[$at] === "`") {
            ++$at;
        }

        // Expect the table name to consist of alnum
        $at += strspn($mysql, "abcdefghijklmnopqrstuwxyzABCDEFGHIJKLMNOPQRSTUWXYZ0123456789", $at);

        // Skip backticks
        if($query[$at] === "`") {
            ++$at;
        }

        $table_name_ends = $at;
        $table_name = substr($mysql, $table_name_starts, $table_name_ends - $table_name_starts );
        if ( !$table_name ) {
          return false; // Syntax error
        }

        // Append a quoted table name to the output query
        $sqlite .= '"' . $table_name . '"';

        // Find and skip the '('
        $at += strcspn($mysql, '(', $at) + 1;

        // Consume the field name and definition
        // ...        
    } else if($token === 'SELECT' ) {
        // Consume a SELECT query
    } else if // ...
}

// Consume CREATE

Too low-level? Here's a higher-level idea:

preg_match(
    '^(INSERT|CREATE|UPDATE|SELECT)\s+',
    $query,
    &$matches,
    PREG_OFFSET_CAPTURE,
    $at
);

Use preg_match to capture the next token, but nothing else. The parser would still move forward a single token at a time and make decisions based on what it just captured. The idea is to never write a regexp for the entire query. It's more readable than the lower-level counterparts, still reasonably fast, and only treats one token at a time. Also – if it works, it could be replaced by a lower-level implementation relatively easily.

While I am proposing a major change, I specifically don't mean a full rewrite. This plugin got a lot of things right:

  • Data types are mapped
  • Different types of queries are recognized and treated separately
  • When 1<=>1 translation isn't possible, this plugin translates a MySQL query to a series of SQLite queries

The existing work is an excellent foundation and I only propose to take it to the next level.

Finally, thank you for maintaining this plugin @aaemnnosttv! It made WordPress Playground possible.

Failing queries

Here's a few examples of queries that didn't work for me at one point or another:

DELETE a, b FROM wp_options a, wp_options b
	WHERE a.option_name LIKE '_transient_%'
	AND a.option_name NOT LIKE '_transient_timeout_%'
	AND b.option_name = CONCAT( '_transient_timeout_', SUBSTRING( a.option_name, 12 ) )
	AND b.option_value < 1675613957

CREATE TABLE wp_wc_download_log (
		download_log_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
		timestamp datetime NOT NULL,
		permission_id BIGINT UNSIGNED NOT NULL,
		user_id BIGINT UNSIGNED NULL,
		user_ip_address VARCHAR(100) NULL DEFAULT '',
		PRIMARY KEY  (download_log_id),
		KEY permission_id (permission_id),
		KEY timestamp (timestamp)

CREATE TABLE test ( timestamp datetime NOT NULL, KEY timestamp ( timestamp ) )

ALTER TABLE wp_wc_product_download_directories CHANGE COLUMN `url` url varchar(256) NOT NULL

ALTER TABLE wp_woocommerce_downloadable_product_permissions ADD PRIMARY KEY  (`permission_id`)

cc @aristath @felixarntz @dmsnell @gziolo

@adamziel adamziel changed the title Support the full WordPress PHPUnit test suite SQL Parser to support the full WordPress PHPUnit test suite Feb 5, 2023
@adamziel adamziel changed the title SQL Parser to support the full WordPress PHPUnit test suite Support the full WordPress PHPUnit test suite Feb 5, 2023
@adamziel
Copy link
Author

adamziel commented Feb 6, 2023

Looking for prior art on this I found an open-source Java-based SQL parser and translator: https://github.com/jOOQ/jOOQ

I recorded 25k SQL queries during WordPress and WooCommerce setup, and pasted them on their demo page.

Here's what's not supported:

  • Escaping with \' ('' must be used instead)
  • REPLACE INTO
  • SELECT FOUND_ROWS()
  • INSERT IGNORE
  • DELETE FROM a, b
  • SELECT SQL_CALC_FOUND_ROWS
  • Key named timestamp (as in KEYtimestamp (timestamp))
  • MySQL REGEXP operator (WHERE field REGEXP 'expression')
  • Keys with specified field name (KEY args (args(191)), -- args(191) errors out)
  • DESCRIBE, SHOW INDEX, SHOW TABLE queries

Everything else worked, including many CREATE TABLE statements, complex joins, and ON DUPLICATE KEY clauses.

jOOQ could perhaps be ported to PHP, or at least be a good source of inspiration.

@adamziel
Copy link
Author

adamziel commented Feb 6, 2023

See also: List of Converter Tools on sqlite.org

SQLFairy seems interesting:

SQL::Translator is a group of Perl modules that manipulate structured data definitions (mostly database schemas) in interesting ways, such as converting among different dialects of CREATE syntax (e.g., MySQL-to-Oracle)

Edit: Actually, it's pretty limited:

Presently only the definition parts of SQL are handled (CREATE, ALTER), not the manipulation of data (INSERT, UPDATE, DELETE).

@adamziel
Copy link
Author

adamziel commented Feb 6, 2023

Another idea: Simplify the sql_yacc.yy grammar definition shipped with MySQL to remove unsupported features (like polygons etc), autogenerate a tokenizer, then add SQLite translation layer.

@adamziel
Copy link
Author

adamziel commented Feb 7, 2023

I recorded all SQL queries ran during the WordPress PHPUnit test suite run:
queries.sql.zip

@aristath found the open-source phpmyadmin/sql-parser project that tokenizes and parses MySQL queries.

@adamziel
Copy link
Author

adamziel commented Feb 7, 2023

I built a proof of concept of translating CREATE TABLE queries to SQLite:

https://gist.github.com/adamziel/d4d2f335a7671054d9ccee8d723ec0e8

It reproduces similar steps as this plugin. It translated all WordPress MySQL tables I gave it into SQLite-compatible queries – see the log.sql file to see the inputs and outputs.

@adamziel
Copy link
Author

adamziel commented Feb 8, 2023

I expanded the above translator to UPDATE, SELECT, DELETE, and INSERT queries:

https://gist.github.com/adamziel/7f71ddba7d240d5b6f342fa2d9c55ea3

Even though this version is quick, dirty, and makes a bunch of assumptions that won't always hold, it translates all the queries from the WordPress unit test suite into syntactically valid SQLite queries. It deals with date functions, intervals, INSERT IGNORE and a few others.

It's still missing support for a REGEXP b, FOUND_ROWS(), and ORDER BY FIELD. Also, the semantics of the translated queries currently differ from the original queries in a few places. The fixes are very much possible. I didn't get there yet.

Support for ALTER TABLE, DESCRIBE, SHOW FULL COLUMNS, SHOW TABLES, etc is still missing.

@adamziel
Copy link
Author

adamziel commented Feb 8, 2023

The explorations continue in WordPress/sqlite-database-integration

@adamziel
Copy link
Author

The lexer-based implementation in the WordPress/sqlite-database-integration plugin passes most WordPress unit tests!

Tests: 14853, Assertions: 47545, Errors: 15, Failures: 37, Warnings: 34, Skipped: 60, Risky: 2.

The remaining failures will have to be fixed in WordPress core.

I recommend migrating to WordPress/sqlite-database-integration once the lexer-based implementation is merged.

@adamziel
Copy link
Author

The new version of WordPress/sqlite-database-integration is now released: https://wordpress.org/plugins/sqlite-database-integration/

adamziel added a commit to WordPress/wordpress-playground that referenced this issue Mar 17, 2023
Migrates Playground from regexp-based [wp-sqlite-db](aaemnnosttv/wp-sqlite-db#55) to the official [sqlite-database-integration](https://github.com/WordPress/sqlite-database-integration) plugin which [translates queries using SQL parser](WordPress/sqlite-database-integration#9) instead of regular expressions. 

This makes for a more reliable SQLite integration that passes almost all WordPress unit tests.

[Learn more](aaemnnosttv/wp-sqlite-db#55) about the problem with regular expressions.
Pookie717 added a commit to Pookie717/wordpress-playground that referenced this issue Oct 1, 2023
Migrates Playground from regexp-based [wp-sqlite-db](aaemnnosttv/wp-sqlite-db#55) to the official [sqlite-database-integration](https://github.com/WordPress/sqlite-database-integration) plugin which [translates queries using SQL parser](WordPress/sqlite-database-integration#9) instead of regular expressions. 

This makes for a more reliable SQLite integration that passes almost all WordPress unit tests.

[Learn more](aaemnnosttv/wp-sqlite-db#55) about the problem with regular expressions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant