Skip to content

Fixes xsd schema to work with libxml 2.12 and higher. #38553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

hostep
Copy link
Contributor

@hostep hostep commented Mar 27, 2024

Description (*)

The latest libxml2 library (2.12.*) contains more strict requirements for xsd schema files:

Several bugs in the regex determinism checks were fixed. Invalid XML Schemas which previous versions erroneously accepted will now be rejected.
Source: https://www.linuxcompatible.org/story/libxml2-2120-released/

There is one xsd file in Magento which now fails these more stricter checks: app/code/Magento/Elasticsearch/etc/esconfig.xsd

When using any Magento version with libxml2 version 2.12.* in developer mode, whenever an xml file with esconfig.xsd schema is being loaded, it will now fail with this error:

Exception #0 (Magento\Framework\Config\Dom\ValidationSchemaException): Processed schema file: vendor/magento/module-elasticsearch/etc/esconfig.xsd
complex type 'mixedDataType': The content model is not determinist.
Line: 18

This PR fixes this. It should be mostly backwards compatible. With the exception that it now always expect the type element to be in front of the default element in the stemmer section, but that matches how Magento's xml files are already setup:

This PR fixes an additional problem - by splitting out the types for stemmer and stopwords_file. Previously, there was no requirement on type element, even though the code that parses this xml file requires the type to exist. That's been fixed by no longer using mixedDataType for both stemmer and stopwords_file elements, but splitting it out in 2 distinct types where the one for stemmer now requires a type element to exist.

I've tested these changes with libxml2 versions 2.9.14 and 2.12.6 and both work correctly.

Related Pull Requests

Fixed Issues (if relevant)

  1. Fixes Catalog search index process error indexation process #38254

Manual testing scenarios (*)

  1. Have php being compiled with libxml2 using a version lower than 2.12.0
  2. Have a test-xsd.php file in the root of the magento project with this contents:
<?php

echo sprintf("Libxml version: %s\n\n", getLibXmlVersion());

$xmlFiles = [
    sprintf('%s/app/code/Magento/Elasticsearch/etc/esconfig.xml', __DIR__),
    sprintf('%s/app/code/Magento/Elasticsearch/Test/Unit/Model/Adapter/Index/Config/_files/esconfig_test.xml', __DIR__),
];
$schemaFile = sprintf('%s/app/code/Magento/Elasticsearch/etc/esconfig.xsd', __DIR__);

foreach ($xmlFiles as $xmlFile) {
    $xml = file_get_contents($xmlFile);

    $dom = new \DOMDocument();
    $dom->loadXML($xml);

    $result = $dom->schemaValidate($schemaFile);

    var_dump($result);
}

function getLibXmlVersion(): string
{
    $reflector = new \ReflectionExtension('libxml');
    ob_start();
    $reflector->info();
    $output = strip_tags(ob_get_clean());
    preg_match('/^libXML Loaded Version (?:=>)?(.*)$/m', $output, $matches);

    return $matches[1];
}
  1. Execute that file via cli: php test-xsd.php, expected is to see no errors
  2. Have php being compiled with libxml2 using a version higher than or equal to 2.12.0
  3. Execute that file via cli: php test-xsd.php, expected is to see no errors

Alternative steps:

  1. Have php being compiled with libxml2 using a version higher than or equal to 2.12.0
  2. Have Magento run in developer mode
  3. Run bin/magento indexer:reindex catalogsearch_fulltext, expect to get no errors

Questions or comments

I haven't found time or energy to add/update unit tests, if this is required, it would be appreciated if somebody could lend a hand.

Contribution checklist (*)

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • README.md files for modified modules are updated and included in the pull request if any README.md predefined sections require an update
  • All automated tests passed successfully (all builds are green)

Copy link

m2-assistant bot commented Mar 27, 2024

Hi @hostep. Thank you for your contribution!
Here are some useful tips on how you can test your changes using Magento test environment.

Add the comment under your pull request to deploy test or vanilla Magento instance:
  • @magento give me test instance - deploy test instance based on PR changes
  • @magento give me 2.4-develop instance - deploy vanilla Magento instance

❗ Automated tests can be triggered manually with an appropriate comment:

  • @magento run all tests - run or re-run all required tests against the PR changes
  • @magento run <test-build(s)> - run or re-run specific test build(s)
    For example: @magento run Unit Tests

<test-build(s)> is a comma-separated list of build names.

Allowed build names are:
  1. Database Compare
  2. Functional Tests CE
  3. Functional Tests EE
  4. Functional Tests B2B
  5. Integration Tests
  6. Magento Health Index
  7. Sample Data Tests CE
  8. Sample Data Tests EE
  9. Sample Data Tests B2B
  10. Static Tests
  11. Unit Tests
  12. WebAPI Tests
  13. Semantic Version Checker

You can find more information about the builds here
ℹ️ Run only required test builds during development. Run all test builds before sending your pull request for review.


For more details, review the Code Contributions documentation.
Join Magento Community Engineering Slack and ask your questions in #github channel.

@hostep
Copy link
Contributor Author

hostep commented Mar 27, 2024

@magento run all tests

@engcom-Hotel engcom-Hotel added the Priority: P1 Once P0 defects have been fixed, a defect having this priority is the next candidate for fixing. label Apr 2, 2024
@engcom-Bravo
Copy link
Contributor

Hi,

Internal team has started to work on it

Thanks.

@iphigenie
Copy link

just a friendly wave to say this works as a fix on 2.4.6-p5

@barryvdh
Copy link
Contributor

Thanks, this works for me too

@hostep
Copy link
Contributor Author

hostep commented Jun 5, 2024

Looks like this got merged in 2.4-develop today: 0574ac2, yet this PR is not marked as merged ...

@ihor-sviziev
Copy link
Contributor

The change was cherry-picked to 308e261 and merged to 2.4-develop branch, so we can close this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: P1 Once P0 defects have been fixed, a defect having this priority is the next candidate for fixing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Catalog search index process error indexation process
6 participants