Make large, recursive schemas diff-able by deferring computation of diffs (#249)
* Demonstrate performance problems with diff by diffing a large and interconnected schema.
This new test generates a large, interconnected API schema with 200 endpoints,
250 model schemas, and references between the models. The generated schema is similar to a real-world
schema that we use in production at my job, which failed to diff because
the diff computation never completed.
* Conversion to DeferredChanged
My employer has an API schema with a lot of deeply nested and
recursively referenced objects. We wanted to validate that changes made
by developers are backwards compatible. Unfortunately the OpenAPI-Diff
tool would run practically forever. There were too many situations
where it would have to recompute a diff and could not use the cached
result.
I implemented an approach that defers computing schema diffs until
the entire structure of the API schema has been parsed. This prevents
recursive schema definitions from being computed over and over again
during parsing. It also ensures that each diff is computed exactly
once.
Together, these changes bring the time needed to diff this big,
recursive schema down to a manageable level.
== Test case
I have created a test case: `LargeSchemaTest` that generates a schema
similar to the one my employer uses. (Unfortunately our schema is for
an internal system and I can't share it.)
It generates two similar but incompatible schemas. Each of these schemas
has:
- 250 schemas defined in #/components/schemas, each with 5 properties
recursively referencing other schemas defined in #/components/schemas.
- 100 API endpoints that use those schemas in the RequestBody or
ResponseBody.
When this test is run against the `master` branch of openapi-diff, it does
not complete. Profiling shows that the time is spent
in `Changed.isChanged()`, which recursively calls other instances of
`Changed`. The deep recursion causes an exponential explosion in the
number of calls required to compute the changed state for the whole model.
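To make the blow-up concrete, here is a minimal sketch that counts calls on a toy graph. It is not the openapi-diff code: `Node`, `isChangedNaive`, `isChangedMemo`, and `buildLadder` are hypothetical names, and the graph is acyclic for simplicity, whereas real schemas can also contain cycles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecursionBlowup {
    /** Hypothetical stand-in for a diff node that aggregates its children. */
    static final class Node {
        final List<Node> children = new ArrayList<>();
        final boolean locallyChanged;
        Node(boolean locallyChanged) { this.locallyChanged = locallyChanged; }
    }

    static long calls = 0;

    /** Naive traversal: a shared child is walked again on every visit. */
    static boolean isChangedNaive(Node n) {
        calls++;
        boolean changed = n.locallyChanged;
        for (Node c : n.children) {
            changed |= isChangedNaive(c); // |= always evaluates the right-hand side
        }
        return changed;
    }

    /** Memoized traversal: each node is evaluated once, then served from the map. */
    static boolean isChangedMemo(Node n, Map<Node, Boolean> memo) {
        Boolean cached = memo.get(n);
        if (cached != null) {
            return cached;
        }
        calls++;
        boolean changed = n.locallyChanged;
        for (Node c : n.children) {
            changed |= isChangedMemo(c, memo);
        }
        memo.put(n, changed);
        return changed;
    }

    /** Builds `depth` layers above a leaf; each node references the node below it twice. */
    static Node buildLadder(int depth) {
        Node current = new Node(true);
        for (int i = 0; i < depth; i++) {
            Node parent = new Node(false);
            parent.children.add(current);
            parent.children.add(current);
            current = parent;
        }
        return current;
    }

    public static long countNaive(int depth) {
        calls = 0;
        isChangedNaive(buildLadder(depth));
        return calls;
    }

    public static long countMemo(int depth) {
        calls = 0;
        isChangedMemo(buildLadder(depth), new HashMap<>());
        return calls;
    }
}
```

With a depth of 10, `countNaive(10)` reports 2047 calls (2^11 - 1), while `countMemo(10)` reports 11, one per node.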
== The solution: Deferring computation of diffs
The solution is to break the diff into a two-step process:
- Step 1: Read the schema and align all the diff computations, deferring
computation of actual differences, and avoiding recursive differences.
- Step 2: Compute all the differences, avoiding recomputing the
recursive differences.
This is implemented in [OpenApiDiff.compare()](core/src/main/java/org/openapitools/openapidiff/core/compare/OpenApiDiff.java#L89-L127).
Implementing this was a relatively small change to the code.
The `DeferredSchemaCache` holds the cache of SchemaDiffs. It is able to
recognize repeated requests for the same difference. This is the
key to avoiding recomputing the same difference multiple times.
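The two steps and the role of the cache can be sketched together on a toy model. This is a simplified illustration under stated assumptions, not the code in `OpenApiDiff.compare()` or `DeferredSchemaCache`: the `Schema` class, `register`, and `resolve` are hypothetical, schemas are keyed by name, both sides are assumed to define every referenced name, and a schema counts as changed if it or anything it references differs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseDiffSketch {
    /** Hypothetical schema: a type plus the names of the schemas it references. */
    static final class Schema {
        final String type;
        final List<String> refs;
        Schema(String type, List<String> refs) { this.type = type; this.refs = refs; }
    }

    private final Map<String, Schema> oldSchemas;
    private final Map<String, Schema> newSchemas;
    // Cache of pending diffs: one entry per schema name, however often it is requested.
    private final Map<String, Boolean> locallyChanged = new LinkedHashMap<>();
    private final Map<String, List<String>> referencedBy = new HashMap<>();
    int registrations = 0;

    TwoPhaseDiffSketch(Map<String, Schema> oldSchemas, Map<String, Schema> newSchemas) {
        this.oldSchemas = oldSchemas;
        this.newSchemas = newSchemas;
    }

    /** Step 1: record the diff request; repeated or recursive requests stop at the cache. */
    void register(String name) {
        if (locallyChanged.containsKey(name)) {
            return; // already pending, so a cycle or repeat ends here
        }
        registrations++;
        Schema o = oldSchemas.get(name);
        Schema n = newSchemas.get(name);
        locallyChanged.put(name, !o.type.equals(n.type) || !o.refs.equals(n.refs));
        for (String ref : n.refs) {
            referencedBy.computeIfAbsent(ref, k -> new ArrayList<>()).add(name);
            register(ref);
        }
    }

    /** Step 2: resolve every pending diff once by pushing changes to the referrers. */
    Map<String, Boolean> resolve() {
        Map<String, Boolean> changed = new HashMap<>(locallyChanged);
        Deque<String> work = new ArrayDeque<>();
        for (Map.Entry<String, Boolean> e : locallyChanged.entrySet()) {
            if (e.getValue()) {
                work.add(e.getKey());
            }
        }
        while (!work.isEmpty()) {
            String name = work.poll();
            for (String referrer : referencedBy.getOrDefault(name, List.of())) {
                if (!changed.get(referrer)) {
                    changed.put(referrer, true);
                    work.add(referrer);
                }
            }
        }
        return changed;
    }

    /** A and B reference each other, B also references C, and only C differs. */
    public static Map<String, Boolean> demo() {
        Map<String, Schema> oldS = new HashMap<>();
        oldS.put("A", new Schema("object", List.of("B")));
        oldS.put("B", new Schema("object", List.of("A", "C")));
        oldS.put("C", new Schema("string", List.of()));
        oldS.put("D", new Schema("string", List.of()));
        Map<String, Schema> newS = new HashMap<>(oldS);
        newS.put("C", new Schema("integer", List.of()));
        TwoPhaseDiffSketch diff = new TwoPhaseDiffSketch(oldS, newS);
        for (String name : List.of("A", "B", "C", "D", "A")) {
            diff.register(name);
        }
        return diff.resolve();
    }
}
```

In `demo()`, the cycle between A and B terminates during registration because the second request for A finds the pending entry. Resolution then marks A, B, and C as changed and leaves D unchanged.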
I replaced all the `Optional<?> diff(...)` with `DeferredChanged<?> diff(...)`,
and chose an interface for `DeferredChanged` that matched the `Optional`
interface. This minimized the lines of code changed, making it easier
to review.
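As an illustration of what an `Optional`-shaped deferred value can look like, here is a minimal sketch. The class name `DeferredSketch` and its methods `resolve`, `whenSet`, `ifPresent`, and `map` are assumptions made for this example and are not claimed to match the actual `DeferredChanged` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;

public class DeferredSketch<T> {
    private final List<Consumer<Optional<T>>> listeners = new ArrayList<>();
    private boolean resolved = false;
    private Optional<T> value = Optional.empty();

    /** Sets the value once and notifies every listener registered so far. */
    public void resolve(Optional<T> result) {
        if (resolved) {
            throw new IllegalStateException("already resolved");
        }
        resolved = true;
        value = result;
        for (Consumer<Optional<T>> listener : listeners) {
            listener.accept(result);
        }
        listeners.clear();
    }

    /** Runs the listener now if the value is set, otherwise when resolve() is called. */
    public void whenSet(Consumer<Optional<T>> listener) {
        if (resolved) {
            listener.accept(value);
        } else {
            listeners.add(listener);
        }
    }

    /** Counterpart of Optional.ifPresent; the consumer may run later. */
    public void ifPresent(Consumer<T> consumer) {
        whenSet(v -> v.ifPresent(consumer));
    }

    /** Counterpart of Optional.map; returns another deferred value. */
    public <R> DeferredSketch<R> map(Function<T, R> fn) {
        DeferredSketch<R> mapped = new DeferredSketch<>();
        whenSet(v -> mapped.resolve(v.map(fn)));
        return mapped;
    }

    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        DeferredSketch<Integer> pending = new DeferredSketch<>();
        pending.map(n -> "changes=" + n).ifPresent(log::add); // nothing runs yet
        log.add("registered");
        pending.resolve(Optional.of(3)); // the callbacks fire here
        return log;
    }
}
```

`demo()` returns `[registered, changes=3]`: the call chain reads like code written against `Optional`, but the callbacks only run once the value is resolved.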
Finally, I created a helper object called `DeferredBuilder`, which
collects a set of `DeferredChanged` instances so that composing a
change from them is easier to write and read.
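The collecting idea can be illustrated by analogy with the standard library's `CompletableFuture`, which is swapped in here in place of `DeferredChanged`. This is not how `DeferredBuilder` is implemented; `CollectingBuilderSketch`, `with`, `build`, and `OperationDiff` are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class CollectingBuilderSketch {
    private final List<CompletableFuture<?>> parts = new ArrayList<>();

    /** Registers one pending part and the setter that stores its value if present. */
    public <T> CollectingBuilderSketch with(CompletableFuture<Optional<T>> part,
                                            Consumer<T> setter) {
        parts.add(part.thenAccept(v -> v.ifPresent(setter)));
        return this;
    }

    /** Completes with the target once every registered part has completed. */
    public <R> CompletableFuture<R> build(R target) {
        return CompletableFuture
            .allOf(parts.toArray(new CompletableFuture<?>[0]))
            .thenApply(ignored -> target);
    }

    /** Hypothetical composite change made of two independently resolved parts. */
    static final class OperationDiff {
        String request;
        String response;
    }

    public static String demo() {
        CompletableFuture<Optional<String>> request = new CompletableFuture<>();
        CompletableFuture<Optional<String>> response = new CompletableFuture<>();
        OperationDiff diff = new OperationDiff();
        CompletableFuture<OperationDiff> built = new CollectingBuilderSketch()
            .with(request, v -> diff.request = v)
            .with(response, v -> diff.response = v)
            .build(diff);
        boolean doneEarly = built.isDone(); // no part has completed yet
        request.complete(Optional.of("body changed"));
        response.complete(Optional.empty()); // an empty part leaves its field unset
        OperationDiff result = built.join();
        return doneEarly + "|" + result.request + "|" + result.response;
    }
}
```

`demo()` returns `false|body changed|null`: the composite is not available until both parts have completed, and the empty part never invokes its setter.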