Skip to content

Implement Occurs Check #7811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Implement Occurs Check #7811

wants to merge 3 commits into from

Conversation

jaredramirez
Copy link
Collaborator

@jaredramirez jaredramirez commented May 29, 2025

Overview

This PR implements the occurs check for type variables. This means, for a given type variable:

  • Resolve it to its type (ie follow redirects until we get a concrete type)
  • Recursively unwrap all subtype variables for the type (ie all tags in tag unions, record fields in records, etc)
  • Check if we run into a variable we've seen already before

This is the 1st step in the larger project of supporting recursive types in the type system, but only for nominal types – as outlined here. This check is not yet used anywhere, that will come in a follow-up PR.

Memory Management

During the occurs check, there are things we need to write down:

  • What variables we've seen so far (pushed & popped for each recursive branch)
  • All the variables we've visited and confirmed are not recursive (to avoid redundant work)
  • In the case of an error, what the chain of recursive types was (for error messages)

To capture this, we use a Scratch struct that allocates some array lists that can be re-used across many occurs check. This is the same pattern that unify uses to hold intermediate data.

Differences from the Rust Compiler

This implementation has not significant difference from the Rust compiler version. Originally, I did things slightly differently, but ultimately decided that it probably made the most sense to match the semantics of the Rust version. That said, if those reviewing like the original approach, we can easily revert.

Original "Differences from the Rust Compiler"

The Rust implementation of occurs takes a different approach to avoiding redundant work

  • It uses each variable’s Mark field to track visited state during the check
  • After the traversal, it resets all visited marks

This requires:

  • Mutably updating Mark values on each visited descriptor
  • Tracking visited variables separately in order to reset them

The Zig version does not use Mark at all. Instead:

  • It tracks visited variables explicitly in an array
  • Redundancy is avoided by checking this array during traversal

This results in the same number of allocations, but with two key differences:

  1. No need to mutate or reset Mark fields on descriptors
  2. Possibly slightly worse per-step performance due to linear O(n) check of the already visited array
    • Note that both the Zig and Rust versions do a linear scan of seen variables, though the total number of seen vars for any given recursive branch would be less than the total visited nodes.

That said, I'm unsure what the average number of variables in a type is, but this approach assumes it would be relatively small. (We could also use a Map to check for seen/visited vars, which may perform better than an array)

@jaredramirez jaredramirez self-assigned this May 29, 2025
@jaredramirez jaredramirez marked this pull request as ready for review May 29, 2025 23:53
@jaredramirez jaredramirez force-pushed the push-ovpsosyowlzp branch 9 times, most recently from 2d6d773 to c03d11e Compare June 1, 2025 21:04
@jaredramirez jaredramirez marked this pull request as draft June 1, 2025 22:59
This change switches back from the "new" way to keep track of visited
node in the occurs check to set Marks, like the rust compiler does.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant