Add Garbage collection. #2

Merged
merged 9 commits into from
Mar 7, 2024

Conversation

viega
Collaborator

@viega viega commented Mar 7, 2024

Currently doesn't handle cross-thread references, but there's some groundwork done toward that.

John Viega and others added 9 commits March 3, 2024 01:18
…done, and everything except I/O is using it. However, collection is not yet done, there is no interface for cross-thread references, and the con4m 'object' abstraction is not yet being used for the allocations.
collection algorithm. I have not done cross-thread references yet, and
may defer for a while, since con4m only needs to be single threaded
itself for the moment, and I'd like to get to other things like a
repl.

I need to implement a wrapper for object allocation, but will do that
when I'm more coherent. Also should redo existing data structure API
at some point.
Ported the C keyword argument implementation I wrote in 2001 for the project, to be able to handle more natural internal APIs.
…ollector (or stick with system malloc if not using con4m). Then I added constructors for the major con4m data types: hashes, lists, FIFO queues, stacks, rings, and a 'log ring' (one alloc instead of a ring of object pointers). The constructors can take keywords using C keyword parameter stuff I'd written in the past. Also changed the tracing to be a bit more structured, but still readable (I learned that CAS on arm needs to be 32-byte aligned for some unknown reason??). But all in all, I think the first pass on garbage collection is done (until we're ready to extend across threads).
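The commit above mentions constructors that accept keyword arguments. The ported 2001 implementation is not shown in this conversation, so the following is only a minimal sketch of a different, well-known C99 technique (a variadic macro that expands into a compound literal with designated initializers). Every name in it (`list_opts_t`, `list_t`, `list_new_impl`, `list_new`, the default capacity of 16) is hypothetical and not part of con4m, and it uses plain `malloc`/`calloc` rather than the collector's allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical option bundle; every name here is illustrative. */
typedef struct {
    int    reserved_;  /* placeholder so the macro always has one initializer */
    size_t capacity;   /* number of slots; 0 selects the default */
    bool   zero_fill;  /* request a zeroed backing store */
} list_opts_t;

typedef struct {
    size_t capacity;
    size_t length;
    void **items;
} list_t;

/* Builds a list from an option struct; returns NULL on allocation failure. */
static list_t *
list_new_impl(list_opts_t opts)
{
    size_t  cap  = opts.capacity ? opts.capacity : 16;
    list_t *list = malloc(sizeof(*list));

    if (list == NULL) {
        return NULL;
    }
    list->items = opts.zero_fill ? calloc(cap, sizeof(void *))
                                 : malloc(cap * sizeof(void *));
    if (list->items == NULL) {
        free(list);
        return NULL;
    }
    list->capacity = cap;
    list->length   = 0;
    return list;
}

/* Callers name only the options they care about, in any order;
 * unnamed fields are zero-initialized and fall back to defaults. */
#define list_new(...) \
    list_new_impl((list_opts_t){.reserved_ = 0, __VA_ARGS__})
```

With this sketch, `list_new()` yields a default-sized list, while `list_new(.zero_fill = true, .capacity = 4)` overrides both options. The trade-off is that keywords are checked at compile time but limited to fields declared in the option struct.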
@viega viega merged commit b8bcecc into main Mar 7, 2024
@viega viega deleted the jtv/gc branch March 7, 2024 14:41
This was referenced Jun 16, 2024
@viega viega mentioned this pull request Jul 2, 2024
viega added a commit that referenced this pull request Jul 5, 2024
1. The debug target does NOT turn on ASAN / UBSAN by default (the noise was slowing down debugging a lot). This *does* have our own internal memory checking enabled though.
2. There's a new 'cicd' target that additionally enables ASAN / UBSAN. It's accessible via `dev testbuild`

CI/CD config is set up to use option 2 above (the 'cicd' target).
@viega viega mentioned this pull request Jul 5, 2024
viega added a commit that referenced this pull request Jul 6, 2024
Changed the meson build targets so that:

1. The debug target does NOT turn on ASAN / UBSAN by default (the noise was slowing down debugging a lot). This *does* have our own internal memory checking enabled though.
2. There's a new 'cicd' target. It's accessible via `dev testbuild`
3. There's a separate target for enabling ASAN/UBSAN. That's not guaranteed to give good results.

CI/CD config is set up to use option 2 above (the 'cicd' target).
ee7 added a commit that referenced this pull request Aug 12, 2024
Before this commit, compiling c4test with UBSan and then running it
may produce 500+ lines of output like:

    ../src/hatrack/hash/set.c:258:15: runtime error: null pointer passed as argument 1, which is declared to never be null
    /usr/include/stdlib.h:971:30: note: nonnull attribute specified here
        #0 0x5899b1111647 in hatrack_set_items_base libcon4m/src/hatrack/hash/set.c:258:9
        #1 0x5899b1111ea1 in hatrack_set_items_sort libcon4m/src/hatrack/hash/set.c:305:12
        #2 0x5899b0fa6772 in c4m_set_to_xlist libcon4m/src/adts/set.c:158:38
        #3 0x5899b11430a6 in process_branch_exit libcon4m/src/compiler/cfg.c:497:30
        #4 0x5899b113cd67 in cfg_process_node libcon4m/src/compiler/cfg.c:633:9
        #5 0x5899b113d1a0 in cfg_process_node libcon4m/src/compiler/cfg.c:644:9
        #6 0x5899b113c070 in cfg_process_node libcon4m/src/compiler/cfg.c:577:31
        #7 0x5899b113d62a in cfg_process_node libcon4m/src/compiler/cfg.c:656:9
        #8 0x5899b113c070 in cfg_process_node libcon4m/src/compiler/cfg.c:577:31
        #9 0x5899b113b154 in c4m_cfg_analyze libcon4m/src/compiler/cfg.c:744:5
        #10 0x5899b106c882 in c4m_check_pass libcon4m/src/compiler/check_pass.c:3320:13
        #11 0x5899b1016fc5 in c4m_compile_from_entry_point libcon4m/src/compiler/compile.c:467:5
        #12 0x5899b0f07197 in execute_test libcon4m/src/harness/con4m_base/run.c:11:11
        #13 0x5899b0f05674 in run_one_item libcon4m/src/harness/con4m_base/run.c:61:29
        #14 0x5899b0f0535a in c4m_run_expected_value_tests libcon4m/src/harness/con4m_base/run.c:152:18
        #15 0x5899b0f01cdb in main libcon4m/src/harness/con4m_base/test.c:44:5
        #16 0x7fd148d45e07 in __libc_start_call_main /usr/src/debug/glibc/glibc/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
        #17 0x7fd148d45ecb in __libc_start_main /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:360:3
        #18 0x5899b0ecf7b4 in _start (libcon4m/build_dev/c4test+0x14f7b4) (BuildId: eb75f65a7b383149176b9e55cbbfdfd116868794)

    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/hatrack/hash/set.c:258:15

This was because, on some machines, qsort specifies that the first
argument must not be null:

    /* Sort NMEMB elements of BASE, of SIZE bytes each,
       using COMPAR to perform the comparisons.  */
    extern void qsort (void *__base, size_t __nmemb, size_t __size,
              __compar_fn_t __compar) __nonnull ((1, 4));

and at one call site, qsort could be called with a null first argument.
Skip the call in that case.

Closes: #92
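
The fix described in the commit message above amounts to guarding the call site. The following is a minimal sketch of that pattern, not the actual change to `hatrack_set_items_base`; the helper names `cmp_int` and `sort_ints` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: orders ints ascending. */
static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Hypothetical helper (not from libcon4m): sorts `n` ints in place.
 * qsort is never reached with a null base pointer, so a declaration
 * carrying a nonnull attribute on argument 1 is never violated. */
static void
sort_ints(int *items, size_t n)
{
    if (items == NULL || n == 0) {
        return; /* nothing to sort; skip the call entirely */
    }
    qsort(items, n, sizeof(int), cmp_int);
}
```

Checking the element count as well as the pointer is a design choice in this sketch: an empty collection is the usual reason the backing array is null in the first place.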
viega added a commit that referenced this pull request Aug 19, 2024
grammar.

Basically, it works by running all possible parses in parallel (the
Earley algorithm).

That means, if the grammar is ambiguous, multiple valid parse trees
can be produced.

Currently, the test file is running this sample grammar:

 Add    ⟶   Add ['+'|'-'] Mul
 Add    ⟶   Mul
 Mul    ⟶   Mul ['*'|'/'] Paren
 Mul    ⟶   Paren
 Paren  ⟶   '(' Add ')'
 Paren  ⟶   («Digit»)+

Which generates the following parse tree on the input string "1+(2*3-4321)":

  Add
  ├── Add
  │   └── Mul
  │       └── Paren
  │           └── («Digit»)+
  │               └── #1
  │                   └── 1 (tok #0; line #1; col #0)
  ├── + (tok #1; line #1; col #1)
  └── Mul
      └── Paren
          ├── ( (tok #2; line #1; col #2)
          ├── Add
          │   ├── Add
          │   │   └── Mul
          │   │       ├── Mul
          │   │       │   └── Paren
          │   │       │       └── («Digit»)+
          │   │       │           └── #1
          │   │       │               └── 2 (tok #3; line #1; col #3)
          │   │       ├── * (tok #4; line #1; col #4)
          │   │       └── Paren
          │   │           └── («Digit»)+
          │   │               └── #1
          │   │                   └── 3 (tok #5; line #1; col #5)
          │   ├── - (tok #6; line #1; col #6)
          │   └── Mul
          │       └── Paren
          │           └── («Digit»)+
          │               ├── #1
          │               │   └── 4 (tok #7; line #1; col #7)
          │               ├── #2
          │               │   └── 3 (tok #8; line #1; col #8)
          │               ├── #3
          │               │   └── 2 (tok #9; line #1; col #9)
          │               └── #4
          │                   └── 1 (tok #10; line #1; col #10)
          └── ) (tok #11; line #1; col #11)

(See the code in test_parsing() in test.c)
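
To illustrate the "all possible parses in parallel" idea, here is a minimal Earley recognizer sketch for the sample grammar above. It is not the con4m implementation: it only answers whether the input is accepted and builds no parse tree. The single-character rule encoding (`A`, `M`, `P`, `N` for nonterminals, `a`/`m`/`d` for the terminal classes), the expansion of `(«Digit»)+` into the left-recursive rules `N=Nd` and `N=d`, and all function names and size limits are assumptions made for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ITEMS 256 /* assumed cap per state set; overflow drops items */
#define MAX_INPUT 63  /* assumed cap on input length */

/* "LHS=RHS": uppercase = nonterminal, anything else = terminal. */
static const char *RULES[] = {
    "A=AaM", "A=M", "M=MmP", "M=P", "P=(A)", "P=N", "N=Nd", "N=d",
};
enum { NUM_RULES = sizeof(RULES) / sizeof(RULES[0]) };

typedef struct { int rule, dot, origin; } item_t;
typedef struct { item_t items[MAX_ITEMS]; int count; } state_set_t;

/* Terminal classes: 'a' is + or -, 'm' is * or /, 'd' is an ASCII digit;
 * any other terminal matches itself literally. */
static bool
term_matches(char sym, char c)
{
    switch (sym) {
    case 'a': return c == '+' || c == '-';
    case 'm': return c == '*' || c == '/';
    case 'd': return isdigit((unsigned char)c) != 0;
    default:  return sym == c;
    }
}

/* Adds an item to a state set unless an identical item is present. */
static void
add_item(state_set_t *s, int rule, int dot, int origin)
{
    for (int i = 0; i < s->count; i++) {
        item_t *it = &s->items[i];
        if (it->rule == rule && it->dot == dot && it->origin == origin) {
            return;
        }
    }
    if (s->count < MAX_ITEMS) {
        s->items[s->count++] = (item_t){rule, dot, origin};
    }
}

static bool
earley_recognize(const char *input)
{
    static state_set_t sets[MAX_INPUT + 1];
    size_t             n = strlen(input);

    if (n > MAX_INPUT) {
        return false;
    }
    memset(sets, 0, sizeof(sets));
    for (int r = 0; r < NUM_RULES; r++) {
        if (RULES[r][0] == 'A') add_item(&sets[0], r, 0, 0);
    }
    for (size_t i = 0; i <= n; i++) {
        /* The set grows while we walk it; new items get processed too. */
        for (int j = 0; j < sets[i].count; j++) {
            item_t it  = sets[i].items[j];
            char   sym = RULES[it.rule][2 + it.dot];

            if (sym == '\0') { /* complete: advance items waiting on LHS */
                char         lhs = RULES[it.rule][0];
                state_set_t *o   = &sets[it.origin];
                for (int k = 0; k < o->count; k++) {
                    item_t w = o->items[k];
                    if (RULES[w.rule][2 + w.dot] == lhs) {
                        add_item(&sets[i], w.rule, w.dot + 1, w.origin);
                    }
                }
            }
            else if (isupper((unsigned char)sym)) { /* predict */
                for (int r = 0; r < NUM_RULES; r++) {
                    if (RULES[r][0] == sym) add_item(&sets[i], r, 0, (int)i);
                }
            }
            else if (i < n && term_matches(sym, input[i])) { /* scan */
                add_item(&sets[i + 1], it.rule, it.dot + 1, it.origin);
            }
        }
    }
    /* Accept if a start rule is complete and spans the whole input. */
    for (int j = 0; j < sets[n].count; j++) {
        item_t it = sets[n].items[j];
        if (RULES[it.rule][0] == 'A' && it.origin == 0
            && RULES[it.rule][2 + it.dot] == '\0') {
            return true;
        }
    }
    return false;
}
```

Calling `earley_recognize("1+(2*3-4321)")` accepts the input used in the example above. Because this grammar has no empty productions, the sketch skips the extra handling that nullable rules would need during completion.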

The debug output shows the intermediate state table; there's one state
per token. The parser can easily be run from any starting production,
should make good error messages much easier to produce than with other
approaches, and will easily handle partial parses (e.g., per-line).

There are also several built-in character classes:

- Digit (per above)
- Unicode Digit (anything Unicode tags as a digit, beyond the decimal digits)
- IdStart
- IdContinue
- Con4m's internal IdStart / IdContinue (accepts a couple extra chars)
- Upper
- Lower
- Ascii-only Upper
- Ascii-only Lower
- Whitespace

This does handle grouping that you can (programmatically)
apply regex operators to; for instance, you can see the '+' in the
example above. You can also define your own character classes,
like the ['+'|'-'] above.

Currently, the grammar needs to be specified programmatically. I did
all this because I'm going to have 'getopt' specs generate proper
parsers that can handle ambiguity, so once I wrap a getopt module
around this, you will be able to use con4m to generate full
command-line parsers that are more capable than what we could do
before.

Also, at some point I'll use this to build a front-end for itself that
takes EBNF directly, and will then migrate the lexer and parser over
to it, but this is not a priority right now (the short-term goal is
just getopt).

I'm sure there are some bugs lurking somewhere, since I haven't thrown
a wide variety of grammars at it yet.