Add Garbage collection. #2

viega · 2024-03-07T14:38:18Z

Currently doesn't handle cross-thread references, but there's some groundwork done toward that.


collection algorithm. I have not done cross-thread references yet, and may defer for a while, since con4m only needs to be single threaded itself for the moment, and I'd like to get to other things like a repl. I need to implement a wrapper for object allocation, but will do that when I'm more coherent. Also should redo existing data structure API at some point.


…ollector (or stick with system malloc if not using con4m). Then I added constructors for the major con4m data types: hashes, lists, FIFO queues, stacks, rings, and a 'log ring' (one alloc instead of a ring of object pointers). The constructors can take keywords using C keyword parameter stuff I'd written in the past. Also changed the tracing to be a bit more structured, but still readable (I learned that CAS on arm needs to be 32-byte aligned for some unknown reason??). But all in all, I think the first pass on garbage collection is done (until we're ready to extend across threads).
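For readers unfamiliar with keyword-style parameters in C, here is a minimal sketch of one common idiom: an options struct filled in through a variadic macro and designated initializers. This is not the keyword argument implementation ported in this PR (that code is not shown here), and every name below (`ring_opts_t`, `ring_config`, the default capacity of 16) is a hypothetical stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical options for a ring constructor. The leading placeholder
 * field lets the macro below always supply at least one initializer,
 * so calling it with no keywords is still valid C99. */
typedef struct {
    char   placeholder_;
    size_t capacity;  /* 0 means "use the default" */
    bool   zero_fill; /* defaults to false */
} ring_opts_t;

/* The resolved configuration a constructor would act on. */
typedef struct {
    size_t capacity;
    bool   zero_fill;
} ring_config_t;

/* Applies defaults to any option the caller left unset. */
static ring_config_t ring_config_impl(ring_opts_t opts)
{
    ring_config_t cfg;
    cfg.capacity  = opts.capacity != 0 ? opts.capacity : 16;
    cfg.zero_fill = opts.zero_fill;
    return cfg;
}

/* Keyword-style front end: ring_config(.capacity = 64, .zero_fill = true).
 * Fields that are not named are zero-initialized by the compound literal. */
#define ring_config(...) ring_config_impl((ring_opts_t){0, __VA_ARGS__})
```

A call such as `ring_config(.zero_fill = true)` leaves `capacity` at its default, which is the property that makes this style pleasant for constructors with many optional settings.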

1. The debug target does NOT turn on ASAN / UBSAN by default (the noise was slowing down debugging a lot). This *does* have our own internal memory checking enabled though.
2. There's a new 'cicd' target that additionally enables ASAN / UBSAN. It's accessible via `dev testbuild`.

CI/CD config is set up to use target 2 (the 'cicd' target).


Changed the meson build targets so that:

1. The debug target does NOT turn on ASAN / UBSAN by default (the noise was slowing down debugging a lot). This *does* have our own internal memory checking enabled though.
2. There's a new 'cicd' target. It's accessible via `dev testbuild`.
3. There's a separate target for enabling ASAN/UBSAN. That's not guaranteed to give good results.

CI/CD config is set up to use target 2 (the 'cicd' target).


Before this commit, compiling c4test with UBSan and then running it may produce 500+ lines of output like:

```
../src/hatrack/hash/set.c:258:15: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/stdlib.h:971:30: note: nonnull attribute specified here
    #0 0x5899b1111647 in hatrack_set_items_base libcon4m/src/hatrack/hash/set.c:258:9
    #1 0x5899b1111ea1 in hatrack_set_items_sort libcon4m/src/hatrack/hash/set.c:305:12
    #2 0x5899b0fa6772 in c4m_set_to_xlist libcon4m/src/adts/set.c:158:38
    #3 0x5899b11430a6 in process_branch_exit libcon4m/src/compiler/cfg.c:497:30
    #4 0x5899b113cd67 in cfg_process_node libcon4m/src/compiler/cfg.c:633:9
    #5 0x5899b113d1a0 in cfg_process_node libcon4m/src/compiler/cfg.c:644:9
    #6 0x5899b113c070 in cfg_process_node libcon4m/src/compiler/cfg.c:577:31
    #7 0x5899b113d62a in cfg_process_node libcon4m/src/compiler/cfg.c:656:9
    #8 0x5899b113c070 in cfg_process_node libcon4m/src/compiler/cfg.c:577:31
    #9 0x5899b113b154 in c4m_cfg_analyze libcon4m/src/compiler/cfg.c:744:5
    #10 0x5899b106c882 in c4m_check_pass libcon4m/src/compiler/check_pass.c:3320:13
    #11 0x5899b1016fc5 in c4m_compile_from_entry_point libcon4m/src/compiler/compile.c:467:5
    #12 0x5899b0f07197 in execute_test libcon4m/src/harness/con4m_base/run.c:11:11
    #13 0x5899b0f05674 in run_one_item libcon4m/src/harness/con4m_base/run.c:61:29
    #14 0x5899b0f0535a in c4m_run_expected_value_tests libcon4m/src/harness/con4m_base/run.c:152:18
    #15 0x5899b0f01cdb in main libcon4m/src/harness/con4m_base/test.c:44:5
    #16 0x7fd148d45e07 in __libc_start_call_main /usr/src/debug/glibc/glibc/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #17 0x7fd148d45ecb in __libc_start_main /usr/src/debug/glibc/glibc/csu/../csu/libc-start.c:360:3
    #18 0x5899b0ecf7b4 in _start (libcon4m/build_dev/c4test+0x14f7b4) (BuildId: eb75f65a7b383149176b9e55cbbfdfd116868794)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/hatrack/hash/set.c:258:15
```

This was because, on some machines, qsort specifies that the first argument must not be null:

    /* Sort NMEMB elements of BASE, of SIZE bytes each,
       using COMPAR to perform the comparisons.  */
    extern void qsort (void *__base, size_t __nmemb, size_t __size,
                       __compar_fn_t __compar) __nonnull ((1, 4));

and at one call site, qsort could be called with a null first argument. Skip the call in that case.

Closes: #92
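The shape of that fix can be sketched as a small guard around `qsort`. This is an illustration only, not the actual change to `hatrack_set_items_base`; the helper name `sort_if_nonempty` and the `compare_ints` callback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: orders ints ascending. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical wrapper (not from the con4m sources): skips the qsort call
 * when there is nothing to sort, so a null base pointer never reaches a
 * parameter that the libc header declares nonnull. */
static void sort_if_nonempty(void *base, size_t nmemb, size_t size,
                             int (*compare)(const void *, const void *))
{
    /* Argument 4 is declared nonnull as well, and has no sensible default. */
    assert(compare != NULL);

    if (base == NULL || nmemb == 0) {
        return; /* an empty collection is already sorted */
    }
    qsort(base, nmemb, size, compare);
}
```

An empty set that hands back a null item array then passes through `sort_if_nonempty(NULL, 0, ...)` without touching `qsort` at all, which is the case UBSan was flagging.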

grammar. Basically, it works by running all possible parses in parallel (the Earley algorithm). That means, if the grammar is ambiguous, multiple valid parse trees can be produced.

Currently, the test file is running this sample grammar:

    Add   ⟶ Add ['+'|'-'] Mul
    Add   ⟶ Mul
    Mul   ⟶ Mul ['*'|'/'] Paren
    Mul   ⟶ Paren
    Paren ⟶ '(' Add ')'
    Paren ⟶ («Digit»)+

Which generates the following parse tree on the input string "1+(2*3-4321)":

    Add
    ├── Add
    │   └── Mul
    │       └── Paren
    │           └── («Digit»)+
    │               └── #1
    │                   └── 1 (tok #0; line #1; col #0)
    ├── + (tok #1; line #1; col #1)
    └── Mul
        └── Paren
            ├── ( (tok #2; line #1; col #2)
            ├── Add
            │   ├── Add
            │   │   └── Mul
            │   │       ├── Mul
            │   │       │   └── Paren
            │   │       │       └── («Digit»)+
            │   │       │           └── #1
            │   │       │               └── 2 (tok #3; line #1; col #3)
            │   │       ├── * (tok #4; line #1; col #4)
            │   │       └── Paren
            │   │           └── («Digit»)+
            │   │               └── #1
            │   │                   └── 3 (tok #5; line #1; col #5)
            │   ├── - (tok #6; line #1; col #6)
            │   └── Mul
            │       └── Paren
            │           └── («Digit»)+
            │               ├── #1
            │               │   └── 4 (tok #7; line #1; col #7)
            │               ├── #2
            │               │   └── 3 (tok #8; line #1; col #8)
            │               ├── #3
            │               │   └── 2 (tok #9; line #1; col #9)
            │               └── #4
            │                   └── 1 (tok #10; line #1; col #10)
            └── ) (tok #11; line #1; col #11)

(See the code in test_parsing() in test.c)

The debug output shows the intermediate state table; there's one state per token. The parser can easily be run from any starting production, should make it pretty easy to give good error messages compared to anything else, and will easily handle partial parses (e.g., per-line).

There are also several built-in character classes:

- Digit (per above)
- Unicode Digit (anything beyond the decimal digits that Unicode tags as a digit)
- IdStart
- IdContinue
- Con4m's internal IdStart / IdContinue (accepts a couple extra chars)
- Upper
- Lower
- Ascii-only Upper
- Ascii-only Lower
- Whitespace

This does handle grouping that you can (programmatically) apply regex operators to; for instance, you can see the '+' in the example above. You can obviously also do your own character classes, like the ['+'|'-'] above.

Currently, the grammar needs to be specified programmatically. I did all this because I'm going to have 'getopt' specs generate proper parsers that can handle ambiguity, so once I wrap a getopt module around this, you will be able to use con4m to generate full command-line parsers that are more capable than what we could do before.

Also, at some point I'll use this to build a front-end for itself that takes EBNF directly, and will then migrate the lexer and parser over to it, but this is not a priority right now (the short-term goal is just getopt).

I'm sure there are some bugs lurking somewhere, since I haven't thrown a wide variety of grammars at it yet.
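To make the "all possible parses in parallel" idea concrete, here is a minimal textbook Earley recognizer in C for the sample arithmetic grammar. It is a sketch under several assumptions and is not con4m's parser or API: it only answers accept/reject (no parse trees, no error messages), it encodes each symbol as a single character (`a` for ['+'|'-'], `m` for ['*'|'/'], `d` for a decimal digit), it desugars («Digit»)+ into a helper nonterminal `N`, and it caps input at 32 characters. All names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Uppercase letters are nonterminals; everything else is a terminal code. */
typedef struct { char lhs; const char *rhs; } rule_t;

static const rule_t rules[] = {
    {'A', "AaM"}, {'A', "M"},   /* Add   -> Add [+-] Mul | Mul     */
    {'M', "MmP"}, {'M', "P"},   /* Mul   -> Mul [*\/] Paren | Paren */
    {'P', "(A)"}, {'P', "N"},   /* Paren -> '(' Add ')' | Digit+   */
    {'N', "Nd"},  {'N', "d"},   /* Digit+ desugared to N -> N d | d */
};

enum {
    NRULES   = sizeof(rules) / sizeof(rules[0]),
    MAXLEN   = 32,
    MAXITEMS = NRULES * 4 * (MAXLEN + 1) /* rules x dot positions x origins */
};

/* An Earley item: a rule, how far we are through it, and where it began. */
typedef struct { int rule, dot, origin; } item_t;

static item_t sets[MAXLEN + 1][MAXITEMS];
static int    counts[MAXLEN + 1];

/* Adds an item to a state set unless it is already present. */
static void add_item(int set, int rule, int dot, int origin)
{
    for (int k = 0; k < counts[set]; k++) {
        item_t *it = &sets[set][k];
        if (it->rule == rule && it->dot == dot && it->origin == origin)
            return;
    }
    assert(counts[set] < MAXITEMS);
    sets[set][counts[set]++] = (item_t){rule, dot, origin};
}

/* Tests one input character against a terminal code. */
static bool term_matches(char sym, char c)
{
    switch (sym) {
    case 'a': return c == '+' || c == '-';
    case 'm': return c == '*' || c == '/';
    case 'd': return isdigit((unsigned char)c) != 0;
    default:  return sym == c; /* literal '(' or ')' */
    }
}

/* Returns true if `input` derives from Add. Inputs over MAXLEN are rejected. */
static bool earley_recognize(const char *input)
{
    size_t len = strlen(input);
    if (len > MAXLEN)
        return false;
    int n = (int)len;

    memset(counts, 0, sizeof(counts));
    for (int r = 0; r < NRULES; r++)
        if (rules[r].lhs == 'A')
            add_item(0, r, 0, 0);

    for (int i = 0; i <= n; i++) {
        /* counts[i] may grow while we iterate; that is the worklist. */
        for (int k = 0; k < counts[i]; k++) {
            item_t it  = sets[i][k];
            char   sym = rules[it.rule].rhs[it.dot];
            if (sym == '\0') {
                /* Complete: advance every item that was waiting on this lhs. */
                for (int j = 0; j < counts[it.origin]; j++) {
                    item_t p = sets[it.origin][j];
                    if (rules[p.rule].rhs[p.dot] == rules[it.rule].lhs)
                        add_item(i, p.rule, p.dot + 1, p.origin);
                }
            } else if (isupper((unsigned char)sym)) {
                /* Predict: start every rule for the expected nonterminal. */
                for (int r = 0; r < NRULES; r++)
                    if (rules[r].lhs == sym)
                        add_item(i, r, 0, i);
            } else if (i < n && term_matches(sym, input[i])) {
                /* Scan: consume one matching input character. */
                add_item(i + 1, it.rule, it.dot + 1, it.origin);
            }
        }
    }

    for (int k = 0; k < counts[n]; k++) {
        item_t it = sets[n][k];
        if (rules[it.rule].lhs == 'A' && it.origin == 0
            && rules[it.rule].rhs[it.dot] == '\0')
            return true;
    }
    return false;
}
```

The grammar here has no empty productions, which is why the simple completion step is sufficient; a general-purpose implementation would also need to handle nullable rules and record back-pointers in order to build trees like the one shown above.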

John Viega and others added 9 commits March 3, 2024 01:18

Some initial shape for garbage collection. The initial allocation is done, and everything except I/O is using it. However, collection is not yet done, there is no interface for cross-thread references, and the con4m 'object' abstraction is not yet being used for the allocations.

fb1bc62

Change test to use terminal width (adding the beginnings of a terminal module) and fix a small bug in wrapping

ff22d42

Rename heap to arena since it could be used outside the GC context and is more clear I guess

3997ee0

All header inclusion for src/con4m now happens via including con4m.h; nothing else should get directly included.

ce8bd6f

Improved the basic build wrapper; renamed to 'dev' and gave it a 'tests' target.

aeb992d

Keyword arguments

49c1667

Ported the C keyword argument implementation I wrote in 2001 for the project, to be able to handle more natural internal APIs.

Added initial constructor support, implementing for u8 and u32 strings.

6bd2458

viega merged commit b8bcecc into main Mar 7, 2024

viega deleted the jtv/gc branch March 7, 2024 14:41

This was referenced Jun 16, 2024

GC changes #32

Closed

GC Tweaks #33

Merged

viega mentioned this pull request Jul 2, 2024

Reworking of lists, etc #60

Merged

viega mentioned this pull request Jul 5, 2024

Change meson target configs #81

Merged
