579 composing cancelled by caller pr #583

dan-da · 2025-05-07T23:48:00Z

This is another PR where one thing sort of led to another as I got deeper into it, so it is much larger than originally planned/intended. I've squashed the original work-in-progress commits into a set of 6 commits for the primary areas of change. Reviewers might want to check ce1961b which is small and should close #579. Be aware however that there are problems with the tests that are fixed in later commits.

fix: continue composing if job-cancelled error occurs

closes #579

Avoids calling expect() on a function that can fail when a job is
cancelled.

Changes:

prove_concensus_program() returns JobHandleError::JobCancelled
instead of anyhow::bail!("job cancelled by caller")
create_block_transaction_from() returns an error when
add mine_loop test job_cancel_msg_cancels_composing()
add mine_loop test msg_from_main_does_not_crash_composer()

test: add mine_loop tests for job cancellation

adds two tests that cancel a job while composing

Changes:

mine() fn accepts a flag to indicate if it should perform initial 60
second sleep or not. This enables it to be called in unit tests.
add mine_loop test job_cancel_msg_cancels_composing()
add mine_loop test msg_from_main_does_not_crash_composer()

refactor: proving cancellation and error handling

This enables proving job-cancellation to work in test environment and
cleans up the proving result error path.

Changes:

ProverJob handles job cancellation in test environment
prove() and prove_concensus_program() now return a CreateProofError
remove derive(Serialize, Deserialize) from several error types
because not all errors can be serialized which creates a tension
between serialization and having a complete error source chain.
change RpcError to use String as error source for most variants.
RpcError is really the only error-type that needs to be serialized
and callers do not need the error source chain.
fix CreateProofError downcast when checking composer_task error result
add tests::shared::wait_until() fn for tests to wait on a condition
remove CreateProofError::Failed catch-all for anyhow and replace with
specific error variants
add JobQueue::num_jobs() and num_queued_jobs() methods
simplify composer_task error handling in mine_loop

test: fix new mine-loop tests, for parallel usage

The two new mine-loop tests for cancellation were depending on starting
with 0 jobs in the job-queue and no outside job-queue additions during
the test. This is no longer a valid assumption when tests are run in
parallel.

The fix involves using a sleep instead of waiting for a specific number
of jobs. It is less "clean" but it works for both sequential and
parallel test mode.

refactor: job_queue::JobHandle improvements

Changes:

auto cancel job when JobHandle is dropped.
fixes test case that continues composing job after test completes
impl Drop for JobHandle
JobHandle now impls Future, so it can be awaited directly. cancellation in
prove_concensus_program() should cause the JobHandle to be dropped. not
tested yet
refine JobHandle api. remove un-needed methods
job-cancellation works cleanly in prove_concensus_program() select!{} via
tokio::pin
add tests cancel_async_job_in_select, cancel_sync_job_in_select and comment
test worker
improve job-queue module doc

refactor: split queue.rs and move triton_vm

These are some changes to prepare for moving job_queue into its own
crate, once we have a neptune-core workspace.

src/job_queue/triton_vm is moved to src/triton_vm_job_queue. It has
neptune-core specific logic, and thus belongs in neptune-core crate.

Changes:

break job-queue into separate files for better organization
move src/job_queue/triton_vm to src/triton_vm_job_queue.
remove the LogWhenDropped newtype
refactor queue.rs and comment thoroughly.
add doc-comment with examples for JobHandle::cancel()

closes Neptune-Crypto#579 Avoids calling expect() on a function that can fail when a job is cancelled. Changes: + prove_concensus_program() returns JobHandleError::JobCancelled instead of anyhow::bail!("job cancelled by caller") + create_block_transaction_from() returns an error when Transaction::merge_with() fails instead of calling expect() + check for JobCancelled error if composer_task fails.

adds two tests that cancel a job while composing Changes: + mine() fn accepts a flag to indicate if it should perform initial 60 second sleep or not. This enables it to be called in unit tests. + add mine_loop test job_cancel_msg_cancels_composing() + add mine_loop test msg_from_main_does_not_crash_composer()

This enables proving job-cancellation to work in test environment and cleans up the proving result error path. Changes: + ProverJob handles job cancellation in test environment + prove() and prove_concensus_program() now return a CreateProofError + remove derive(Serialize, Deserialize) from several error types because not all errors can be serialized which creates a tension between serialization and having a complete error source chain. + change RpcError to use String as error source for most variants. RpcError is really the only error-type that needs to be serialized and callers do not need the error source chain. + fix CreateProofError downcast when checking composer_task error result + add tests::shared::wait_until() fn for tests to wait on a condition + remove CreateProofError::Failed catch-all for anyhow and replace with specific error variants + add JobQueue::num_jobs() and num_queued_jobs() methods + simplify composer_task error handling in mine_loop

The two new mine-loop tests for cancellation were depending on starting with 0 jobs in the job-queue and no outside job-queue additions during the test. This is no longer a valid assumption when tests are run in parallel. The fix involves using a sleep instead of waiting for a specific number of jobs. It is less "clean" but it works for both sequential and parallel test mode.

Changes: + auto cancel job when JobHandle is dropped. fixes test case that continues composing job after test completes + impl Drop for JobHandle + JobHandle now impls Future, so it can be awaited directly. cancellation in prove_concensus_program() should cause the JobHandle to be dropped. not tested yet + refine JobHandle api. remove un-needed methods + job-cancellation works cleanly in prove_concensus_program() select!{} via tokio::pin + add tests cancel_async_job_in_select, cancel_sync_job_in_select and comment test worker + improve job-queue module doc

These are some changes to prepare for moving job_queue into its own crate, once we have a neptune-core workspace. src/job_queue/triton_vm is moved to src/triton_vm_job_queue. It has neptune-core specific logic, and thus belongs in neptune-core crate. Changes: + break job-queue into separate files for better organization + move src/job_queue/triton_vm to src/triton_vm_job_queue. + remove the LogWhenDropped newtype + refactor queue.rs and comment thoroughly. + add doc-comment with examples for JobHandle::cancel()

adds a test for stopping the job-queue and hopefully will make coveralls happier.

The stop_queue() test revealed a logic/flow problem when the process_job task receives a Stop message. The tokio::select!{} was in the wrong place. It was selecting between the stop message and the job_added message. Instead it needs to select between the running job and the stop message. This commit makes that change and now the stop_queue() test passes.

jan-ferdinand

I don't think I understand every implication of all the changes, but this looks like an improvement (or more accurately, a bunch of improvements) to me. I've left some nits & minor suggestions inline.

src/job_queue/job_completion.rs

src/job_queue/job_handle.rs

src/mine_loop.rs

src/rpc_server.rs

src/tests/shared.rs

jan-ferdinand · 2025-05-08T13:06:44Z

It's weird that overall coverage decreased by a substantial amount. From what I can see, there are only few newly uncovered lines overall, and a decent number of those weren't touched in this PR. As far as I can tell, the tools we use for coverage reporting have not changed either.

dan-da · 2025-05-08T17:20:20Z

It's weird that overall coverage decreased by a substantial amount. From what I can see, there are only few newly uncovered lines overall, and a decent number of those weren't touched in this PR. As far as I can tell, the tools we use for coverage reporting have not changed either.

yes I considered filing an issue, or anyway was planning to mention it to you.

I added two new tests to mine_loop.rs that increased coverage there by 15% or so. And also new tests in queue.rs that increased coverage.

Many of the newly "uncovered" lines don't really make sense to me, such as tracing::debug() calls that are surrounded by otherwise covered code.

addresses review comments for Neptune-Crypto#583. Changes: + derive strum::Display for JobCompletion + improve JobCompletion::Panicked variant comment + reflow JobHandle::cancel doc-comment example + emit tracing in JobHandle::cancel() + improve JobHandle::job_id() doc-comment + calc self.cancel() In JobHandle drop impl instead of cancel_tx. + improve doc-comments in JobId + improve module comment in job_queue/mod.rs + use doc-comments instead of "//" for private items in queue.rs + remove unnecessary let statement when spawining process_job task. + move stop-signal processing into its own fn so process_jobs select is more readable. + change order of assert_eq params

Addresses review comments for Neptune-Crypto#583: + use #[cfg(not(test))] instead of bool param to disable initial sleep in mining loop for unit tests + improve comment in mining loop when checking composer_task error + rename RpcError::Error to RpcError::WalletError + remove redundant copy_dir_recursive() in src/tests/shared.rs

dan-da · 2025-05-09T01:23:53Z

addressed all review comments. thx @jan-ferdinand. CI is green except for coveralls which appears to be acting a bit funny. merging.

addresses review comments for #583. Changes: + derive strum::Display for JobCompletion + improve JobCompletion::Panicked variant comment + reflow JobHandle::cancel doc-comment example + emit tracing in JobHandle::cancel() + improve JobHandle::job_id() doc-comment + calc self.cancel() In JobHandle drop impl instead of cancel_tx. + improve doc-comments in JobId + improve module comment in job_queue/mod.rs + use doc-comments instead of "//" for private items in queue.rs + remove unnecessary let statement when spawining process_job task. + move stop-signal processing into its own fn so process_jobs select is more readable. + change order of assert_eq params

dan-da added 9 commits May 7, 2025 16:21

chore: clippy lint in copy_dir_recursive

1cf6d15

test: add test job_queue::tests::stop_queue()

d1ca286

adds a test for stopping the job-queue and hopefully will make coveralls happier.

dan-da requested review from aszepieniec and jan-ferdinand May 8, 2025 07:05

jan-ferdinand approved these changes May 8, 2025

View reviewed changes

dan-da added 2 commits May 8, 2025 16:05

dan-da merged commit 13d8570 into Neptune-Crypto:master May 9, 2025
12 of 13 checks passed

This was referenced May 11, 2025

panic while composing. #489

Closed

peer_loop.rs:1745 #589

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

579 composing cancelled by caller pr #583

579 composing cancelled by caller pr #583

Uh oh!

dan-da commented May 7, 2025 •

edited

Loading

Uh oh!

jan-ferdinand left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jan-ferdinand commented May 8, 2025

Uh oh!

dan-da commented May 8, 2025

Uh oh!

dan-da commented May 9, 2025

Uh oh!

Uh oh!

Uh oh!

579 composing cancelled by caller pr #583

579 composing cancelled by caller pr #583

Uh oh!

Conversation

dan-da commented May 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jan-ferdinand left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jan-ferdinand commented May 8, 2025

Uh oh!

dan-da commented May 8, 2025

Uh oh!

dan-da commented May 9, 2025

Uh oh!

Uh oh!

Uh oh!

dan-da commented May 7, 2025 •

edited

Loading