-
Notifications
You must be signed in to change notification settings - Fork 17
579 composing cancelled by caller pr #583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
579 composing cancelled by caller pr #583
Conversation
closes Neptune-Crypto#579 Avoids calling expect() on a function that can fail when a job is cancelled. Changes: + prove_concensus_program() returns JobHandleError::JobCancelled instead of anyhow::bail!("job cancelled by caller") + create_block_transaction_from() returns an error when Transaction::merge_with() fails instead of calling expect() + check for JobCancelled error if composer_task fails.
adds two tests that cancel a job while composing Changes: + mine() fn accepts a flag to indicate if it should perform initial 60 second sleep or not. This enables it to be called in unit tests. + add mine_loop test job_cancel_msg_cancels_composing() + add mine_loop test msg_from_main_does_not_crash_composer()
This enables proving job-cancellation to work in test environment and cleans up the proving result error path. Changes: + ProverJob handles job cancellation in test environment + prove() and prove_concensus_program() now return a CreateProofError + remove derive(Serialize, Deserialize) from several error types because not all errors can be serialized which creates a tension between serialization and having a complete error source chain. + change RpcError to use String as error source for most variants. RpcError is really the only error-type that needs to be serialized and callers do not need the error source chain. + fix CreateProofError downcast when checking composer_task error result + add tests::shared::wait_until() fn for tests to wait on a condition + remove CreateProofError::Failed catch-all for anyhow and replace with specific error variants + add JobQueue::num_jobs() and num_queued_jobs() methods + simplify composer_task error handling in mine_loop
The two new mine-loop tests for cancellation were depending on starting with 0 jobs in the job-queue and no outside job-queue additions during the test. This is no longer a valid assumption when tests are run in parallel. The fix involves using a sleep instead of waiting for a specific number of jobs. It is less "clean" but it works for both sequential and parallel test mode.
Changes: + auto cancel job when JobHandle is dropped. fixes test case that continues composing job after test completes + impl Drop for JobHandle + JobHandle now impls Future, so it can be awaited directly. cancellation in prove_concensus_program() should cause the JobHandle to be dropped. not tested yet + refine JobHandle api. remove un-needed methods + job-cancellation works cleanly in prove_concensus_program() select!{} via tokio::pin + add tests cancel_async_job_in_select, cancel_sync_job_in_select and comment test worker + improve job-queue module doc
These are some changes to prepare for moving job_queue into its own crate, once we have a neptune-core workspace. src/job_queue/triton_vm is moved to src/triton_vm_job_queue. It has neptune-core specific logic, and thus belongs in neptune-core crate. Changes: + break job-queue into separate files for better organization + move src/job_queue/triton_vm to src/triton_vm_job_queue. + remove the LogWhenDropped newtype + refactor queue.rs and comment thoroughly. + add doc-comment with examples for JobHandle::cancel()
adds a test for stopping the job-queue and hopefully will make coveralls happier.
The stop_queue() test revealed a logic/flow problem when the process_job task receives a Stop message. The tokio::select!{} was in the wrong place. It was selecting between the stop message and the job_added message. Instead it needs to select between the running job and the stop message. This commit makes that change and now the stop_queue() test passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand every implication of all the changes, but this looks like an improvement (or more accurately, a bunch of improvements) to me. I've left some nits & minor suggestions inline.
It's weird that overall coverage decreased by a substantial amount. From what I can see, there are only few newly uncovered lines overall, and a decent number of those weren't touched in this PR. As far as I can tell, the tools we use for coverage reporting have not changed either. |
yes I considered filing an issue, or anyway was planning to mention it to you. I added two new tests to mine_loop.rs that increased coverage there by 15% or so. And also new tests in queue.rs that increased coverage. Many of the newly "uncovered" lines don't really make sense to me, such as |
addresses review comments for Neptune-Crypto#583. Changes: + derive strum::Display for JobCompletion + improve JobCompletion::Panicked variant comment + reflow JobHandle::cancel doc-comment example + emit tracing in JobHandle::cancel() + improve JobHandle::job_id() doc-comment + calc self.cancel() In JobHandle drop impl instead of cancel_tx. + improve doc-comments in JobId + improve module comment in job_queue/mod.rs + use doc-comments instead of "//" for private items in queue.rs + remove unnecessary let statement when spawining process_job task. + move stop-signal processing into its own fn so process_jobs select is more readable. + change order of assert_eq params
Addresses review comments for Neptune-Crypto#583: + use #[cfg(not(test))] instead of bool param to disable initial sleep in mining loop for unit tests + improve comment in mining loop when checking composer_task error + rename RpcError::Error to RpcError::WalletError + remove redundant copy_dir_recursive() in src/tests/shared.rs
addressed all review comments. thx @jan-ferdinand. CI is green except for coveralls which appears to be acting a bit funny. merging. |
addresses review comments for #583. Changes: + derive strum::Display for JobCompletion + improve JobCompletion::Panicked variant comment + reflow JobHandle::cancel doc-comment example + emit tracing in JobHandle::cancel() + improve JobHandle::job_id() doc-comment + calc self.cancel() In JobHandle drop impl instead of cancel_tx. + improve doc-comments in JobId + improve module comment in job_queue/mod.rs + use doc-comments instead of "//" for private items in queue.rs + remove unnecessary let statement when spawining process_job task. + move stop-signal processing into its own fn so process_jobs select is more readable. + change order of assert_eq params
This is another PR where one thing sort of led to another as I got deeper into it, so it is much larger than originally planned/intended. I've squashed the original work-in-progress commits into a set of 6 commits for the primary areas of change. Reviewers might want to check ce1961b which is small and should close #579. Be aware however that there are problems with the tests that are fixed in later commits.
fix: continue composing if job-cancelled error occurs
closes #579
Avoids calling expect() on a function that can fail when a job is
cancelled.
Changes:
instead of anyhow::bail!("job cancelled by caller")
test: add mine_loop tests for job cancellation
adds two tests that cancel a job while composing
Changes:
second sleep or not. This enables it to be called in unit tests.
refactor: proving cancellation and error handling
This enables proving job-cancellation to work in test environment and
cleans up the proving result error path.
Changes:
because not all errors can be serialized which creates a tension
between serialization and having a complete error source chain.
RpcError is really the only error-type that needs to be serialized
and callers do not need the error source chain.
specific error variants
test: fix new mine-loop tests, for parallel usage
The two new mine-loop tests for cancellation were depending on starting
with 0 jobs in the job-queue and no outside job-queue additions during
the test. This is no longer a valid assumption when tests are run in
parallel.
The fix involves using a sleep instead of waiting for a specific number
of jobs. It is less "clean" but it works for both sequential and
parallel test mode.
refactor: job_queue::JobHandle improvements
Changes:
fixes test case that continues composing job after test completes
prove_concensus_program() should cause the JobHandle to be dropped. not
tested yet
tokio::pin
test worker
refactor: split queue.rs and move triton_vm
These are some changes to prepare for moving job_queue into its own
crate, once we have a neptune-core workspace.
src/job_queue/triton_vm is moved to src/triton_vm_job_queue. It has
neptune-core specific logic, and thus belongs in neptune-core crate.
Changes: