Startup Retry & Configurable Startup Time-out & Error handling #166

Merged
merged 43 commits into from
Apr 29, 2025

Conversation

zhiyuanliang-ms
Contributor

@zhiyuanliang-ms zhiyuanliang-ms commented Feb 17, 2025

Why this PR?

  • Support startup retry and timeout.
  • Handle errors in a more managed way.
  • Add boot loop protection (ref).

The retry/backoff behavior is the same as in the .NET provider.

Retriable errors in the .NET provider: ref

reference: #488

Usage

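import { load } from "@azure/app-configuration-provider";

// Fail startup if the initial load has not succeeded within 30 seconds.
const config = await load(connectionString, {
  startupOptions: {
    timeoutInMs: 30_000
  }
});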

By default, the startup timeout is 100 seconds.

Pseudocode for the whole startup retry + failover logic:

function startupWithRetry():
    while not timeout:
        try:
            result = loadWithFailover()
            return result  // Load successful; exit the retry loop
        except error:
            if isRetriable(error):
                backoffDuration = calculateBackoffDuration()
                sleep(backoffDuration)  // Wait for the backoff duration before retrying
            else:
                throw error  // Non-retriable error; rethrow immediately
    // Startup timeout reached without a successful load
    throw new Error("Load operation timed out.")

function loadWithFailover():
    // Retrieve the fallback client list via DNS SRV records
    fallbackClients = getFallbackClientsFromDNS()
    
    // Attempt to perform the load operation using each fallback client sequentially
    for client in fallbackClients:
        try:
            result = client.load()
            return result  // Load successful; return result immediately
        except error:
            if not isFailoverable(error):
                // If the error is not failoverable, immediately rethrow the error
                throw error
            // If the error is failoverable, continue with the next client
    
    // If all fallback clients have failed, throw a retriable error to trigger another retry
    throw new Error("All fallback clients failed")
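For readers who prefer something runnable, the pseudocode above could be sketched in TypeScript roughly as follows. This is only an illustration, not the code in this PR: `Loader`, `transientError`, `isTransient`, and the `backoffMs` callback are hypothetical names, and the sketch assumes a single "transient" marker covers both the retriable and the failoverable checks, which the real implementation may well distinguish.

```typescript
// Hypothetical sketch of the retry + failover flow; names are illustrative only.
type Loader<T> = () => Promise<T>;

// Assumption: one "transient" flag stands in for both isRetriable and isFailoverable.
function transientError(message: string): Error {
  const error = new Error(message) as Error & { transient?: boolean };
  error.transient = true;
  return error;
}

const isTransient = (error: unknown): boolean =>
  typeof error === "object" && error !== null && "transient" in error;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Try each client in order; move on to the next one only on a transient error.
async function loadWithFailover<T>(clients: Loader<T>[]): Promise<T> {
  for (const load of clients) {
    try {
      return await load();
    } catch (error) {
      if (!isTransient(error)) {
        throw error; // not failoverable: surface immediately
      }
    }
  }
  // Every client failed with a transient error; signal the caller to retry.
  throw transientError("All fallback clients failed");
}

// Retry until success, a non-retriable error, or the startup timeout.
async function startupWithRetry<T>(
  clients: Loader<T>[],
  timeoutInMs: number,
  backoffMs: (attempt: number) => number,
): Promise<T> {
  const deadline = Date.now() + timeoutInMs;
  let attempt = 0;
  while (Date.now() < deadline) {
    try {
      return await loadWithFailover(clients);
    } catch (error) {
      if (!isTransient(error)) {
        throw error; // non-retriable: rethrow immediately
      }
      attempt += 1;
      // Wait for the backoff duration, but never sleep past the deadline.
      const remaining = deadline - Date.now();
      await sleep(Math.max(0, Math.min(backoffMs(attempt), remaining)));
    }
  }
  throw new Error("Load operation timed out.");
}
```

Passing the client list and the backoff function as parameters is a choice made for this sketch so that it can be exercised without DNS lookups or real waiting times.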

@Copilot Copilot AI left a comment

Copilot reviewed 6 out of 8 changed files in this pull request and generated no comments.

Files not reviewed (2)
  • src/ConfigurationClientWrapper.ts: Evaluated as low risk
  • src/load.ts: Evaluated as low risk

Comments suppressed due to low confidence (1)

src/failover.ts:35

  • The comment should be 'random value between [-1, 1) * JITTER_RATIO * calculatedBackoffDuration' for consistency.
// jitter: random value between [-1, 1) * jitterRatio * calculatedBackoffMs
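To make the jitter formula in that comment concrete, here is a minimal sketch of a backoff calculation with symmetric jitter. Only the jitter term follows the quoted comment; the exponential growth, the constant values, and the function name `calculateBackoffMs` are assumptions made for illustration and may not match `src/failover.ts`.

```typescript
// Illustrative values only; the provider's actual constants are not stated in this thread.
const MIN_BACKOFF_MS = 1_000;
const MAX_BACKOFF_MS = 30_000;
const JITTER_RATIO = 0.25;

// Assumption: the base backoff doubles per attempt and is capped at MAX_BACKOFF_MS.
// `random` is injectable so the function can be tested deterministically.
function calculateBackoffMs(
  attempt: number,
  random: () => number = Math.random,
): number {
  const exponent = Math.min(Math.max(attempt - 1, 0), 30); // bound the exponent
  const calculatedBackoffMs = Math.min(
    MAX_BACKOFF_MS,
    MIN_BACKOFF_MS * 2 ** exponent,
  );
  // jitter: random value between [-1, 1) * jitterRatio * calculatedBackoffMs
  const jitter = (random() * 2 - 1) * JITTER_RATIO * calculatedBackoffMs;
  return calculatedBackoffMs + jitter;
}
```

Because `random()` returns a value in [0, 1), `random() * 2 - 1` lands in [-1, 1), so the result stays within ±25% of the base backoff under these assumed constants.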

@zhenlan

zhenlan commented Feb 17, 2025

Is there a reason the retry can be disabled?

@zhiyuanliang-ms
Contributor Author

Is there a reason the retry can be disabled?

If people want to throw immediately when the initial load fails, they can disable it. Setting a very short timeout is not feasible for this case. retryEnabled is true by default, so there is no behavior difference from other providers.

@zhenlan

zhenlan commented Feb 18, 2025

If people want to...

Anyone asked for it? What's the scenario for it?

@zhiyuanliang-ms
Contributor Author

zhiyuanliang-ms commented Feb 18, 2025

If people want to...

Anyone asked for it? What's the scenario for it?

@zhenlan No one has asked for it. I personally want to disable the retry so that I can know immediately whether something is wrong.
I have experienced this: I used a wrong connection string (correct format but wrong content), and the load call lasted 100 seconds before failing. The same problem exists in the .NET provider.

Anyway, I am OK with removing the retryEnabled option. But I am curious: are you worried that, even if the default behavior is retry-enabled, users will still choose to disable it, and we don't want them to disable it in most cases? Otherwise, I don't see why we cannot keep it.

@zhenlan

zhenlan commented Feb 18, 2025

No one asks for it. I personally want to disable the retry so that I can know whether there is something wrong immediately.

If the connection string is malformed, I hope it will throw so we can fail immediately. Otherwise, you can't really tell what's wrong. Set a shorter timeout if you don't want to wait for too long.

are you worrying about that even if the default behavior is retry-enabled , user will still choose to disable it and we don't want them to disable it in most cases? Otherwise, I don't see why we cannot keep it.

Configuration is in the critical code path. Most applications can't start without configuration loaded properly. Transient errors can happen in any cloud solution. We rely on client retry (and failover, etc.) to deliver high availability to customers. Please don't design a "feature" that lets customers shoot themselves in the foot.

@Copilot Copilot AI left a comment

Copilot reviewed 6 out of 16 changed files in this pull request and generated 2 comments.

Files not reviewed (10)
  • src/ConfigurationClientWrapper.ts: Evaluated as low risk
  • src/load.ts: Evaluated as low risk
  • test/failover.test.ts: Evaluated as low risk
  • src/keyvault/AzureKeyVaultKeyValueAdapter.ts: Evaluated as low risk
  • src/AzureAppConfigurationOptions.ts: Evaluated as low risk
  • src/refresh/RefreshTimer.ts: Evaluated as low risk
  • test/keyvault.test.ts: Evaluated as low risk
  • src/ConfigurationClientManager.ts: Evaluated as low risk
  • test/requestTracing.test.ts: Evaluated as low risk
  • test/clientOptions.test.ts: Evaluated as low risk
Comments suppressed due to low confidence (1)

src/AzureAppConfigurationImpl.ts:249

  • [nitpick] The error message 'Load operation timed out.' is unclear. Consider making it more descriptive, such as 'Loading configuration settings timed out.'
reject(new Error("Load operation timed out."));

@zhiyuanliang-ms
Contributor Author

I will send out another PR to make all error messages constants in a separate file.
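As context for the `reject(new Error("Load operation timed out."))` line discussed above, a startup timeout is commonly enforced by racing the load against a timer. The sketch below shows that general pattern; it is not necessarily how `src/AzureAppConfigurationImpl.ts` does it. `withTimeout` and `DEFAULT_STARTUP_TIMEOUT_IN_MS` are hypothetical names; the 100-second value is the default stated in the PR description.

```typescript
// Hypothetical constant name; the 100-second default comes from the PR description.
const DEFAULT_STARTUP_TIMEOUT_IN_MS = 100_000;

// Race an operation against a timer and reject if the timer fires first.
// Note: this does not cancel the underlying operation; it only stops waiting for it.
async function withTimeout<T>(
  operation: Promise<T>,
  timeoutInMs: number = DEFAULT_STARTUP_TIMEOUT_IN_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("Load operation timed out.")),
      timeoutInMs,
    );
  });
  try {
    return await Promise.race([operation, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after the race settles.
    clearTimeout(timer);
  }
}
```

A caller would wrap the initial load, for example `await withTimeout(initialLoad(), 30_000)`, where `initialLoad` is a placeholder for whatever performs the first load.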

@zhiyuanliang-ms zhiyuanliang-ms merged commit 5c1f4a3 into main Apr 29, 2025
6 checks passed
@zhiyuanliang-ms zhiyuanliang-ms deleted the zhiyuanliang/startup-timeout branch April 29, 2025 02:35
7 participants