Retryable transactions + async exception handling #1482

Open: wants to merge 12 commits into master

@jship commented Mar 14, 2023

The main change in this PR is support for retryable transactions. When using the various runSqlPool* functions at the repeatable-read or serializable isolation level, application authors need a way to retry transactions that fail with serialization failures. Without official support in persistent, runSqlPool* users could technically catch the serialization-failure exception around the whole runSqlPool* call and retry the transaction themselves, but this is a nonstarter, because the connection would have to be returned to the pool and then reacquired. To support retryable transactions, this PR adds a runSqlPoolWithExtensibleHooksRetry function that takes an exception predicate deciding whether the transaction should be retried. The existing runSqlPoolWithExtensibleHooks is now implemented in terms of the new function.
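To make the retry idea concrete, here is a minimal sketch in plain Haskell (base only). It is not persistent's implementation: SerializationFailure is a hypothetical exception type, the rollback argument stands in for the runOnException hook, and retryOnSameConnection, runNoRetry and demoRetry are illustrative names. In the real function the loop would presumably sit inside the pool's withResource, so every attempt reuses the same connection. Masking is deliberately ignored here.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
import Data.IORef

-- Hypothetical stand-in for a driver's serialization-failure error.
data SerializationFailure = SerializationFailure
  deriving Show

instance Exception SerializationFailure

-- True for synchronous exceptions, i.e. those not wrapped in SomeAsyncException.
isSyncException :: SomeException -> Bool
isSyncException e = case fromException e :: Maybe SomeAsyncException of
  Nothing -> True
  Just _  -> False

isSerializationFailure :: SomeException -> Bool
isSerializationFailure e = case fromException e :: Maybe SerializationFailure of
  Just _  -> True
  Nothing -> False

-- Hypothetical retry loop: every failure is rolled back; only synchronous
-- exceptions accepted by the predicate cause another attempt, anything
-- else is rethrown.
retryOnSameConnection
  :: (SomeException -> Bool)  -- should this exception trigger a retry?
  -> IO ()                    -- rollback (stand-in for runOnException)
  -> IO a                     -- transaction body
  -> IO a
retryOnSameConnection shouldRetry rollback body = loop
  where
    loop = do
      result <- try body
      case result of
        Right a -> pure a
        Left (e :: SomeException) -> do
          rollback
          if isSyncException e && shouldRetry e
            then loop
            else throwIO e

-- A non-retrying variant expressed via the retrying one
-- (the const False predicate is an assumption, not taken from the PR).
runNoRetry :: IO () -> IO a -> IO a
runNoRetry = retryOnSameConnection (const False)

-- Usage: a body that hits two serialization failures, then succeeds.
-- Returns (result, number of rollbacks performed).
demoRetry :: IO (Int, Int)
demoRetry = do
  attempts  <- newIORef (0 :: Int)
  rollbacks <- newIORef (0 :: Int)
  let body = do
        n <- atomicModifyIORef' attempts (\k -> (k + 1, k + 1))
        if n < 3 then throwIO SerializationFailure else pure 42
  r <- retryOnSameConnection isSerializationFailure
         (modifyIORef' rollbacks (+ 1)) body
  rb <- readIORef rollbacks
  pure (r, rb)
```

Here demoRetry should return (42, 2): two failed attempts, each rolled back, then a successful third attempt.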

Another change in this PR concerns async exception handling in runSqlPoolWithExtensibleHooks. The previous version of this function did not run the runOnException hook when the user-specified database action was aborted by an async exception, because it used unliftio's catchAny, which does not catch async exceptions at all. This is less problematic than it sounds: if an async exception arrived, the enclosing withResource on the pool would catch it and destroy the connection, and for PostgreSQL, terminating the connection makes the database discard whatever changes the transaction made, even without an explicit rollback. With this PR, users whose runOnException hook contains custom logic beyond rolling back the transaction can rely on persistent to run that hook whenever the user-specified database action throws any type of exception.
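The difference between the old and new hook behaviour can be illustrated with base alone. This is a simplified sketch, not the code in the PR: oldStyle mimics a catchAny-based handler (unliftio's catchAny rethrows any exception wrapped in SomeAsyncException without running the handler), while newStyle runs the hook for every exception and rethrows, roughly like unliftio's withException without its masking. Throwing ThreadKilled with throwIO is an assumption used to simulate an async exception; it works here because both classifications look only at the exception's type.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
import Data.IORef

-- True unless the exception is wrapped in SomeAsyncException, the same
-- type-based test unliftio uses to tell sync from async exceptions.
isSyncException :: SomeException -> Bool
isSyncException e = case fromException e :: Maybe SomeAsyncException of
  Nothing -> True
  Just _  -> False

-- Sketch of the old behaviour: the hook only runs for synchronous
-- exceptions; async ones are rethrown without running it (catchAny-like).
oldStyle :: IO () -> IO a -> IO a
oldStyle hook body = body `catch` \(e :: SomeException) ->
  if isSyncException e
    then hook >> throwIO e
    else throwIO e

-- Sketch of the new behaviour: the hook runs for every exception and the
-- exception is rethrown afterwards.
newStyle :: IO () -> IO a -> IO a
newStyle hook body = body `onException` hook

-- Reports whether the hook ran when the body threw the given exception.
hookRan :: Exception e => (IO () -> IO () -> IO ()) -> e -> IO Bool
hookRan style exc = do
  ran <- newIORef False
  _ <- try (style (writeIORef ran True) (throwIO exc)) :: IO (Either SomeException ())
  readIORef ran
```

With ThreadKilled, hookRan oldStyle yields False and hookRan newStyle yields True; for a synchronous error such as userError both yield True.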

There were also multiple places where the masking state was being restored (essentially everything ran inside a restore except the installation of the catchAny handler). This PR changes the masking so that alterBackend and the user-specified action still run inside a restore, but runBefore and runAfter no longer do. The previous version's use of restore was introduced in #1207, so I verified that the conn-killed binary still produces Right with the new masking. Note also that runOnException now runs implicitly under uninterruptibleMask (via unliftio's withException).
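The masking layout described above can be sketched as follows, assuming the whole body runs under an outer mask. Hooks is a simplified, hypothetical record (persistent's SqlPoolHooks uses these field names but different types), and withExceptionU is a hypothetical stand-in for unliftio's withException: on failure it runs the handler under uninterruptibleMask_, discards any exception the handler throws, and rethrows the original. observeMasking records the MaskingState each step observes.

```haskell
import Control.Exception
import Data.IORef

-- Simplified, hypothetical hook record.
data Hooks = Hooks
  { alterBackend   :: IO ()
  , runBefore      :: IO ()
  , runAfter       :: IO ()
  , runOnException :: SomeException -> IO ()
  }

-- Simplified withException: the handler runs uninterruptibly masked, its
-- own exceptions are discarded, and the original exception is rethrown.
withExceptionU :: IO a -> (SomeException -> IO ()) -> IO a
withExceptionU body handler = do
  result <- try body
  case result of
    Right a -> pure a
    Left e -> do
      _ <- try (uninterruptibleMask_ (handler e)) :: IO (Either SomeException ())
      throwIO e

-- Assumed layout: everything under mask; only alterBackend and the user
-- action are wrapped in restore.
runWithHooks :: Hooks -> IO a -> IO a
runWithHooks hooks action = mask $ \restore -> do
  restore (alterBackend hooks)
  runBefore hooks
  result <- restore action `withExceptionU` runOnException hooks
  runAfter hooks
  pure result

-- Records the MaskingState seen by each hook and by the action.
observeMasking :: IO () -> IO [(String, MaskingState)]
observeMasking action = do
  logRef <- newIORef []
  let note label = do
        s <- getMaskingState
        modifyIORef' logRef ((label, s) :)
      hooks = Hooks
        { alterBackend   = note "alterBackend"
        , runBefore      = note "runBefore"
        , runAfter       = note "runAfter"
        , runOnException = \_ -> note "runOnException"
        }
  _ <- try (runWithHooks hooks (note "action" >> action)) :: IO (Either SomeException ())
  reverse <$> readIORef logRef
```

Called from an unmasked thread, alterBackend and the action see Unmasked, runBefore and runAfter see MaskedInterruptible, and, when the action throws, runOnException sees MaskedUninterruptible.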

The PR may be easiest to review commit by commit, since intentionally failing tests were added before the corresponding library changes.


Before submitting your PR, check that you've:

  • Documented new APIs with Haddock markup
  • Added @since declarations to the Haddock
  • Run stylish-haskell on any changed files.
  • Adhered to the code style (see the .editorconfig file for details)

After submitting your PR:

  • Update the Changelog.md file with a link to your PR
  • Bump the version number if there isn't an (unreleased) entry in the Changelog
  • Check that CI passes (or if it fails, for reasons unrelated to your change, like CI timeouts)

-- @since 2.14.6.0
runSqlPoolWithExtensibleHooksRetry
    :: forall backend m a. (MonadUnliftIO m, BackendCompatible SqlBackend backend)
    => (UE.SomeException -> Bool)

@jship (Author) commented Mar 14, 2023

It's worth pointing out that the API has intentionally been kept simple: as long as a synchronous exception matches the predicate, the transaction is always retried. For serialization failures at the repeatable-read and serializable isolation levels specifically, the recommendation from the PostgreSQL docs and elsewhere is to retry the failing transaction unconditionally.
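As an illustration of a predicate that follows this guidance, the sketch below matches only SQLSTATE 40001, PostgreSQL's serialization_failure code. FakeSqlError is a hypothetical stand-in so the example compiles with base alone; with persistent-postgresql the error would presumably be postgresql-simple's SqlError, whose sqlState field carries the code. A uniqueness violation (SQLSTATE 23505) and async exceptions do not match.

```haskell
import Control.Exception

-- Hypothetical stand-in for a database driver error that carries a
-- SQLSTATE code (not a real library type).
newtype FakeSqlError = FakeSqlError { fakeSqlState :: String }
  deriving Show

instance Exception FakeSqlError

-- Retry only PostgreSQL serialization failures (SQLSTATE 40001).
-- Async exceptions never match, since they are not FakeSqlError values.
retryOnSerializationFailure :: SomeException -> Bool
retryOnSerializationFailure e = case fromException e of
  Just err -> fakeSqlState err == "40001"
  Nothing  -> False
```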

The PostgreSQL guidance on serialization failures is what drove this simpler API, but users may have other failures (e.g. uniqueness violations) that they would like to retry only up to a fixed number of times, or according to some more sophisticated policy. The simple exception-predicate approach in this PR does not cover those cases. Since they are rare, it seemed prudent to keep the retry API as simple as possible for now. However, depending on the predicate the user supplies, persistent may end up retrying indefinitely; in that case, users can wrap their runSqlPoolWithExtensibleHooksRetry call in a timeout.
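A minimal sketch of the timeout workaround, using base's System.Timeout.timeout around a hypothetical retryWhile loop whose predicate accepts everything (const True). Because timeout delivers its exception asynchronously and the loop only retries synchronous exceptions, the timeout still ends the loop. AlwaysFails, retryWhile, boundedByTimeout and the 0.2-second budget are illustrative assumptions.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent (threadDelay)
import Control.Exception
import System.Timeout (timeout)

-- Hypothetical failure that never goes away.
data AlwaysFails = AlwaysFails deriving Show
instance Exception AlwaysFails

isSyncException :: SomeException -> Bool
isSyncException e = case fromException e :: Maybe SomeAsyncException of
  Nothing -> True
  Just _  -> False

-- Minimal retry loop: retries synchronous exceptions the predicate
-- accepts; asynchronous ones (such as timeout's) are always rethrown.
retryWhile :: (SomeException -> Bool) -> IO a -> IO a
retryWhile shouldRetry body = loop
  where
    loop = do
      result <- try body
      case result of
        Right a -> pure a
        Left (e :: SomeException)
          | isSyncException e && shouldRetry e -> loop
          | otherwise -> throwIO e

-- A predicate that matches everything never stops retrying a body that
-- always fails; the outer timeout bounds the total time instead.
boundedByTimeout :: IO (Maybe ())
boundedByTimeout =
  timeout 200000 $              -- 0.2 seconds, in microseconds
    retryWhile (const True) $ do
      threadDelay 1000          -- simulate a slow transaction
      throwIO AlwaysFails
```

Because the predicate has type SomeException -> Bool, it cannot count attempts on its own; a bounded retry policy would need a different API shape, which is why such policies fall outside this simple design.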

@jship (Author) commented Jun 9, 2023

We've been running these changes successfully in production for a few months now. Is there anything I can do to help move this PR along?

@parsonsmatt self-requested a review June 9, 2023 22:21
@jship force-pushed the retryable-transactions-and-async-exception-stuff branch from 91b68a4 to 0c165fd July 6, 2023 13:17