Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure valid allowed_operations computation #6253

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

contificate
Copy link
Contributor

In an effort to make the logic clearer, we restructure the part(s) of Xapi_pool_helpers responsible for determining which operations are "valid" (i.e. become members of allowed_operations).

The previous code is somewhat tedious to understand because it is unconditionally ineffective - the logical delineation of parts of the code is implicit, as the code computes the valid operation table in order, but many of the operations will have no effect (as later code to populate an operation's entry in the validity table do nothing).

This change makes the precedence of blocking and waiting operations more apparent by explicitly casing on them (in decreasing order of precedence). The differences are somewhat subtle and explained in the commit message.


BVT+BST (211251) all green except for a few known issues. In a distant future, the relation between operations could be stated declaratively and the allowed_operations computed incrementally (based on ambient state changes in the database, etc.).

The code populates a table with a tri-state value (allowed, unknown, and disallowed). However, absence from the table could signify the same (that the operation is valid), as every possible value is accounted for (therefore, Unknown is redundant).

In an effort to make the logic clearer, we restructure the part(s) of
Xapi_pool_helpers responsible for determining which operations are
"valid" (i.e. become members of allowed_operations).

The previous code is somewhat tedious to understand because it is
unconditionally ineffective - the logical delineation of parts of the
code is implicit, as the code computes the valid operation table in
order, but many of the operations will have no effect (as later code to
populate an operation's entry in the validity table do nothing).

To try and simplify matters, we add some level of static partitioning of
"blocking" and "waiting" operations (using separate polymorphic variants
and coercions to widen into a type comprising all operations). Then, we
replace loops with "find first" computations.

The current logic should be clearer. It is roughly as follows:
To compute the "valid" operations:

- Start by assuming all operations are valid. We explicitly map each
  operation to an "Allowed" constructor to signify this. Though,
  technically, full coverage of the cases would guarantee that absence
  from this table implies the associated operation is valid - however,
  we maintain the old concern and maintain validity entries as a
  tri-state value (Unknown, Allowed, and Disallowed).

- Determine the current operations reported by the pool object. At
  present, every operation is statically partitioned into "blocking" or
  "waiting" operations - which are both handled differently, with a
  "blocking" operation taking highest precedence

- If there's an operation in current operations that is a blocking
  operation, this takes precedence. We cover all operation cases as
  follows: (1) blocking operations get marked as being invalid and cite
  the reason associated with the blocking operation discovered in the
  current operations set (which is an operation-specific "in progress"
  error). Then, all waiting operations get marked as invalid, citing a
  generic "in progress" error (unrelated to any specific operation).

- If there's no blocking operation in current operations, but there is a
  waiting operation, we map all operations to a generic "in progress"
  error.

- If there is no blocking or waiting operation in current operations
  (which, at present, is to say that current operations is empty), then
  we invalidate entries based on specific, hardcoded, invariants. For
  example, if HA is enabled on the pool, we invalidate the `ha_enable`
  operation on the pool object (with the reason explicitly explaining
  that you HA is already enabled).

In future, we could consider encoding the relations between operations
(and related object state) declaratively, such that we can either
automatically generate such code (and test it alongside invariants,
encoded as Prolog-esque rules) or use an incremental computation
framework to automatically recompute the allowed operations based on
changes in ambient pool state.

Signed-off-by: Colin James <colin.barr@cloud.com>
@contificate contificate marked this pull request as ready for review January 27, 2025 15:27
@@ -20,14 +20,38 @@ open Record_util

let finally = Xapi_stdext_pervasives.Pervasiveext.finally

type blocking_operations =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the relationship as explained in the existing comments not super obvious. What does it mean when an operation from one of these lists is in progress and a new operation is coming in - likewise from one of these lists. What happens to the incoming operation in all cases? Does "blocking" mean an operation blocks another one, or does it mean it is itself blocked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the terminology should probably be changed (as I'm unsure if the information is ever acted upon in any way that lines up with the naming).

It "blocking" and "waiting" distinction seems to only exist for changing what is the reported reason why an operation would fail. A blocking operation in current_operations blocks other operations of its own designation with a specific reason (the current - blocking - operation's "in progress" error). Whereas, a waiting operation in current_operations invalidates every operation with a generic reason.

To me, the current semantics are really just that all operations are invalid if there is any operation in current_operations (as there doesn't seem to exist an operation which isn't categorised as blocking or waiting). Then, the distinction exists to guide what reason should be reported for each operation. Blocking and waiting operations invalidate everything, so you only get precise errors for an attempted operation if there is no operation ongoing. However, nothing seems to operationally block or wait in this code, so it's not a real concern here (and the naming ought to line up with expectations).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distinction is how the incoming operation handles the fact that something else is in progress? The reaction does not depend on the operation in progress - is that how it works? So we are computing the reaction of an operation that finds an arbitrary operation in progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distinction is what the error message cites as the reason that an operation cannot be performed. The reaction depends on the current operation because, if there is a current operation that belongs to the blocking set, the error cites a specific "in progress" error for that operation. Whereas, all other responses - in the case of wait - are a generic error relating to the pool. The most specific errors arise when there's no current operations but the intended operation is illogical, like enabling HA if it's already enabled (the responses for blocking and waiting will overrule this though).

So, my understanding is: if HA is already enabled and there is a blocking operation (foo) within current operations, then the user tries to enable HA, it will error saying "foo in progress". Then, if foo completes and is removed from current operations (and there's nothing else in current operations), trying again will yield "HA is already enabled" error (so, a more precise error which is not overruled by a more generic reason). So the response is guided by what's already in current operations.

@lindig
Copy link
Contributor

lindig commented Jan 27, 2025

I very much like the look of this but the overall (existing) concept is not clear to me. Who is blocking (someone else) and who is blocked (by someone else)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants