Restructure valid allowed_operations computation #6253

contificate · 2025-01-27T15:09:03Z

In an effort to make the logic clearer, we restructure the part(s) of Xapi_pool_helpers responsible for determining which operations are "valid" (i.e. become members of allowed_operations).

The previous code is somewhat tedious to understand because it is unconditionally ineffective - the logical delineation of parts of the code is implicit, as the code computes the valid operation table in order, but many of the operations will have no effect (as later code to populate an operation's entry in the validity table do nothing).

This change makes the precedence of blocking and waiting operations more apparent by explicitly casing on them (in decreasing order of precedence). The differences are somewhat subtle and explained in the commit message.

BVT+BST (211251) all green except for a few known issues. In a distant future, the relation between operations could be stated declaratively and the allowed_operations computed incrementally (based on ambient state changes in the database, etc.).

The code populates a table with a tri-state value (allowed, unknown, and disallowed). However, absence from the table could signify the same (that the operation is valid), as every possible value is accounted for (therefore, Unknown is redundant).

In an effort to make the logic clearer, we restructure the part(s) of Xapi_pool_helpers responsible for determining which operations are "valid" (i.e. become members of allowed_operations). The previous code is somewhat tedious to understand because it is unconditionally ineffective - the logical delineation of parts of the code is implicit, as the code computes the valid operation table in order, but many of the operations will have no effect (as later code to populate an operation's entry in the validity table do nothing). To try and simplify matters, we add some level of static partitioning of "blocking" and "waiting" operations (using separate polymorphic variants and coercions to widen into a type comprising all operations). Then, we replace loops with "find first" computations. The current logic should be clearer. It is roughly as follows: To compute the "valid" operations: - Start by assuming all operations are valid. We explicitly map each operation to an "Allowed" constructor to signify this. Though, technically, full coverage of the cases would guarantee that absence from this table implies the associated operation is valid - however, we maintain the old concern and maintain validity entries as a tri-state value (Unknown, Allowed, and Disallowed). - Determine the current operations reported by the pool object. At present, every operation is statically partitioned into "blocking" or "waiting" operations - which are both handled differently, with a "blocking" operation taking highest precedence - If there's an operation in current operations that is a blocking operation, this takes precedence. We cover all operation cases as follows: (1) blocking operations get marked as being invalid and cite the reason associated with the blocking operation discovered in the current operations set (which is an operation-specific "in progress" error). Then, all waiting operations get marked as invalid, citing a generic "in progress" error (unrelated to any specific operation). - If there's no blocking operation in current operations, but there is a waiting operation, we map all operations to a generic "in progress" error. - If there is no blocking or waiting operation in current operations (which, at present, is to say that current operations is empty), then we invalidate entries based on specific, hardcoded, invariants. For example, if HA is enabled on the pool, we invalidate the `ha_enable` operation on the pool object (with the reason explicitly explaining that you HA is already enabled). In future, we could consider encoding the relations between operations (and related object state) declaratively, such that we can either automatically generate such code (and test it alongside invariants, encoded as Prolog-esque rules) or use an incremental computation framework to automatically recompute the allowed operations based on changes in ambient pool state. Signed-off-by: Colin James <colin.barr@cloud.com>

lindig · 2025-01-27T15:41:25Z

ocaml/xapi/xapi_pool_helpers.ml

@@ -20,14 +20,38 @@ open Record_util

 let finally = Xapi_stdext_pervasives.Pervasiveext.finally

+type blocking_operations =


I find the relationship as explained in the existing comments not super obvious. What does it mean when an operation from one of these lists is in progress and a new operation is coming in - likewise from one of these lists. What happens to the incoming operation in all cases? Does "blocking" mean an operation blocks another one, or does it mean it is itself blocked?

Yeah, the terminology should probably be changed (as I'm unsure if the information is ever acted upon in any way that lines up with the naming).

It "blocking" and "waiting" distinction seems to only exist for changing what is the reported reason why an operation would fail. A blocking operation in current_operations blocks other operations of its own designation with a specific reason (the current - blocking - operation's "in progress" error). Whereas, a waiting operation in current_operations invalidates every operation with a generic reason.

To me, the current semantics are really just that all operations are invalid if there is any operation in current_operations (as there doesn't seem to exist an operation which isn't categorised as blocking or waiting). Then, the distinction exists to guide what reason should be reported for each operation. Blocking and waiting operations invalidate everything, so you only get precise errors for an attempted operation if there is no operation ongoing. However, nothing seems to operationally block or wait in this code, so it's not a real concern here (and the naming ought to line up with expectations).

The distinction is how the incoming operation handles the fact that something else is in progress? The reaction does not depend on the operation in progress - is that how it works? So we are computing the reaction of an operation that finds an arbitrary operation in progress.

The distinction is what the error message cites as the reason that an operation cannot be performed. The reaction depends on the current operation because, if there is a current operation that belongs to the blocking set, the error cites a specific "in progress" error for that operation. Whereas, all other responses - in the case of wait - are a generic error relating to the pool. The most specific errors arise when there's no current operations but the intended operation is illogical, like enabling HA if it's already enabled (the responses for blocking and waiting will overrule this though).

So, my understanding is: if HA is already enabled and there is a blocking operation (foo) within current operations, then the user tries to enable HA, it will error saying "foo in progress". Then, if foo completes and is removed from current operations (and there's nothing else in current operations), trying again will yield "HA is already enabled" error (so, a more precise error which is not overruled by a more generic reason). So the response is guided by what's already in current operations.

lindig · 2025-01-27T15:49:48Z

I very much like the look of this but the overall (existing) concept is not clear to me. Who is blocking (someone else) and who is blocked (by someone else)?

contificate marked this pull request as ready for review January 27, 2025 15:27

lindig reviewed Jan 27, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructure valid allowed_operations computation #6253

Restructure valid allowed_operations computation #6253

contificate commented Jan 27, 2025

lindig Jan 27, 2025

contificate Jan 27, 2025

lindig Jan 27, 2025

contificate Jan 27, 2025

lindig commented Jan 27, 2025

		@@ -20,14 +20,38 @@ open Record_util

		let finally = Xapi_stdext_pervasives.Pervasiveext.finally

		type blocking_operations =

Restructure valid allowed_operations computation #6253

Are you sure you want to change the base?

Restructure valid allowed_operations computation #6253

Conversation

contificate commented Jan 27, 2025

lindig Jan 27, 2025

Choose a reason for hiding this comment

contificate Jan 27, 2025

Choose a reason for hiding this comment

lindig Jan 27, 2025

Choose a reason for hiding this comment

contificate Jan 27, 2025

Choose a reason for hiding this comment

lindig commented Jan 27, 2025