Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State Build implementation for File Copy based replication #2954

Merged
merged 8 commits into from
Dec 23, 2024

Conversation

piyujai
Copy link
Contributor

@piyujai piyujai commented Nov 27, 2024

This PR handles initializing BlobStore with Log, Index, BlobStoreCompactor, RemoteTokenTracker, etc post filecopy is completed in the filecopy based replication.

Bootstrap State Build Design Doc

Key Changes:

  1. buildStateForFileCopy: New method to build in-memory state when a file copy completes under file copy based replication. It throws an exception if ReplicaId is null and adjusts disk space on failure.
  2. addBlobStoreForFileCopy: Added to DiskManager and StorageManager for adding BlobStores during file copy replication. It includes initializing of BlobStore with Log, Index, BlobStoreCompactor and RemoteTokenTracker, etc. It includes error handling for failed store creation.
  3. Error Handling: Enhanced logging and handling for failed store additions and bootstrap replica issues.

Test Coverage:

  1. Added tests for various file copy scenarios, including successful and failed store additions and handling null ReplicaId, successful state build post blob writes to a new store, etc.
  2. Testing with sealed log files with setting of active segments and rollover of files is working correctly in local testing.

@piyujai piyujai force-pushed the bootstrap_state_build_changes branch 2 times, most recently from 80d1bce to 1124d46 Compare November 28, 2024 11:08
@piyujai piyujai force-pushed the bootstrap_state_build branch from 0948730 to ef290ec Compare November 29, 2024 06:39
@piyujai piyujai force-pushed the bootstrap_state_build_changes branch from 6e10bcf to 944356e Compare November 29, 2024 06:43
@piyujai piyujai force-pushed the bootstrap_state_build branch from ef290ec to 350dd47 Compare November 29, 2024 06:48
@piyujai piyujai marked this pull request as draft November 29, 2024 06:49
@piyujai piyujai force-pushed the bootstrap_state_build branch 2 times, most recently from b608e52 to 2e147ea Compare December 16, 2024 13:24
@piyujai piyujai marked this pull request as ready for review December 16, 2024 18:26
@piyujai piyujai changed the title WIP: Changes for state build State Build implementation for File Copy based replication Dec 16, 2024
@piyujai piyujai force-pushed the bootstrap_state_build branch 3 times, most recently from 4f7c463 to 664acbe Compare December 16, 2024 21:33
@piyujai piyujai changed the base branch from bootstrap_state_build_changes to master December 17, 2024 06:09
@piyujai piyujai changed the base branch from master to bootstrap-optimisation-lld-flow December 17, 2024 06:09
// create a bootstrap-in-progress file to distinguish it from regular stores (the file will be checked during
// BOOTSTRAP -> STANDBY transition)
createBootstrapFileIfAbsent(replica);
logger.info("New store is successfully added into DiskManager.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We should log partitionId in this log.

succeed = true;
}
} catch (Exception e) {
logger.error("Failed to start new added store {} or add requirements to disk allocator", replica.getPartitionId(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log statement needs to be tweaked as we are not making a call to diskSpaceAllocator now.

Store store = dm.getStore(newPartition, false);
MockId id1 = addRandomBlobToStore(store, 100, Utils.Infinite_Time);
MockId id2 = addRandomBlobToStore(store, 200, Utils.Infinite_Time);
// Create storage manager again to create disk managers again
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, why is shutting down the storageManager and re-initiating it, relevant / necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to shut it down to ensure that BlobStores are shut down and state build is able to start the BlobStores again. This is to test the flow where existing blobs are written to disk and then building state is attempted by adding a store via stateBuild flow.

* @throws Exception
*/
@Test
public void buildStateForFileCopyFailureToAddBlobStoreTest() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename: AddingDuplicateBlobStoreShouldFailTest*

*/
@Override
public boolean addBlobStoreForFileCopy(ReplicaId replica) {
if (partitionToDiskManager.containsKey(replica.getPartitionId())) {
Copy link
Contributor

@aga9900 aga9900 Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get added as part of pre-file store, so it might occur that the this always returns true. We need to check for the existence of store here explicitely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should instead check below map for existence of store in disk manager.
private final ConcurrentHashMap<PartitionId, BlobStore> stores = new ConcurrentHashMap<>();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, diskmanager will be started already in pre-filecopy steps. Instead, I have put the check of partitionToDiskManager not having the partitionId key and fetching the diskManager from this map itself. Checking of store's existence is handled in the addBlobStoreForFileCopy call.

@piyujai piyujai force-pushed the bootstrap_state_build branch from 664acbe to 76073d6 Compare December 20, 2024 09:56
@piyujai piyujai changed the base branch from bootstrap-optimisation-lld-flow to master December 20, 2024 09:56
logger.error("Failed to add new store into DiskManager");
return false;
}
partitionToDiskManager.put(replica.getPartitionId(), diskManager);
Copy link
Contributor

@aga9900 aga9900 Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should already be present when we add the Blob Store after file copy.
Steps happening before this:

  1. Addition of Replica object.
  2. File Copy Trigger -> This will add FileStore in StorageManager which goes through DiskManager. This step also should create an entry in partitionToDiskManager.

@piyujai piyujai force-pushed the bootstrap_state_build branch 3 times, most recently from 82b5da6 to 1b97cc7 Compare December 20, 2024 23:14
compactionManager.addBlobStore(store);
// add new created store into in-memory data structures.
stores.put(replica.getPartitionId(), store);
partitionToReplicaMap.put(replica.getPartitionId(), replica);
Copy link
Contributor

@aga9900 aga9900 Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the name suggest, we should be adding an entry to this map much sooner than after the file copy finishes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, we should handle it in prefilecopy. Please make a note.

logger.error("Failed to add new store into DiskManager");
return false;
}
partitionNameToReplicaId.put(replica.getPartitionId().toPathString(), replica);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, we should handle it in prefilecopy. Please make a note.

Copy link
Contributor

@aga9900 aga9900 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added couple of comments.

@piyujai piyujai force-pushed the bootstrap_state_build branch 2 times, most recently from 6d69c20 to 4098a89 Compare December 23, 2024 07:36
@piyujai piyujai force-pushed the bootstrap_state_build branch from 4098a89 to f32df47 Compare December 23, 2024 07:40
@aga9900 aga9900 self-requested a review December 23, 2024 07:52
@piyujai piyujai dismissed aga9900’s stale review December 23, 2024 11:08

changes are done.

Copy link
Contributor

@DevenAhluwalia DevenAhluwalia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@piyujai piyujai merged commit 81a7b6c into master Dec 23, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants