[integ-test] rotate instance types for some integration tests #6645

Merged: 2 commits, Jan 23, 2025

hanwen-cluster
Contributor

  1. Unlike OS rotation, instance types are region-dependent. Therefore, we cannot use general Jinja variables like `{{ OS_X86_1 }}`; we need region-specific Jinja variables like `{{ US_EAST_1_INSTANCE_TYPE_0 }}`.
  2. For code efficiency, this commit populates only three large AWS regions. The code can be extended if more regions need to be added.
  3. This commit rotates instance types only in `test_essential_features` and `test_cluster_with_gpu_health_checks`. The code can be extended if more tests need to be added.
  4. Improve `test_cluster_with_gpu_health_checks` so that it can run on both x86 and ARM.
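
For readers unfamiliar with the rotation idea, here is a minimal sketch of how region-specific variables could be generated. It is not the code from this PR: the function name `rotate_instance_types`, the use of `date.toordinal()` as the rotation offset, and the sample instance types are assumptions made for illustration. Only the variable naming pattern (`US_EAST_1_INSTANCE_TYPE_0`) and the modulo-based rotation come from the PR itself.

```python
from datetime import date


def rotate_instance_types(region, instance_types, today=None):
    """Map a rotated list of instance types to region-specific variable names.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not instance_types:
        return {}
    today = today or date.today()
    # "us-east-1" -> "US_EAST_1", so it can be used as a Jinja variable prefix.
    region_jinja = region.upper().replace("-", "_")
    # Assumption: any integer that increases by one per day works as the offset.
    today_number = today.toordinal()
    count = len(instance_types)
    return {
        f"{region_jinja}_INSTANCE_TYPE_{index}": instance_types[(today_number + index) % count]
        for index in range(count)
    }


# Illustrative input; the real list would come from what the region offers.
variables = rotate_instance_types("us-east-1", ["c5.xlarge", "m5.xlarge", "r5.xlarge"])
```

With this scheme, the value bound to `..._INSTANCE_TYPE_0` on one day is the value that was bound to `..._INSTANCE_TYPE_1` the day before, so a test that references a fixed variable name cycles through the whole list over time.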

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Signed-off-by: Hanwen <hanwenli@amazon.com>
@hanwen-cluster hanwen-cluster requested review from a team as code owners January 23, 2025 17:57
@hanwen-cluster hanwen-cluster added the skip-changelog-update Disables the check that enforces changelog updates in PRs label Jan 23, 2025
@hanwen-cluster hanwen-cluster changed the title [integ-test] rotate instance type for some integration tests [integ-test] rotate instance types for some integration tests Jan 23, 2025
)
for index in range(len(gpu_instances)):
    instance_type = gpu_instances[(today_number + index) % len(gpu_instances)]
    result[f"{region_jinja}_GPU_INSTANCE_TYPE_{index}"] = instance_type[: -len(".xlarge")]
Contributor

Why are we removing `.xlarge` if we just add it back in `develop.yaml`, especially since we are only looking at xlarge instances?

Contributor Author

To make the approach more extensible.
For example, `test_dcv_configuration` uses `.2xlarge`. If we lock down to `.xlarge`, this mechanism cannot be used by `test_dcv_configuration`.
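
To illustrate the point about reuse, here is a small sketch of how a size-less family prefix can be combined with different sizes by different consumers. The helper `family_of` is hypothetical, and it uses `str.partition` rather than the `[: -len(".xlarge")]` slice from the diff; `g4dn` is only an illustrative value, not necessarily what any test uses.

```python
def family_of(instance_type):
    """Return the family part of an instance type, e.g. 'g4dn' for 'g4dn.xlarge'.

    Hypothetical helper for illustration only.
    """
    family, _, _size = instance_type.partition(".")
    return family


# The rotated variable would hold only the family prefix...
family = family_of("g4dn.xlarge")

# ...and each test configuration appends the size it needs.
essential_features_type = f"{family}.xlarge"
dcv_configuration_type = f"{family}.2xlarge"
```

One caveat with this design: not every family is offered in every size, so a consumer that appends a different size still has to be sure that combination exists in the target region.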

Contributor

But right now we are already locked down to `.xlarge` because of this line: `if instance_type["InstanceType"].endswith(".xlarge")`. If you really want to make it extensible, shouldn't `_get_instance_type_parameters()` take a list of instance sizes as a parameter and return a map where each key is an instance size and each value is the list of instance types of that size?
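
A rough sketch of the shape suggested here, under stated assumptions: the function name `group_instance_types_by_size` is hypothetical, and the input is assumed to be a list of dicts with an `"InstanceType"` key, as implied by the `instance_type["InstanceType"]` expression quoted above. It is not the signature of `_get_instance_type_parameters()` in the repository.

```python
def group_instance_types_by_size(instance_types, sizes):
    """Group instance types by size, keeping only the requested sizes.

    Hypothetical helper. `sizes` holds bare size names such as "xlarge"
    or "2xlarge" (no leading dot).
    """
    grouped = {size: [] for size in sizes}
    for item in instance_types:
        name = item["InstanceType"]
        # "g4dn.2xlarge" -> family "g4dn", size "2xlarge"
        _family, _, size = name.partition(".")
        if size in grouped:
            grouped[size].append(name)
    return grouped


# Illustrative input only.
sample = [
    {"InstanceType": "g4dn.xlarge"},
    {"InstanceType": "g4dn.2xlarge"},
    {"InstanceType": "g5.xlarge"},
    {"InstanceType": "g5.12xlarge"},
]
by_size = group_instance_types_by_size(sample, ["xlarge", "2xlarge"])
```

Matching on the exact size string keeps `12xlarge` out of the `2xlarge` bucket, which a looser suffix check without the dot would get wrong.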

@hanwen-cluster hanwen-cluster enabled auto-merge (rebase) January 23, 2025 19:10
@hanwen-cluster hanwen-cluster merged commit e9dc0f0 into aws:develop Jan 23, 2025
24 checks passed
hanwen-cluster added a commit to hanwen-cluster/aws-parallelcluster that referenced this pull request Jan 24, 2025
This bug was introduced by aws#6645

Signed-off-by: Hanwen <hanwenli@amazon.com>
hanwen-cluster added a commit that referenced this pull request Jan 24, 2025
This bug was introduced by #6645

Signed-off-by: Hanwen <hanwenli@amazon.com>
Labels
skip-changelog-update Disables the check that enforces changelog updates in PRs
2 participants