
Add automated scaling stress test with upload of metrics to CW #5912

Closed
wants to merge 9 commits into from


EddyMM (Contributor) commented Dec 4, 2023

Description of changes

This test scales a cluster up and then down while periodically sampling a few key metrics.
The metrics monitored are:

  • Number of EC2 instances launched
  • Number of successfully bootstrapped compute nodes that have joined the cluster
  • Number of jobs pending or in configuration
  • Number of jobs currently running
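The periodic monitoring described above can be sketched as a polling loop. This is a minimal illustration, not the test's actual code: the metric names, the `collect` callable (which would query EC2 and Slurm in a real test) and the `is_done` stop condition are all hypothetical. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metric names mirroring the four bullets above.
METRIC_NAMES = ("EC2InstancesCount", "ComputeNodesCount", "PendingJobsCount", "RunningJobsCount")


@dataclass
class Sample:
    timestamp: float         # seconds since the epoch when the sample was taken
    values: Dict[str, int]   # metric name -> observed count


def monitor(
    collect: Callable[[], Dict[str, int]],
    is_done: Callable[[Sample], bool],
    interval_seconds: float = 60.0,
    timeout_seconds: float = 3600.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Sample]:
    """Poll `collect` every `interval_seconds` until `is_done` is True or the timeout expires."""
    samples: List[Sample] = []
    start = clock()
    while True:
        now = clock()
        sample = Sample(timestamp=now, values=collect())
        samples.append(sample)
        if is_done(sample):
            return samples
        if now - start >= timeout_seconds:
            raise TimeoutError(f"condition not met after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

A scale-up phase could, for example, stop once `ComputeNodesCount` reaches the target cluster size; a scale-down phase once it returns to zero.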

The above metrics are uploaded to CloudWatch.
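Each sample could be published with the CloudWatch `PutMetricData` API (`put_metric_data` on a boto3 CloudWatch client). The sketch below assumes a hypothetical namespace and a single `ClusterName` dimension; the PR does not show the real names. The payload builder is kept separate from the API call so it can be checked without AWS credentials.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical namespace; the real one used by the test is not shown in the PR.
NAMESPACE = "ParallelCluster/ScalingStressTest"


def build_metric_data(cluster_name: str, timestamp: float, values: Dict[str, int]) -> List[Dict[str, Any]]:
    """Convert one sample into the MetricData list accepted by PutMetricData."""
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return [
        {
            "MetricName": name,
            "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
            "Timestamp": when,
            "Value": float(value),
            "Unit": "Count",
        }
        for name, value in sorted(values.items())
    ]


def publish_sample(cloudwatch_client, cluster_name: str, timestamp: float, values: Dict[str, int]) -> None:
    """Upload one sample; `cloudwatch_client` would typically be boto3.client("cloudwatch")."""
    cloudwatch_client.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=build_metric_data(cluster_name, timestamp, values),
    )
```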
The outputs of this test are:

  • Log messages reporting the scale-up and scale-down times in seconds
  • A log message containing the metrics source, which can be used in the CloudWatch console
  • A metrics image showing the scale-up and scale-down as a line graph with annotations
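The outputs listed above can be derived from the collected samples. The sketch below assumes a simplified sample shape of `(epoch_seconds, compute_nodes_count)`, measures scale-up from the first sample until the target node count is reached, and measures scale-down from a given start time until no compute nodes remain; these definitions are assumptions, not necessarily how the PR computes the times. The source JSON is modelled on the graph source format shown in the CloudWatch console (a `metrics` array of `[namespace, metric, dimension name, dimension value]` entries plus vertical annotations); the exact field set is an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Callable, List, Optional, Sequence, Tuple

# (epoch_seconds, compute_nodes_count): a simplified, hypothetical sample shape.
NodeCountSample = Tuple[float, int]


def first_time_matching(samples: Sequence[NodeCountSample], predicate: Callable[[int], bool]) -> Optional[float]:
    """Return the timestamp of the first sample whose node count satisfies `predicate`."""
    for timestamp, count in samples:
        if predicate(count):
            return timestamp
    return None


def scaling_times(samples: Sequence[NodeCountSample], target: int, scale_down_start: float) -> Tuple[float, float]:
    """Return (scale_up_seconds, scale_down_seconds).

    Scale-up runs from the first sample until `target` nodes have joined;
    scale-down runs from `scale_down_start` until no compute nodes remain.
    """
    start = samples[0][0]
    up_at = first_time_matching(samples, lambda count: count >= target)
    after_down_start = [s for s in samples if s[0] >= scale_down_start]
    down_at = first_time_matching(after_down_start, lambda count: count == 0)
    if up_at is None or down_at is None:
        raise ValueError("cluster never reached the expected size")
    return up_at - start, down_at - scale_down_start


def to_iso(timestamp: float) -> str:
    """Format epoch seconds as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def metrics_source(namespace: str, cluster_name: str, metric_names: List[str], region: str,
                   annotations: List[Tuple[str, float]]) -> str:
    """Build a graph source JSON with one line per metric and labelled vertical annotations."""
    widget = {
        "metrics": [[namespace, name, "ClusterName", cluster_name] for name in metric_names],
        "view": "timeSeries",
        "stacked": False,
        "region": region,
        "stat": "Maximum",
        "period": 60,
        "annotations": {"vertical": [{"label": label, "value": to_iso(ts)} for label, ts in annotations]},
    }
    return json.dumps(widget)
```

Rendering a PNG would go through the CloudWatch `GetMetricWidgetImage` API (`get_metric_widget_image` in boto3), whose accepted widget fields may differ slightly from the console source.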

(A variety of cluster sizes will be added in subsequent PRs)

Tests

  • Ran the test and verified the generated artifacts

References

  • Link to impacted open issues.
  • Link to related PRs in other packages (e.g. cookbook, node).
  • Link to documentation useful to understand the changes.

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop, add the branch name as a prefix in the PR title (e.g. [release-3.6]).
  • Check that all commit messages are clear, describing what and why rather than how.
  • Make sure you have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

EddyMM added the skip-changelog-update label (disables the check that enforces changelog updates in PRs) on Dec 4, 2023

codecov bot commented Dec 4, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (651e779) 90.14% compared to head (d6db5fb) 90.14%.
Report is 5 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5912   +/-   ##
========================================
  Coverage    90.14%   90.14%           
========================================
  Files          180      180           
  Lines        15735    15735           
========================================
  Hits         14185    14185           
  Misses        1550     1550           
Flag        Coverage Δ
unittests   90.14% <ø> (ø)

Flags with carried forward coverage won't be shown.
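As a quick check of the coverage arithmetic in the report, Hits / Lines = 14185 / 15735 ≈ 90.149%. The report shows 90.14% rather than 90.15%, which is consistent with rounding down to two decimal places; this assumes the repository keeps Codecov's default `coverage.round: down` and `coverage.precision: 2` settings in `codecov.yml`.

```python
import math


def coverage_percent(hits: int, lines: int, precision: int = 2) -> float:
    """Return hits/lines as a percentage, rounded down to `precision` decimal places."""
    scale = 10 ** precision
    return math.floor(hits / lines * 100 * scale) / scale


# Figures from the coverage diff above (identical for base and head).
hits, lines = 14185, 15735
misses = lines - hits
print(misses, coverage_percent(hits, lines))  # prints: 1550 90.14
```

Rounding to the nearest value instead would give 90.15%, so the displayed figure depends on the configured rounding mode.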


EddyMM force-pushed the wip/automated-scaling-test branch 2 times, most recently from d0bc012 to f399ce7, on December 5, 2023, 15:25
EddyMM force-pushed the wip/automated-scaling-test branch from f399ce7 to 66a83b5 on December 5, 2023, 16:15
EddyMM marked this pull request as ready for review on December 5, 2023, 16:32
EddyMM requested review from a team as code owners on December 5, 2023, 16:32
-clustermgtd_conf_path = _retrieve_clustermgtd_conf_path(remote_command_executor)
-_set_protected_failure_count(remote_command_executor, 2, clustermgtd_conf_path)
+clustermgtd_conf_path = retrieve_clustermgtd_conf_path(remote_command_executor)
+set_protected_failure_count(remote_command_executor, 2, clustermgtd_conf_path)

Check failure

Code scanning / CodeQL

Wrong number of arguments in a call (Error, Test)

Call to function set_protected_failure_count with too many arguments; should be no more than 2.
@@ -506,7 +508,7 @@
 # Re-enable protected mode
 _enable_protected_mode(remote_command_executor, clustermgtd_conf_path)
 # Decrease protected failure count for quicker enter protected mode.
-_set_protected_failure_count(remote_command_executor, 2, clustermgtd_conf_path)
+set_protected_failure_count(remote_command_executor, 2, clustermgtd_conf_path)

Check failure

Code scanning / CodeQL

Wrong number of arguments in a call (Error, Test)

Call to function set_protected_failure_count with too many arguments; should be no more than 2.
@@ -1947,7 +1941,7 @@
 ):
 """Test Bootstrap failures have no affect on cluster when protected mode is disabled."""
 # Disable protected_mode by setting protected_failure_count to -1
-_set_protected_failure_count(remote_command_executor, -1, clustermgtd_conf_path)
+set_protected_failure_count(remote_command_executor, -1, clustermgtd_conf_path)

Check failure

Code scanning / CodeQL

Wrong number of arguments in a call (Error, Test)

Call to function set_protected_failure_count with too many arguments; should be no more than 2.
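All three CodeQL findings flag the same mismatch: each call site passes three arguments (the executor, the count and the config path), while the definition CodeQL resolved accepts at most two. A minimal sketch of a three-parameter helper that would satisfy these calls follows. The body is hypothetical: it assumes the executor exposes a `run_remote_command(command)` method and that the config file already contains a `protected_failure_count = ...` line to rewrite with `sed`; it is not the implementation from this PR.

```python
import shlex


def set_protected_failure_count(remote_command_executor, protected_failure_count, clustermgtd_conf_path):
    """Set protected_failure_count in the clustermgtd config file on the head node.

    Hypothetical body: rewrites an existing `protected_failure_count = ...` line in place.
    """
    pattern = f"s/^protected_failure_count = .*/protected_failure_count = {protected_failure_count}/"
    # Quote both the sed expression and the path so spaces or shell metacharacters are safe.
    remote_command_executor.run_remote_command(
        f"sudo sed -i {shlex.quote(pattern)} {shlex.quote(clustermgtd_conf_path)}"
    )
```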
lukeseawalker (Contributor)

Superseded by #6027.

Labels
skip-changelog-update Disables the check that enforces changelog updates in PRs

2 participants