Tags not working as expected #5369

Closed
nobodyme opened this issue Jun 2, 2023 · 5 comments

nobodyme commented Jun 2, 2023

Please make sure to add the following data in order to facilitate the root cause detection.

Required Info:

  • AWS ParallelCluster version: 3.6.0

Bug description and how to reproduce:
I added tags as described in https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html, first at the queue level and then, when that did not work, at the compute resource level.

Specifically, I added a Name tag, but the launched instances did not have the Name tag I set.
What could be wrong here?

Here is the pcluster config. Even the Test tag did not show up on the launched instances.

Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: default-queue
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: 70
            VolumeType: gp3
      CustomActions:
        OnNodeConfigured:
          Script: <script>
      Iam:
        AdditionalIamPolicies:
          - Policy: <policy-1>
      Networking:
        SubnetIds:
          - subnet-1
        AdditionalSecurityGroups:
          - sg-1
          - sg-2
      ComputeResources:
        - Name: t3medium
          MaxCount: 20
          InstanceType: t3.medium
        - Name: t3large
          MaxCount: 20
          InstanceType: t3.large
        - Name: t3xlarge
          MaxCount: 10
          InstanceType: t3.xlarge
        - Name: t32xlarge
          MaxCount: 10
          InstanceType: t3.2xlarge
      Tags:
        - Key: Name
          Value: default-queue
        - Key: Test
          Value: default-queue
    - Name: ondemand-queue
      ComputeSettings:
        LocalStorage:
          RootVolume:
            Size: 80
            VolumeType: gp3
      CustomActions:
        OnNodeConfigured:
          Script: <script>
      Iam:
        AdditionalIamPolicies:
          - Policy: <policy-1>
      Networking:
        SubnetIds:
          - subnet-1
        AdditionalSecurityGroups:
          - sg-1
          - sg-2
      ComputeResources:
        - Name: t3medium
          MaxCount: 20
          InstanceType: t3.medium
          Tags:
            - Key: Name
              Value: ondemand-queue
        - Name: t3large
          MaxCount: 20
          InstanceType: t3.large
          Tags:
            - Key: Name
              Value: ondemand-queue
@nobodyme nobodyme added the 3.x label Jun 2, 2023
@hanwen-pcluste
Contributor

Hi Naveen,

The Name tag key is reserved by pcluster: all compute nodes are tagged "Name: Compute". If you use a different tag key, the tag will be attached properly.

Thank you,
Hanwen
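
As a rough sketch of how one might catch this before deploying, the helper below walks an already-parsed cluster config and reports queue-level and compute-resource-level tags that use a reserved key. The function name, the `RESERVED_TAG_KEYS` set, and the path format are illustrative assumptions, not part of pcluster; the only reserved key assumed here is `Name`, based on the comment above.

```python
from typing import Any

# Assumption: "Name" is the only reserved key, taken from the maintainer's comment.
RESERVED_TAG_KEYS = {"Name"}


def find_reserved_tags(config: dict[str, Any]) -> list[str]:
    """Return config paths of queue and compute-resource tags that use a reserved key."""
    hits = []
    queues = config.get("Scheduling", {}).get("SlurmQueues", [])
    for queue in queues:
        queue_path = f"SlurmQueues[{queue.get('Name')}]"
        # Tags set directly on the queue.
        for tag in queue.get("Tags", []):
            if tag.get("Key") in RESERVED_TAG_KEYS:
                hits.append(f"{queue_path}.Tags[{tag['Key']}]")
        # Tags set on each compute resource inside the queue.
        for resource in queue.get("ComputeResources", []):
            resource_path = f"{queue_path}.ComputeResources[{resource.get('Name')}]"
            for tag in resource.get("Tags", []):
                if tag.get("Key") in RESERVED_TAG_KEYS:
                    hits.append(f"{resource_path}.Tags[{tag['Key']}]")
    return hits


# Trimmed-down version of the config from the report, already parsed into a dict.
example_config = {
    "Scheduling": {
        "Scheduler": "slurm",
        "SlurmQueues": [
            {
                "Name": "default-queue",
                "ComputeResources": [{"Name": "t3medium", "InstanceType": "t3.medium"}],
                "Tags": [
                    {"Key": "Name", "Value": "default-queue"},
                    {"Key": "Test", "Value": "default-queue"},
                ],
            },
            {
                "Name": "ondemand-queue",
                "ComputeResources": [
                    {
                        "Name": "t3medium",
                        "InstanceType": "t3.medium",
                        "Tags": [{"Key": "Name", "Value": "ondemand-queue"}],
                    }
                ],
            },
        ],
    }
}

for path in find_reserved_tags(example_config):
    print(path)
```

The `Test` tag is not reported because it does not use a reserved key.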

@nobodyme
Author

nobodyme commented Jun 5, 2023

As you can see, the same config also has a tag called "Test", and that one did not show up either.
Is something wrong with the way I applied it?

@hanwen-pcluste
Contributor

Your config is correct. I just ran a manual test and the tags were correctly attached to the compute nodes. When you view tags in the EC2 console, they can span multiple pages, so the tags you were looking for were probably on the second page.

FYI: There is a known issue that tags on spot instances are not attached. But you should be unaffected because you are using on-demand instances.
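
To avoid paging through the console, the tags can also be listed programmatically. The sketch below flattens a `describe_instances`-style response into a per-instance tag dictionary. The helper names, the instance ID, and the sample response are hypothetical; the filter on a `parallelcluster:cluster-name` tag is an assumption about how the cluster's instances are tagged, and `fetch_cluster_instances` needs boto3 and AWS credentials to run.

```python
from typing import Any


def tags_by_instance(response: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Flatten a describe_instances-style response into {instance_id: {key: value}}."""
    result = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            result[instance["InstanceId"]] = tags
    return result


def fetch_cluster_instances(cluster_name: str) -> dict[str, Any]:
    """Query EC2 for the instances of one cluster (requires boto3 and credentials)."""
    import boto3  # imported lazily so tags_by_instance works without boto3 installed

    ec2 = boto3.client("ec2")
    # Assumption: instances carry a parallelcluster:cluster-name tag.
    return ec2.describe_instances(
        Filters=[{"Name": "tag:parallelcluster:cluster-name", "Values": [cluster_name]}]
    )


# Hypothetical response, shaped like the output of describe_instances.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0123456789abcdef0",
                    "Tags": [
                        {"Key": "Name", "Value": "Compute"},
                        {"Key": "Test", "Value": "default-queue"},
                    ],
                }
            ]
        }
    ]
}

for instance_id, tags in tags_by_instance(sample_response).items():
    print(instance_id, tags)
```

Against a real cluster, one would pass the result of `fetch_cluster_instances("<cluster-name>")` to `tags_by_instance` instead of the sample dictionary.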

@nobodyme
Author

That might be the case. I haven't had a chance to check, but I'm fine with closing this out.

@bwakefie

bwakefie commented Jan 24, 2025

@hanwen-pcluste I believe I might be seeing this same issue. I set:

ComputeResources:
  Tags:
    - Key: Name
      Value: bwakefield-hpc-Compute

which, according to the docs, should override tags with the same key set elsewhere (that is, tags with Key: Name assigned either in the SlurmQueue or in the cluster config): https://docs.aws.amazon.com/parallelcluster/latest/ug/Scheduling-v3.html#yaml-Scheduling-SlurmQueues-ComputeResources-Tags

When I check the launch template defined in the final CloudFormation stack that pcluster deploys, I see two tags with Key: Name.

[Screenshot of the launch template tags, not preserved]

The deployed compute nodes have Name: Compute when I would expect them to have Name: bwakefield-hpc-Compute.
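
A duplicate key like this can be spotted mechanically in a tag list copied out of the launch template. The sketch below groups tag values by key and reports keys that occur more than once. The function name and the sample list are hypothetical: the two Name values come from this report, but their order in the real template and the extra `Project` tag are assumptions.

```python
from collections import defaultdict


def duplicate_tag_keys(tags: list[dict[str, str]]) -> dict[str, list[str]]:
    """Return {key: [values...]} for every tag key that appears more than once."""
    values_by_key = defaultdict(list)
    for tag in tags:
        values_by_key[tag["Key"]].append(tag["Value"])
    return {key: values for key, values in values_by_key.items() if len(values) > 1}


# Hypothetical tag list; the real order in the launch template is not known.
launch_template_tags = [
    {"Key": "Name", "Value": "Compute"},
    {"Key": "Name", "Value": "bwakefield-hpc-Compute"},
    {"Key": "Project", "Value": "demo"},
]

print(duplicate_tag_keys(launch_template_tags))
```

The sketch only detects the conflict; it says nothing about which of the two values EC2 or pcluster ends up applying.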

I'm using pcluster version 3.12.0. Let me know if this should be a new issue.
Here is an example config:

---
Region: us-west-2
Image:
  Os: alinux2

HeadNode:
  InstanceType: c5n.2xlarge
  Networking:
    SubnetId: subnet-xxxxxxxxxxx
  Ssh:
    KeyName: bwakefield-pcluster-key
  Dcv:
    Enabled: True
    Port: 8443
  Iam:
    AdditionalIamPolicies:
    # permit ec2 spinup of compute nodes
    - Policy: arn:aws:iam::aws:policy/AmazonEC2FullAccess
    # Insert s3 policy here...
    - Policy: arn:aws:iam::aws:policy/AmazonS3FullAccess
    # Insert SSM Policy here
    - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  CustomActions:
    OnNodeConfigured:
      #Script URL below. This is run after all bootstrap scripts are run
      Script: https://raw.githubusercontent.com/spack/spack-configs/main/AWS/parallelcluster/postinstall.sh
      # Arguments to be passed to the script:
      #Args:
      #  - /shared/spack

Scheduling:
  Scheduler: slurm
  SlurmSettings:
    ScaledownIdletime: 10
    Dns:
      DisableManagedDns: true
      UseEc2Hostnames: true
  SlurmQueues:
  - Name: hpc
    AllocationStrategy: lowest-price
    ComputeResources:
      - Name: hpc-cr-0
        Instances:
          - InstanceType: c6i.32xlarge
        MinCount: 0
        MaxCount: 4
        Efa:
          Enabled: true
          GdrSupport: true
# Tag I expect to propagate to EC2 nodes in the `hpc` queue ##########
        Tags:
          - Key: Name
            Value: bwakefield-hpc-Compute
#############################################################
    ComputeSettings:
      LocalStorage:
        RootVolume:
          Size: 100
          VolumeType: gp3
    Networking:
      SubnetIds:
        - subnet-0ae1d6a9015f245ad
      PlacementGroup:
        Enabled: true
CustomS3Bucket: hpctrain-pcluster-staging-bucket

# Storage for shared and scratch
SharedStorage:
  - MountDir: /shared
    Name: trnebs
    StorageType: Ebs
    EbsSettings:
      VolumeType: gp3
      Size: 200
  - MountDir: /fsxscratch
    Name: trnscratch
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      DeploymentType: SCRATCH_1
# Tags for the cluster
Tags:
  - Key: Name
    Value: bwakefield-HeadNode

I do note that the cluster-level tag at the end of the config does set the Name tag to bwakefield-HeadNode on the head node.
