AIP-7487 join memory fix #276

Merged 4 commits on Dec 7, 2023

@talebzeghmi talebzeghmi commented Dec 7, 2023

  • Reduce the s3op NUM_WORKERS_DEFAULT used on a join (when loading data from parent steps/splits) from 64 to 5.
  • Note: this increased the run time only on the order of seconds, although the test used small, lightweight self artifacts stored in S3.
  • The value can be overridden in a join step with the following environment decorator:
@environment(  # pylint: disable=E1102
  vars={"METAFLOW_S3OP_NUM_WORKERS_DEFAULT": "3"}
)
@step
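
As a rough illustration of the override described above, the sketch below shows one way a worker count could be resolved from an environment variable with a fallback. The variable name and the fallback of 5 come from this description; the helper `resolve_num_workers` and its validation are hypothetical and are not the code changed in this PR.

```python
import os

# Fallback taken from the PR description (new join default of 5).
JOIN_NUM_WORKERS_FALLBACK = 5

# Variable name as written in the PR description.
ENV_VAR = "METAFLOW_S3OP_NUM_WORKERS_DEFAULT"


def resolve_num_workers(env=None, fallback=JOIN_NUM_WORKERS_FALLBACK):
    """Hypothetical helper: return the worker count from env, else the fallback."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        return fallback
    value = int(raw)  # environment values are strings, e.g. "3"
    if value < 1:
        raise ValueError(f"{ENV_VAR} must be >= 1, got {value}")
    return value
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real process environment.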

Test run

import time
from metaflow import FlowSpec, Parameter, environment, step


class HelloFlow(FlowSpec):
    splits = Parameter("splits", default=10)

    @step
    def start(self):
        print("HelloFlow is starting.")
        self.x = "foo"
        self.y = "foo"
        self.z = "foo"
        self.a = "b"
        self.my_list = list(range(self.splits))
        self.next(self.compute, foreach="my_list")

    @step
    def compute(self):
        self.c = self.input
        time.sleep(2)
        self.next(self.join)

    @environment(  # pylint: disable=E1102
        vars={"METAFLOW_S3OP_NUM_WORKERS": "10"}
    )
    @step
    def join(self, inputs):
        self.merge_artifacts(inputs, exclude=["c"])
        print("done")
        self.next(self.end)

    @step
    def end(self):
        print("HelloFlow is all done.")


if __name__ == "__main__":
    HelloFlow()

Run 1 w/ 100 splits METAFLOW_S3OP_NUM_WORKERS=10

Run 2 w/ 100 splits METAFLOW_S3OP_NUM_WORKERS=5
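
The PR does not spell out why fewer workers help with join memory. One plausible reading, stated here as an assumption, is that each worker holds data in flight, so the worker count caps how many payloads exist at once. The sketch below illustrates that bound with a thread pool and a fake download; it is an analogy only, and `peak_in_flight` and `fake_download` are hypothetical names unrelated to the s3op implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_in_flight(num_tasks, num_workers):
    """Run num_tasks fake downloads and return the peak number running at once."""
    lock = threading.Lock()
    state = {"current": 0, "peak": 0}

    def fake_download(_):
        # Count this task as "holding data" while it runs.
        with lock:
            state["current"] += 1
            state["peak"] = max(state["peak"], state["current"])
        time.sleep(0.001)  # stand-in for a transfer that keeps data in memory
        with lock:
            state["current"] -= 1

    # The pool size bounds how many fake downloads can overlap.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(fake_download, range(num_tasks)))
    return state["peak"]
```

With 100 tasks, `peak_in_flight(100, 5)` never exceeds 5 and `peak_in_flight(100, 10)` never exceeds 10, which mirrors the two worker settings compared in the runs above.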

With

- Reduce the s3op NUM_WORKERS_DEFAULT on a join (on parent splits) from 64 to 10.
- This can be overridden on a join step with the following env decorator:
@environment(  # pylint: disable=E1102
  vars={"METAFLOW_S3OP_NUM_WORKERS_DEFAULT": "10"}
)
Review thread on metaflow/plugins/aip/aip_metaflow_step.py (outdated, resolved)
@talebzeghmi talebzeghmi merged commit 7062fad into feature/aip Dec 7, 2023
3 checks passed
@talebzeghmi talebzeghmi deleted the tz/AIP-7487-join-memory branch December 7, 2023 23:52