Add pin_memory to DataLoader and update ImageInfo to support #1894

Draft
wants to merge 3 commits into
base: sd3

Conversation

rockerBOO
Contributor

Adds support for using pin_memory with the DataLoader, and updates ImageInfo to pin the relevant tensors. This will probably need some testing, but it is disabled by default.
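For readers unfamiliar with pinning tensors on a container object, here is a minimal sketch of the idea. `ImageInfoSketch` and its field names are hypothetical stand-ins, not the actual ImageInfo class from this PR; the only real API used is `torch.Tensor.pin_memory()`, and the CUDA guard is an assumption so the sketch also runs on CPU-only machines.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class ImageInfoSketch:
    """Hypothetical stand-in for ImageInfo; the field names are assumptions."""

    image_key: str
    latents: Optional[torch.Tensor] = None
    text_encoder_output: Optional[torch.Tensor] = None

    def pin_memory(self) -> "ImageInfoSketch":
        # Tensor.pin_memory() returns a copy in page-locked host memory.
        # Pinning needs an accelerator backend, so do nothing without CUDA.
        if not torch.cuda.is_available():
            return self
        if self.latents is not None:
            self.latents = self.latents.pin_memory()
        if self.text_encoder_output is not None:
            self.text_encoder_output = self.text_encoder_output.pin_memory()
        return self


# Only the tensors that are actually present get pinned.
info = ImageInfoSketch("img_0001", latents=torch.randn(4, 64, 64)).pin_memory()
```

PyTorch's DataLoader also calls a `pin_memory()` method on custom batch objects when `pin_memory=True` is set, so a method shaped like this can double as that hook.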

Host to GPU copies are much faster when they originate from pinned (page-locked) memory. See Use pinned memory buffers for more details on when and how to use pinned memory generally.

For data loading, passing pin_memory=True to a DataLoader will automatically put the fetched data Tensors in pinned memory, and thus enables faster data transfer to CUDA-enabled GPUs.

https://pytorch.org/docs/stable/data.html#memory-pinning
https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
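As a rough illustration of what the docs quoted above describe, the sketch below enables `pin_memory` on a DataLoader and pairs it with `non_blocking=True` copies. The dataset is random placeholder data and the shapes are arbitrary; this is not the training loop from this repository. Pinning is switched on only when CUDA is available, which is a choice made here so the example also runs on a CPU-only machine.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Placeholder data: 8 fake images with integer labels.
dataset = TensorDataset(torch.randn(8, 3, 32, 32), torch.arange(8))

# With pin_memory=True the loader returns batches in page-locked host memory.
loader = DataLoader(dataset, batch_size=4, pin_memory=use_cuda)

shapes = []
for images, labels in loader:
    # non_blocking=True lets the host-to-GPU copy overlap with other work,
    # which is only possible when the source tensor is pinned.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    shapes.append(tuple(images.shape))
```

Pinned memory is a limited resource taken from the OS, so leaving the option off by default, as this PR does, seems like a reasonable choice.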

@kohya-ss
Owner

Thank you for this! I will check it as soon as possible.

I checked pin_memory before, and it caused a large increase in memory usage in a Windows environment. If you have already tried it in a Windows environment, did it work without any problems?

@rockerBOO
Contributor Author

I don't have access to a Windows environment to test. On Linux it doesn't seem to affect memory usage (or I'm not using it correctly). This would be off by default, so if it does hurt memory usage on Windows, we can add a note in the documentation.

So far I have seen roughly an 8-10% improvement in epoch speed, but I haven't done enough testing. Larger batch sizes may see bigger gains; on a 2080, batch sizes of 1 and 3 showed about the same relative improvement.

I'm looking at another performance pass to try and find bottlenecks with epoch speed and GPU usage.
