Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Latent Caching Speed with VAE Optimizations #1910

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alefh123
Copy link

PR Summary:

With some help from DeepSeek, this PR aims to improve latent caching speed.

Key Changes:

Mixed Precision Caching: VAE encoding now uses FP16 (or BF16) during latent caching for faster computation and reduced memory use.

Channels-Last VAE: VAE is temporarily switched to channels_last memory format during caching to improve GPU performance.

--vae_batch_size Utilization: This leverages the existing --vae_batch_size option; users should increase it for further speedups.

Benefits:

Significantly Faster Latent Caching: Reduces preprocessing time.

Improved GPU Efficiency: Optimizes VAE encoding on GPUs.

Impact: Faster training setup due to quicker latent caching.

This is much more concise and directly highlights the essential changes and their impact. Let me know if you would like it even shorter or with any other adjustments!

Based on the optimizations implemented—mixed precision and channels-last format for the VAE during caching—a speedup of 2x to 4x is a reasonable estimate.

PR Summary:

This PR accelerates latent caching, a slow preprocessing step, by optimizing the VAE's encoding process.

Key Changes:

Mixed Precision Caching: VAE encoding now uses FP16 (or BF16) during latent caching for faster computation and reduced memory use.

Channels-Last VAE: VAE is temporarily switched to channels_last memory format during caching to improve GPU performance.

--vae_batch_size Utilization: This leverages the existing --vae_batch_size option; users should increase it for further speedups.

Benefits:

Significantly Faster Latent Caching: Reduces preprocessing time.

Improved GPU Efficiency: Optimizes VAE encoding on GPUs.

Impact: Faster training setup due to quicker latent caching.

This is much more concise and directly highlights the essential changes and their impact. Let me know if you would like it even shorter or with any other adjustments!

Based on the optimizations implemented—mixed precision and channels-last format for the VAE during caching—a speedup of 2x to 4x is a reasonable estimate.
@rockerBOO
Copy link
Contributor

I don't think this would be where it is doing the caching though. Converting SDXL VAE to fp16 also can cause NaN issues so we probably wouldn't want to convert it to fp16 regardless.

The memory format might be what we can try from this as it might improve the performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants