Optimize Latent Caching Speed with VAE Optimizations #1910

alefh123 · 2025-01-29T14:58:24Z

PR Summary:

With some help from DeepSeek, this PR aims to improve latent caching speed.

Key Changes:

Mixed Precision Caching: VAE encoding now uses FP16 (or BF16) during latent caching for faster computation and reduced memory use.

Channels-Last VAE: VAE is temporarily switched to channels_last memory format during caching to improve GPU performance.

--vae_batch_size Utilization: This leverages the existing --vae_batch_size option; users should increase it for further speedups.

Benefits:

Significantly Faster Latent Caching: Reduces preprocessing time.

Improved GPU Efficiency: Optimizes VAE encoding on GPUs.

Impact: Faster training setup due to quicker latent caching.

This is much more concise and directly highlights the essential changes and their impact. Let me know if you would like it even shorter or with any other adjustments!

Based on the optimizations implemented—mixed precision and channels-last format for the VAE during caching—a speedup of 2x to 4x is a reasonable estimate.

PR Summary: This PR accelerates latent caching, a slow preprocessing step, by optimizing the VAE's encoding process. Key Changes: Mixed Precision Caching: VAE encoding now uses FP16 (or BF16) during latent caching for faster computation and reduced memory use. Channels-Last VAE: VAE is temporarily switched to channels_last memory format during caching to improve GPU performance. --vae_batch_size Utilization: This leverages the existing --vae_batch_size option; users should increase it for further speedups. Benefits: Significantly Faster Latent Caching: Reduces preprocessing time. Improved GPU Efficiency: Optimizes VAE encoding on GPUs. Impact: Faster training setup due to quicker latent caching. This is much more concise and directly highlights the essential changes and their impact. Let me know if you would like it even shorter or with any other adjustments! Based on the optimizations implemented—mixed precision and channels-last format for the VAE during caching—a speedup of 2x to 4x is a reasonable estimate.

rockerBOO · 2025-01-29T17:47:40Z

I don't think this would be where it is doing the caching though. Converting SDXL VAE to fp16 also can cause NaN issues so we probably wouldn't want to convert it to fp16 regardless.

The memory format might be what we can try from this as it might improve the performance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize Latent Caching Speed with VAE Optimizations #1910

Optimize Latent Caching Speed with VAE Optimizations #1910

alefh123 commented Jan 29, 2025

rockerBOO commented Jan 29, 2025

Optimize Latent Caching Speed with VAE Optimizations #1910

Are you sure you want to change the base?

Optimize Latent Caching Speed with VAE Optimizations #1910

Conversation

alefh123 commented Jan 29, 2025

rockerBOO commented Jan 29, 2025