Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Clone on Api Types and Arc Dispatcher #6665

Merged
merged 1 commit into from
Jan 3, 2025

Conversation

cwfitzgerald
Copy link
Member

@cwfitzgerald cwfitzgerald commented Dec 5, 2024

This implements clone on all api types and on the arc dispatcher.

Because the new context api moves the drop logic into the backends, this is totally fine, and to wgpu_core it acts like someone is using the standard Arc<wgpu::Buffer>.

Still needs some tests and stuff, but this would be a huge usability improvement, and will help in the development of better surface apis.

Checklist

  • Run cargo fmt.
  • [] Run taplo format.
  • Run cargo clippy. If applicable, add:
    • --target wasm32-unknown-unknown
    • --target wasm32-unknown-emscripten
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

@cwfitzgerald cwfitzgerald force-pushed the cw/copy-types branch 2 times, most recently from f01ebb7 to eb3ebd0 Compare December 5, 2024 06:51
@cwfitzgerald
Copy link
Member Author

I picked basically the worst way to implement this, will do it better tomorrow

@sagudev
Copy link
Contributor

sagudev commented Dec 5, 2024

This will complicate #6658 as it uses Arc::get_mut 😔.

@cwfitzgerald
Copy link
Member Author

@sagudev the things you use get_mut for aren't clonable - only types with only & references use it.

Probably those shouldn't be Arc though, but Box?

@sagudev
Copy link
Contributor

sagudev commented Dec 6, 2024

Probably those shouldn't be Arc though, but Box?

Then we have the problem of impl PartialEq, Hash and others, I think we need trait for DynCompare and DynHashable or some other black magic, this will not be as easy as I initially thought.

@sagudev
Copy link
Contributor

sagudev commented Dec 6, 2024

Actually if this PR wraps all internal dispatch types into Arcs and we use this arc for clone/eq/hash then there is no need for dispatch types to impl eq/hash.

@cwfitzgerald
Copy link
Member Author

Yeah I was thinking that - the one problem is the non & types which aren't clone. Those could just use normal ID generation. Can tackle that in a followup.

@teoxoy
Copy link
Member

teoxoy commented Dec 6, 2024

Don't the websys types impl Clone / can't we put the burden on the core/web types to impl Clone?

With the approach in this PR, once we remove the registries we will end up with 2 Arcs instead of 1.

@cwfitzgerald
Copy link
Member Author

cwfitzgerald commented Dec 6, 2024

I'm theory we could, but Buffers/Textures contain a decent amount of state that just hangs out in the frontend and needs to be shared as well.

This isn't a regression from v23, as they had everything boxed already.

More generally, to me the interesting part isn't the internals, which we can iterate on , but that we're now advertising Clone support and will continue to support that.

I do believe that cloning a web_sys value requires calling out to JS, so I was trying to avoid it.

@cwfitzgerald
Copy link
Member Author

cwfitzgerald commented Dec 11, 2024

Resolution: Merge and Iterate

  • CF: pro arguments:
    • Webgpu is clone
    • It’s surprising behavior
    • map_async pretty much requires it
  • Teo: middle ware today often have its users pass things wrapped in Arc today
  • JB: this is similar to what users get on Metal & DX12
  • JB: having the only reference of a thing is very common in Rust and prevents errors (in particular concurrent mutable access). It’s part of a contract, gives control over aliasing. This change means you’ll never get this.
  • Teo: Non alias access isn’t the case anyways because of drivers etc.
  • Nical: wgpu types are more like handles
  • JB: WebGPU api has multiple owners all over the place. Ship has sailed!
  • Nical: Very sad to have more indirections again
  • CF: This is still possible. This is more about whether we’re willing to support clone
    Some exclusive things are still exclusive (e.g. command encoder, bundles etc.)
    There’s many ways we could iterate

@Wumpf
Copy link
Member

Wumpf commented Jan 2, 2025

@cwfitzgerald any reason not to undraft and land?

@cwfitzgerald
Copy link
Member Author

No, just need to do it.

@cwfitzgerald cwfitzgerald marked this pull request as ready for review January 3, 2025 00:26
@cwfitzgerald cwfitzgerald requested a review from a team as a code owner January 3, 2025 00:26
@cwfitzgerald
Copy link
Member Author

Alright, this implementation of clone makes things slightly slower, but I think some impl changes will cause that to go away.

Benchmark results. This is inverted, so this is the speed of trunk compared to this PR.

Gnuplot not found, using plotters backend
AdapterInfo { name: "NVIDIA GeForce RTX 4070", vendor: 4318, device: 10118, device_type: DiscreteGpu, driver: "NVIDIA", driver_info: "561.09", backend: Vulkan }
Benchmarking Renderpass: Single Threaded/1 renderpasses x 10000 draws (Renderpass Time): Warming up for 3.0000 sAdapterInfo { name: "NVIDIA GeForce RTX 4070", vendor: 4318, device: 10118, device_type: DiscreteGpu, driver: "NVIDIA", driver_info: "561.09", backend: Vulkan }
Renderpass: Single Threaded/1 renderpasses x 10000 draws (Renderpass Time)
                        time:   [45.450 ms 45.682 ms 45.941 ms]
                        thrpt:  [217.67 Kelem/s 218.90 Kelem/s 220.02 Kelem/s]
                 change:
                        time:   [-5.4156% -4.5534% -3.7969%] (p = 0.00 < 0.05)
                        thrpt:  [+3.9468% +4.7707% +5.7257%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
Renderpass: Single Threaded/2 renderpasses x 5000 draws (Renderpass Time)
                        time:   [45.782 ms 45.924 ms 46.078 ms]
                        thrpt:  [217.02 Kelem/s 217.75 Kelem/s 218.43 Kelem/s]
                 change:
                        time:   [-10.512% -8.2788% -6.4509%] (p = 0.00 < 0.05)
                        thrpt:  [+6.8957% +9.0260% +11.746%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Renderpass: Single Threaded/4 renderpasses x 2500 draws (Renderpass Time)
                        time:   [46.066 ms 46.239 ms 46.436 ms]
                        thrpt:  [215.35 Kelem/s 216.27 Kelem/s 217.08 Kelem/s]
                 change:
                        time:   [-6.1473% -5.4472% -4.7625%] (p = 0.00 < 0.05)
                        thrpt:  [+5.0007% +5.7611% +6.5499%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
Renderpass: Single Threaded/8 renderpasses x 1250 draws (Renderpass Time)
                        time:   [47.608 ms 47.751 ms 47.915 ms]
                        thrpt:  [208.70 Kelem/s 209.42 Kelem/s 210.05 Kelem/s]
                 change:
                        time:   [-5.4332% -4.8655% -4.3117%] (p = 0.00 < 0.05)
                        thrpt:  [+4.5060% +5.1143% +5.7453%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
Renderpass: Single Threaded/1 renderpasses x 10000 draws (Submit Time)
                        time:   [2.4825 ms 2.5108 ms 2.5470 ms]
                        thrpt:  [3.9262 Melem/s 3.9829 Melem/s 4.0282 Melem/s]
                 change:
                        time:   [-1.0206% +0.3690% +1.9548%] (p = 0.64 > 0.05)
                        thrpt:  [-1.9173% -0.3676% +1.0311%]
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
Renderpass: Single Threaded/2 renderpasses x 5000 draws (Submit Time)
                        time:   [3.3812 ms 3.3983 ms 3.4169 ms]
                        thrpt:  [2.9266 Melem/s 2.9427 Melem/s 2.9575 Melem/s]
                 change:
                        time:   [-1.1034% -0.3084% +0.4571%] (p = 0.46 > 0.05)
                        thrpt:  [-0.4551% +0.3093% +1.1157%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
Renderpass: Single Threaded/4 renderpasses x 2500 draws (Submit Time)
                        time:   [4.1992 ms 4.2224 ms 4.2480 ms]
                        thrpt:  [2.3541 Melem/s 2.3683 Melem/s 2.3814 Melem/s]
                 change:
                        time:   [-1.4977% -0.2787% +0.7641%] (p = 0.65 > 0.05)
                        thrpt:  [-0.7583% +0.2795% +1.5205%]
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
Renderpass: Single Threaded/8 renderpasses x 1250 draws (Submit Time)
                        time:   [5.4617 ms 5.4875 ms 5.5156 ms]
                        thrpt:  [1.8130 Melem/s 1.8223 Melem/s 1.8309 Melem/s]
                 change:
                        time:   [-2.3226% -1.2115% -0.2481%] (p = 0.03 < 0.05)
                        thrpt:  [+0.2488% +1.2264% +2.3778%]
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe

Renderpass: Multi Threaded/2 threads x 5000 draws
                        time:   [26.178 ms 26.477 ms 26.832 ms]
                        thrpt:  [372.69 Kelem/s 377.68 Kelem/s 382.00 Kelem/s]
                 change:
                        time:   [-17.042% -15.902% -14.724%] (p = 0.00 < 0.05)
                        thrpt:  [+17.266% +18.909% +20.543%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
Renderpass: Multi Threaded/4 threads x 2500 draws
                        time:   [16.231 ms 16.273 ms 16.316 ms]
                        thrpt:  [612.88 Kelem/s 614.53 Kelem/s 616.09 Kelem/s]
                 change:
                        time:   [-12.101% -11.760% -11.425%] (p = 0.00 < 0.05)
                        thrpt:  [+12.898% +13.328% +13.767%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Renderpass: Multi Threaded/8 threads x 1250 draws
                        time:   [13.091 ms 13.126 ms 13.163 ms]
                        thrpt:  [759.70 Kelem/s 761.82 Kelem/s 763.89 Kelem/s]
                 change:
                        time:   [+0.1150% +0.5247% +0.9152%] (p = 0.01 < 0.05)
                        thrpt:  [-0.9069% -0.5219% -0.1149%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Renderpass: Bindless/10000 draws
                        time:   [16.447 ms 16.572 ms 16.721 ms]
                        thrpt:  [598.05 Kelem/s 603.42 Kelem/s 608.00 Kelem/s]
                 change:
                        time:   [-1.1342% -0.2367% +0.7923%] (p = 0.65 > 0.05)
                        thrpt:  [-0.7861% +0.2373% +1.1473%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Renderpass: Empty Submit with 90000 Resources
                        time:   [104.72 µs 104.91 µs 105.12 µs]
                        change: [+0.7172% +1.0640% +1.3995%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

@cwfitzgerald cwfitzgerald merged commit 03ff99e into gfx-rs:trunk Jan 3, 2025
27 checks passed
@cwfitzgerald cwfitzgerald deleted the cw/copy-types branch January 3, 2025 00:30
@a1phyr a1phyr mentioned this pull request Jan 17, 2025
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants