Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xnn_weights_cache_provider look_up doesn't work? #6257

Open
mcr229 opened this issue Apr 4, 2024 · 4 comments
Open

xnn_weights_cache_provider look_up doesn't work? #6257

mcr229 opened this issue Apr 4, 2024 · 4 comments

Comments

@mcr229
Copy link
Contributor

mcr229 commented Apr 4, 2024

I've recently been experimenting with XNNPACK's weight cache to reduce load time by caching packed weights and also reduce memory pressure for repeated weights across the same kernels.

I was experiementing with fully-connected operator and found that the weight cache was never being hit. I noticed that when using the apis to create the xnn_weights_cache_t we set the look up function to be xnn_internal_weights_cache_look_up:

cache_provider->look_up = (size_t(*)(void*, const struct xnn_weights_cache_look_up_key*))xnn_internal_weights_cache_look_up;

looking at this function, it looks like a placeholder function which would always return XNN_CACHE_NOT_FOUND:

XNNPACK/src/cache.c

Lines 491 to 496 in 85071b8

size_t xnn_internal_weights_cache_look_up(
struct xnn_internal_weights_cache* cache, const struct xnn_weights_cache_look_up_key* cache_key)
{
// The default implementation does not support this query.
return XNN_CACHE_NOT_FOUND;
}

Now when I'm using the weights cache to create a runtime_t with only a fully connected operator, in the flow of creating the fully-connected operator, we look up the cache to see if the weights have been packed before, using xnn_weights_cache_look_up:

if (use_weights_cache(fully_connected_op)) {
cache_offset = xnn_weights_cache_look_up(
fully_connected_op->weights_cache, &cache_key);
}

However this just uses the the placeholder function above, returning XNN_CACHE_NOT_FOUND:

XNNPACK/src/cache.c

Lines 530 to 534 in 85071b8

size_t xnn_weights_cache_look_up(
xnn_weights_cache_t cache, const struct xnn_weights_cache_look_up_key* cache_key)
{
return cache->look_up(cache->context, cache_key);
}

As a result, every look up would then fall to XNN_CACHE_NOT_FOUND, in which weights have to be repacked, and memory has to be allocated for the newly packed weights:

if (cache_offset == XNN_CACHE_NOT_FOUND) {
void* weights_ptr = xnn_get_pointer_to_write_weights(
fully_connected_op, aligned_total_weights_size, packed_weights_padding_byte);
if (weights_ptr == NULL) {
xnn_log_error(
"failed to allocate %zu bytes for %s operator packed weights",
packed_weights_size, xnn_operator_type_to_string(operator_type));
goto error;
}
xnn_log_debug("allocated %zu bytes for packed weights in %s operator",
aligned_total_weights_size, xnn_operator_type_to_string(operator_type));
if (flags & XNN_FLAG_TRANSPOSE_WEIGHTS) {
pack_gemm_gio_w(
/*groups=*/1, output_channels, input_channels,
nr, kr, sr,
output_channels,
kernel, bias, /*scale=*/NULL,
weights_ptr,
gemm_config->nr * extra_weights_bytes,
packing_params);

Am I looking at this incorrectly? Or is this a feature that is still a wip? Or is this a bug that is meant to be fixed in the future?

@digantdesai
Copy link
Contributor

@mcr229 - Is it possible to not use the default constructor i.e. xnn_create_weights_cache_with_size and write a custom constructor which populates struct xnn_weights_cache_provider with your methods?

@mcr229
Copy link
Contributor Author

mcr229 commented Apr 8, 2024

@digantdesai hmm I'm wondering if this suggests that XNNPACK doesn't offer a default operator for looking up weights then?

@alankelly
Copy link
Collaborator

Hi, is this still an issue? Do you want to discuss during the next meeting?

@chenghuaWang
Copy link

chenghuaWang commented Oct 29, 2024

@mcr229 Hi, I also encountered the same problem. Have you solved it? My pseudocode logic is as follows:

xnn_create_weights_cache(&weight_cache_);
// define Bias
// define Weight
// define input and output
// define fully_connected op
// create runtime with weights_cache
// finalize weights_cache
// reshape, setup runtime and invoke

// define Bias
// define Weight
// define input and output
// define fully_connected op
// create runtime with weights_cache
// reshape, setup runtime and invoke

the second fully_connected op still alloc memory for bias and weight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants