Replies: 6 comments
-
@seanlaw I'm sorry! When I submitted the question I got a 404 error from GitHub, so I thought the question had not been created and submitted it again. My bad!
-
That's great to hear, @Jacks349! We welcome any constructive criticism as well. We are here to learn how best to serve our community.
This is a great question! There is a very subtle explanation for this that may seem mysterious at first, but it can all be explained by the fact that STUMPY uses the wonderful Numba Python package for fast numerical computation. You'll notice that STUMPY has only three core dependencies (NumPy, SciPy, and Numba). Essentially, Numba takes Python code and performs a "just-in-time" (JIT) compilation of that code into performant, parallelized machine code. However, the "JIT" part means that a function is only compiled when it is first called by the user (i.e., if you don't call the other STUMPY functions then those won't get compiled). This is quite different from traditional compiled languages like C, C++, or Java, where the programmer has to compile the code manually before executing it. With Numba, the code is automatically compiled at run-time on the first execution, but it doesn't need to recompile thereafter.

As you can imagine, this compilation (on the first call) takes a little bit of time/overhead but, after that, it is purely compute time. So, what you are seeing in the 1.68 seconds is both compile time AND compute time. After that, since your data is so small and the function no longer has to be compiled, the compute time is basically instantaneous (likely sub-millisecond). Thus, when performing any timing comparisons with STUMPY, you'll want to run the code once on a super small toy data set just to force compilation and then, after that, perform your timing calculation. Or, in other words, ignore the first run. Let me know if that makes sense or if I can clarify my words in any way.
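As a minimal sketch of that warm-up-then-time pattern (the array sizes and variable names below are made up for illustration, not taken from the original post):

```python
import time

import numpy as np
import stumpy

T = np.random.rand(10_000)  # hypothetical target time series
Q = np.random.rand(100)     # hypothetical query/pattern

# First call: Numba JIT-compiles the underlying functions (compile + compute time)
start = time.time()
stumpy.mass(Q, T)
print(f"First call (includes compilation): {time.time() - start:.3f} s")

# Subsequent calls: pure compute time, no recompilation
start = time.time()
stumpy.mass(Q, T)
print(f"Second call (compute only):        {time.time() - start:.3f} s")
```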
-
@seanlaw
-
If I understood correctly: I have a script constantly running that receives data from a third party, and every time new data is received I want to use `mass()` on it. So only the first call pays the compilation cost, and every call after that is fast, a bit like caching?
-
It may feel like caching, but there is a difference in that caching (in my mind) applies to the caching of data in memory. So, if you execute the same function with the same data, then it can be faster. In our case, if you change the data (so there is no data caching) but keep the size of the data the same, then the execution time should be nearly identical. So, we are "caching" the compilation (I use this term loosely as I'm sure it's not quite correct), not the data. Though, there may be some level of data caching happening that I'm not aware of...
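A rough sketch of that distinction (the arrays here are hypothetical; only their shape matters):

```python
import time

import numpy as np
import stumpy

Q = np.random.rand(100)      # hypothetical pattern
T1 = np.random.rand(50_000)  # hypothetical target data
T2 = np.random.rand(50_000)  # different data, same size

stumpy.mass(Q, T1)           # warm-up: trigger JIT compilation once

start = time.time()
stumpy.mass(Q, T1)
print(f"Same data again:           {time.time() - start:.4f} s")

start = time.time()
stumpy.mass(Q, T2)
print(f"Different data, same size: {time.time() - start:.4f} s")  # roughly the same
```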
-
Thank you for taking the time to help me understand this topic, it's perfectly clear now!
-
I've been using STUMPY for a while now to detect patterns in a target dataset using a pattern I specify, and I'm getting very good results! Step by step I'm understanding better how to find what I'm looking for, even with very small datasets, so kudos to STUMPY!

The function I use most is `MASS`, and I'm trying to understand more deeply how it works. I hope this question is not dumb, but there is something I don't understand about how long it takes to compute the distance profile using MASS. So here I have a very simple example, timing a single call to `mass()`: the output will be something around 1.68-1.71 seconds, at least on my PC, which should mean the function took around 1.68 seconds to execute. I'm assuming that it's `mass()` taking most of the time, since I'm not doing any other computation. If, instead of calling `mass()` only once, I call it four times in a row (using the same data for every call, though I also tried the same with different `Pattern` arrays), the execution time is again about 1.70 seconds.

My question is the following: shouldn't the execution time be 1.68/1.70 + 1.68 + 1.68 + 1.68, since I'm running it four times? Again, I'm sorry if I'm saying something incorrect here; I was just curious about this, since I'm creating a script that should use `mass()` multiple times to check different patterns against the same target dataset. Thanks in advance!
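A minimal sketch of the kind of timing comparison described above (the arrays and the four-call loop are hypothetical stand-ins, not the original poster's snippet):

```python
import time

import numpy as np
import stumpy

Target = np.random.rand(20_000)  # hypothetical target dataset
Pattern = np.random.rand(200)    # hypothetical query pattern

# Single call: the first run includes Numba JIT compilation
start = time.time()
stumpy.mass(Pattern, Target)
print(f"One call:   {time.time() - start:.2f} s")

# Four calls in a row: no recompilation, so well under four times the first run
start = time.time()
for _ in range(4):
    stumpy.mass(Pattern, Target)
print(f"Four calls: {time.time() - start:.2f} s")
```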