Replies: 6 comments
-
@seanlaw I'm sorry! When I submitted the question I got a 404 error from GitHub, so I thought the question had not been created and submitted it again. My bad!
-
That's great to hear, @Jacks349! We welcome any constructive criticism as well. We are here to learn how best to serve our community.
This is a great question! There is a very subtle explanation for this that may seem mysterious at first, but it can all be explained by the fact that STUMPY uses the wonderful Numba Python package for fast numerical computation. You'll notice that STUMPY has only three core dependencies (NumPy, SciPy, and Numba). Essentially, Numba takes Python code and performs a "just-in-time" (JIT) compilation of that code into performant, parallelized machine code. However, the "JIT" part means that a function is only compiled when it is first called by the user (i.e., if you don't call the other STUMPY functions then those won't get compiled). This is quite different from traditional compiled languages like C, C++, or Java, where the programmer has to compile the code manually before executing it. With Numba, the code is automatically compiled at run-time on the first execution, but it doesn't need to recompile thereafter.

As you can imagine, this compilation (on the first call) takes a little bit of time/overhead but, after that, it is purely compute time. So, what you are seeing in the 1.68 seconds is both compile time AND compute time. After that, since your data is so small and the function no longer has to be compiled, the compute time is basically instantaneous (likely sub-millisecond). Thus, when performing any timing comparisons with STUMPY, you'll want to run the code once on a super small toy data set just to force compilation and then, after that, perform your timing calculation. Or, in other words, ignore the first run. Let me know if that makes sense or if I can clarify my words in any way.
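As a minimal sketch of that warm-up-then-time pattern (the array sizes and variable names below are made up for illustration, not taken from the original post):

```python
import time

import numpy as np
import stumpy

T = np.random.rand(10_000)  # hypothetical target time series
Q = np.random.rand(100)     # hypothetical query/pattern

# First call: Numba JIT-compiles the underlying functions (compile + compute time)
start = time.time()
stumpy.mass(Q, T)
print(f"First call (includes compilation): {time.time() - start:.3f} s")

# Subsequent calls: pure compute time, no recompilation
start = time.time()
stumpy.mass(Q, T)
print(f"Second call (compute only):        {time.time() - start:.3f} s")
```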
-
@seanlaw
-
If I understood correctly: I have a script constantly running that receives data from a third party, and every time new data is received I want to use `mass()` on it. So only the first call pays the compilation cost, and every call after that is fast, a bit like caching?
-
It may feel like caching, but there is a difference in that caching (in my mind) applies to the caching of data in memory. So, if you execute the same function with the same data, then it can be faster. In our case, if you change the data (so there is no data caching) but keep the size of the data the same, then the execution time should be nearly identical. So, we are "caching" the compilation (I use this term loosely as I'm sure it's not quite correct), not the data. Though, there may be some level of data caching happening that I'm not aware of...
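A rough sketch of that distinction (the arrays here are hypothetical; only their shape matters):

```python
import time

import numpy as np
import stumpy

Q = np.random.rand(100)      # hypothetical pattern
T1 = np.random.rand(50_000)  # hypothetical target data
T2 = np.random.rand(50_000)  # different data, same size

stumpy.mass(Q, T1)           # warm-up: trigger JIT compilation once

start = time.time()
stumpy.mass(Q, T1)
print(f"Same data again:           {time.time() - start:.4f} s")

start = time.time()
stumpy.mass(Q, T2)
print(f"Different data, same size: {time.time() - start:.4f} s")  # roughly the same
```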
-
Thank you for taking the time to help me understand this topic, it's perfectly clear now!
-
I've been using STUMPY for a while now to detect patterns in a target dataset using a pattern I specify, and I'm getting very good results! Step by step I'm understanding better how to find what I'm looking for, even with very small datasets, so kudos to STUMPY!

The function I use most is `MASS`, and I'm trying to understand more deeply how it works. I hope this question is not dumb, but there is something I don't understand about how long it takes to compute the distance profile using MASS. So here I have a very simple example, timing a single call to `mass()`: the output will be something around 1.68-1.71 seconds, at least on my PC, which should mean the function took around 1.68 seconds to execute. I'm assuming that it's `mass()` taking most of the time, since I'm not doing any other computation. If, instead of calling `mass()` only once, I call it four times in a row (using the same data for every call, though I also tried the same with different `Pattern` arrays), the execution time is again about 1.70 seconds.

My question is the following: shouldn't the execution time be 1.68/1.70 + 1.68 + 1.68 + 1.68, since I'm running it four times? Again, I'm sorry if I'm saying something incorrect here; I was just curious about this, since I'm creating a script that should use `mass()` multiple times to check different patterns against the same target dataset. Thanks in advance!
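A minimal sketch of the kind of timing comparison described above (the arrays and the four-call loop are hypothetical stand-ins, not the original poster's snippet):

```python
import time

import numpy as np
import stumpy

Target = np.random.rand(20_000)  # hypothetical target dataset
Pattern = np.random.rand(200)    # hypothetical query pattern

# Single call: the first run includes Numba JIT compilation
start = time.time()
stumpy.mass(Pattern, Target)
print(f"One call:   {time.time() - start:.2f} s")

# Four calls in a row: no recompilation, so well under four times the first run
start = time.time()
for _ in range(4):
    stumpy.mass(Pattern, Target)
print(f"Four calls: {time.time() - start:.2f} s")
```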