Use KFR v6 for all ARCH #25

Open
wants to merge 7 commits into
base: main

dubrsl

@dubrsl dubrsl commented Feb 2, 2024

KFR has released v6, and its C API is now supported on both Intel and ARM (see kfrlib/kfr#196 (comment)).
I remember that we discussed on the forum that it would be good to get rid of NE10 and use a unified approach.

@krakenrf
Owner

krakenrf commented Feb 23, 2024

Hey, I've been testing this out. Everything compiles fine, but the actual decimation doesn't seem to work, at least on ARM (RPi 4). When I tried increasing the decimation to 2 and the FIR to 32, the output data appears to be -inf, and hence the spectrum and DOA pages don't work. Any ideas?

@godsic
Contributor

godsic commented Feb 24, 2024

@krakenrf It might not be related to KFR6. What I noticed during my testing is that after setting the same decimator parameters you mentioned and clicking Reconfigure & Reset DAQ chain, I end up with the VFO frequency set to zero, even though it is displayed correctly in the web GUI. Clicking on that VFO frequency field and making Dash validate it again fixes the spectrum and DoA calculation. Could you please try this?

@krakenrf
Owner

I did try refreshing and checking the VFO freq, but everything is fine there. Even if I restart the software from scratch with decimation set to 2 and FIR set to 30, it doesn't work. But the same parameters with NE10 do work.

@godsic
Contributor

godsic commented Feb 25, 2024

@krakenrf I also noticed somewhat unexpected behavior when Heimdall-side decimation is invoked: the Data Block Length increases proportionally with the Decimation Ratio. It's not just a GUI issue; when I press Reconfigure & Reset DAQ chain, I end up with latency that reflects this behavior. Am I missing something?
image

@krakenrf
Owner

That's expected. When you decimate, you're collecting fewer samples per second, so to maintain the same buffer size the data block length increases. The GUI is set to automatically show what the new length will be.
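
As a rough illustration of that relationship, here is a minimal sketch. The function name, the parameter names, and the 2.4 MHz sample rate are assumptions made for the example; they are not taken from the GUI code. It also assumes that the block size counts samples after decimation.

```python
def block_length_ms(cpi_size: int, decimation_ratio: int, sample_rate_hz: float) -> float:
    """Duration of one data block in milliseconds (hypothetical helper).

    After decimation the effective sample rate is
    sample_rate_hz / decimation_ratio, so a block holding the same number
    of decimated samples spans proportionally more time.
    """
    effective_rate_hz = sample_rate_hz / decimation_ratio
    return 1000.0 * cpi_size / effective_rate_hz


# Assumed example values: 2.4 MHz sample rate, 2**20 samples per block.
for ratio in (1, 2, 4):
    print(ratio, round(block_length_ms(2**20, ratio, 2.4e6), 1))
```

With a fixed block size, doubling the decimation ratio doubles the time each block covers, which is why the latency grows with it.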

@godsic
Contributor

godsic commented Feb 25, 2024

Aha, that's a bit counterintuitive to me, as I was expecting CPI Size = DAQ Buffer Size / Decimation Ratio, not vice versa. So to achieve the behavior I expect, I must set CPI Size accordingly. Also, I noticed that DSP-side decimation is terribly slow on my x86_64 hardware, adding almost a full second with a FIR order of 7. scipy is actually doing a better job. @krakenrf are you experiencing the same?

@krakenrf
Owner

krakenrf commented Feb 27, 2024

Yeah, the CPI calculation should probably be set to maintain the block length when other parameters are changed. It's also most efficient to keep CPI a power of 2, so maybe that should be restricted.
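
A restriction like that could be sketched as follows. Both helper names are hypothetical and nothing here reflects the existing GUI code; it only shows one way to check a requested CPI size and snap it to the nearest power of 2.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def nearest_power_of_two(n: int) -> int:
    """Round n to the closest power of two; exact ties round up."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return lower if (n - lower) < (upper - n) else upper


# Hypothetical use: snap a user-entered CPI size before applying it.
requested = 1000
cpi_size = requested if is_power_of_two(requested) else nearest_power_of_two(requested)
print(cpi_size)
```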

Regarding DSP-side decimation, do you mean DAQ-side decimation instead? The DAQ side is handled by NE10/KFR and the DSP side is handled in Python. I have noticed that Python scipy seems to be at least as efficient as these lower-level decimation libraries. I can't remember the exact reason, but Tamas did say there was a good reason to use these libraries for the DAQ-side decimation.

Maybe @petotamas can recall the reason we need to use NE10 and KFR instead of Python scipy for DAQ side decimation?

@petotamas
Collaborator

petotamas commented Feb 27, 2024

Hi,

As far as I remember, we did several tests using a Python implementation, and in the end it turned out that the C-based implementation is faster. It was not just about the decimation; there were other circumstances as well.
The considerations were as follows:

  • First of all, we use filtering and decimation to improve the effective bit-width of the receiver, which improves the signal-to-noise ratio in cases where quantization noise limits the achievable signal quality. Improving the SNR and increasing the bit-width also means that the original 8-bit I and Q representation is no longer enough to represent the signal; we have to widen the 8-bit representation, otherwise decimation is useless. Since we perform other DSP tasks later in the processing pipeline, going from 8 bits to 16 may cause problems later, so to avoid this we wanted to change the representation directly to floating point. As a result, we have to perform a type conversion first.
  • Next, we perform FIR filtering to reject the out-of-band noise, so that the oversampling can have its positive effect.
  • Finally, we decimate the signal down to the minimum sampling frequency required to describe the signal.
    It is a standard workflow. Since we are going to drop samples during decimation, we also wanted to avoid calculating those samples during the FIR filtering, to save computational resources.
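
The three steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the NE10/KFR code path: all function names are hypothetical, the 8-bit samples are assumed to be interleaved offset-binary I/Q centred on 127.5, and the filter is a generic Hamming-windowed sinc. The bit-width estimate assumes white quantization noise and an ideal filter.

```python
import math


def iq_u8_to_complex(raw: bytes) -> list[complex]:
    """Step 1: convert interleaved unsigned 8-bit I/Q bytes to complex floats.

    Assumption: offset-binary samples (0..255) centred on 127.5.
    """
    return [
        complex((raw[k] - 127.5) / 127.5, (raw[k + 1] - 127.5) / 127.5)
        for k in range(0, len(raw) - 1, 2)
    ]


def lowpass_taps(num_taps: int, decimation: int) -> list[float]:
    """Hamming-windowed sinc low-pass, cutoff at the decimated Nyquist rate."""
    if num_taps < 2:
        raise ValueError("num_taps must be at least 2")
    centre = (num_taps - 1) / 2
    cutoff = 0.5 / decimation  # in cycles per input sample
    taps = []
    for n in range(num_taps):
        t = n - centre
        sinc = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise to unity gain at DC


def fir_decimate(x: list[complex], taps: list[float], decimation: int) -> list[complex]:
    """Steps 2 and 3 fused: filter, but compute only the outputs that are kept.

    The outer loop advances by `decimation`, so the dropped outputs are
    never evaluated and the multiply-accumulate count falls by that factor.
    """
    out = []
    for n in range(0, len(x), decimation):
        acc = 0j
        for k, h in enumerate(taps):
            if n - k < 0:
                break  # samples before the start are treated as zero
            acc += h * x[n - k]
        out.append(acc)
    return out


def ideal_extra_bits(decimation: int) -> float:
    """Ideal resolution gain: half a bit per doubling of the decimation ratio."""
    return 0.5 * math.log2(decimation)


raw = bytes([255, 0] * 64)  # constant test signal: I = +1.0, Q = -1.0
samples = iq_u8_to_complex(raw)
decimated = fir_decimate(samples, lowpass_taps(8, 2), 2)
print(len(samples), len(decimated), ideal_extra_bits(4))
```

The fused loop is the point made above: filtering every sample and then discarding most of the results would give the same output at roughly `decimation` times the cost.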

In the early version, this FIR filtering and decimation processing was implemented in Python in the delay_synchronizer module, using the scipy, numpy and numba libraries. However, the type conversion was slow, and (I am not sure about this, but as far as I can recall) we didn't find a FIR implementation that was optimized for decimation and could calculate only those samples that are kept after the decimation.
NE10 and KFR had these capabilities, and they both use SIMD instructions. The type conversion was also faster in C, so we decided on the current implementation.

@krakenrf
Owner

I did eventually end up finding an efficient way to do decimation with Scipy which is how decimation is implemented on the DSP side. I suspect Scipy and KFR/NE10 have very similar processing speeds when decimation is implemented efficiently with Scipy, but the type conversion could still be an additional bottleneck.

We should probably investigate whether that is still the case, or whether we need to stick with the C libraries.

4 participants