Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring #95

Merged
merged 43 commits into from
Jul 2, 2023
Merged

Refactoring #95

merged 43 commits into from
Jul 2, 2023

Conversation

ctuguinay
Copy link
Collaborator

@ctuguinay ctuguinay commented Jun 15, 2023

Refactoring

This PR addresses the following issues: #91, #88, #83, #43.

The main folders are now:

  • utils
  • lines
  • regions2d

As opposed to before:

  • utils
  • convert
  • formats
  • mask
  • plot

Refactoring notes:

  • In both regions2d and lines, plotting is still kept as a class function: r2d.plot() and lines.plot().
  • In both regions2d and lines, masking is kept outside of class: regions2d_mask(r2d,....) and lines_mask(lines,.....).
  • In both regions2d and lines, parsing is kept outside of class: parse_regions_file(input_file) and parse_lines_file(input_file).
  • We now always initialize regions2d and lines with appropriate .evr and .evl functions at initialization ALWAYS. This is enforced with an error if not initialized properly (incorrect .evr or .evl file).

Other Changes Include:

  • Added line masking based on Valentina's example notebook with options for what to do with NA values and what interpolation method to use.

All tests currently pass.

@leewujung @valentina-s Feel free to review this when you are free.

ctuguinay and others added 16 commits May 31, 2023 09:52
…s-from-pandas-=2.0#79

Update requirements and fix tests from pandas =2.0#79
* removed parse scaffolding to the extent where we only initialize a parser, in the initialization of the parser we set parser.data class parameter, and then assign format class data class parameter
* we always parse file now
* minimize offset code to what is necessary
* enforce that input file (in parsers) is not none
* reduce spacing and removing unused imports
* may lead to lint errors
* ignore no gridpoint belongs to any region since we want this behavior to occur
* remove offset functionality in regions2d
* remove offset from r2d test
* remove offset from api
* remove offset from region_mask
* refactored main classes into lines and regions2d directories
* removed get points from region (unused)
* separate utils into time based and io based functions
* turn plot and mask into functions for both lines and regions2d
* refactored regions2d_mask
* added lines_mask functionality
* add class sonar for nc data storage
* sonar does parsing of nc files and checks inside dataset
* does value setting checking
* implement sonar for lines_mask and regions2d_mask
* update lines_mask and regions2d_mask tests
@codecov
Copy link

codecov bot commented Jun 15, 2023

Codecov Report

Merging #95 (cf7f15d) into main (4a7bf7e) will decrease coverage by 4.50%.
The diff coverage is 79.31%.

@@            Coverage Diff             @@
##             main      #95      +/-   ##
==========================================
- Coverage   80.72%   76.22%   -4.50%     
==========================================
  Files          15       13       -2     
  Lines         555      408     -147     
==========================================
- Hits          448      311     -137     
+ Misses        107       97      -10     
Impacted Files Coverage Δ
echoregions/utils/io.py 44.68% <44.68%> (ø)
echoregions/regions2d/regions2d.py 75.91% <70.27%> (ø)
echoregions/lines/lines.py 88.23% <88.23%> (ø)
echoregions/regions2d/regions2d_parser.py 95.65% <95.65%> (ø)
echoregions/__init__.py 100.00% <100.00%> (ø)
echoregions/core.py 100.00% <100.00%> (ø)
echoregions/lines/__init__.py 100.00% <100.00%> (ø)
echoregions/lines/lines_parser.py 100.00% <100.00%> (ø)
echoregions/regions2d/__init__.py 100.00% <100.00%> (ø)
echoregions/utils/__init__.py 100.00% <100.00%> (ø)
... and 2 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

ctuguinay and others added 12 commits June 20, 2023 15:39
* for further sonar data tests
* add option for method and limit area for lines mask
* needed to parse with xr.open_dataset and xr.open_zarr and these are user needed functions
* refactor read_nc to read_sonar
* add ability to read zarr files in  read_sonar
* in reading zarr files manipulate echo range and water level to get depth values and append these values to Sv (this may be non-functional, must test on more examples)
* nearest interpolation for lines requires scipy
* make processing for zarr and nc files cleaner and more in sync with each other by removing unnecessary variables and coordinates for masking
* expect input data arrays to be of "good" masking quality so there will be no functionality checking for "good" quality of data
* slight change in r2d masking where we fillnas with max depth both before and after interpolation
Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ctuguinay : Thanks for the refactoring work! I am done with this round of review. All comments are either in my overall comment last night or inline with the code.

One last thing is the location of read_evr and read_evl. Could you move them out of utils and put them into core.py at root level? (echoregions/echoregions/core.py) This way we can easily find them, and init is easy. We may decide to move them later to another module, but for now let's keep it simple.

ctuguinay and others added 8 commits June 26, 2023 13:02
* send read_evl and read_evr to core.py from utils.py for easier reading
* simplified lines and regions2d init files to only include main class
* simplified echoregions main init to just read_evl and read_evr (we work with the lines and regions2d objects for every functionality)
* remove nan_depth_value property and setter
* create appropriate lambda function to replace replace_depth
* place lines mask in lines class as a class function
* remove checking for ._data type (it is hidden and is set by code so we know it is dataframe)
* separate start time and end time checking
* revise the usage of lines_mask in tests so that we have just lines.mask(...)
* move regions2d mask to regions2d class as a class method
* revise tests accordingly to have that r2d.mask(...) when we want to use a mask
* remove depth
* remove property and setter functions for max_depth and min_depth
* remove checking for .data type
* catch invalid method and limit area values prior to interpolation
* interpolation try except block is for errors that may not arise from the explicit value of method and limit area
@ctuguinay
Copy link
Collaborator Author

ctuguinay commented Jun 26, 2023

@leewujung Sorry for how long this comment is. I mostly just wanted to keep all these replies to your comments in one place, for easier access later on.

1

I wonder if we can replace the whole replace_depth/swap_val thing by a lambda function, since what it does is to swap out -10000.99 with self.nan_depth_value.

Can also define ECHOVIEW_NAN_DEPTH_VALUE = -10000.99 near the top of the script and use this constant in place of the value, so that it is easier to keep track of numbers like this flying around.

See example here: https://github.com/OSOceanAcoustics/echopype/blob/9b401ca8250e60c7264b274e66145d0f7842def6/echopype/echodata/combine.py#L20-L26

I addressed both of these in the latest push.

2

I don't see a lot of value in keeping nan_depth_value as a property -- what do you think about removing it? Also curious why you decided to have .data as a way to access ._data -- is it for potential future changes?

True, no reason keeping nan_depth_value as a property; I've removed it as a property (and so I had to remove its setter too) in the latest push. Also .data feels in sync with how xarray's data is accessed.

since ._data is a private attribute, it seems having the type checking here is redundant and somewhat strange, since the type is set by the programmer

True, I think I initially wrote this prior to making it a private attribute. I removed it in latest push.

3

self.depth, self._min_depth, self._max_depth are not used anywhere in the Line class, so I am a little unsure why they are here. Am I missing anything?

Yup, removed them in latest push. Initially moved them here since they were inherited by a Geometry class that I deleted.

4

better to check each of these separately so the error message can be more specific.

Just separated them, can be found in latest push.

5

I think the ones imported in this root level __init__.py are the one that users will use directly, such as doing

test_line = read_evl(SOME_EVL)
test_region = read_evr(SOME_EVR)

So I'd suggest to keep only the ones below:
.....

We probably should discuss again later (but soon) if we are happy with the way lines_mask and regions2d_mask are.

Since I am placing both masking functions as class methods, I don't think they need to be here. So the only functions needed are read_evl and read_evr. I made this change and it can be found in the latest push.

6

could remove parse_line_file since that is not used outside of this module.

could remove parse_regions_file since it is not used outside of this module.

I addressed both of these in the latest push. For the reason listed above, I removed both mask functions too.

7

I think we can remove self.depth to not have a "hidden" thing for setting the max and min depth. Similar to the case in Lines, I also wonder if we can simply keep min_depth and max_depth as attributes and not properties to make things simpler, since it is unlikely these values will be set by some complicated method.

Just removed self.depth and removed min_depth and max_depth as a property. Also I do agree with you that they would be more appropriate as an attribute. I will still keep the setter methods because of problems that may arise in initialization. These changes can be found in latest push.

8

Similar to the Lines case, I don't think we need to check the type of self.data here since it is set internally.

Just removed this. Can be found in latest push.

9

I am really confused by this and convert_points -- I don't see it used anywhere in the repo. Not sure if I am missing anything though. The use of 9999.99 and -9999.99 suggests that there's some redundancy with replace_nan_depth. Do we need to keep the convert_points function?

I will put this in a new issue and will reference it in a bit. I don't see it used anywhere, but it was created for a reason. It is related to #53, and as such may be used there.

10

  • For Regions2d.replace_nan_depth: looking at its use regions_mask, I think there is some bug there -- when inplace=True, self.data should be updated and nothing is returned. When inplace=False, this method can return a new dataframe from self.data. Am I missing anything here? -- Feel free to make this a separate issue to be addressed later so it does not clog up this refactoring PR.

I'll place this in a separate issue to investigate further and mention this PR when I’ve created it. But yes, I think you're right in the sense that nothing should be returned if true, but in this case, if true it will return something instead of changing it inplace.

11

  • The input argument region in select_region and a few other methods seems would be better renamed as region_id since it is effectively used to subset the dataframe with the column region_id

I address this in #97, so I'll leave it in there as it's more relevant for that PR.

12

  • The input argument files in select_sonar_file seems would be better renamed as sonar_filenames to avoid confusion

Just changed this. Can be found in latest push.

13

We want this to be a method Regions2d.mask, so the regions2d input argument would just be the self. Note the replace_nan_depth has a bug in doing inplace=True that needs to get fixed so that the nan_depth is replaced in self.data (I have this in my previous comment).

So effectively we will not have a separate regions2d_mask.py and will access this method via the Regions2d objects.

Similar to the regions2d case, we want this to be a method Lines.mask and not as a separate function.

Just made them class functions. Can check the latest push.

14

This entire block can be simplified since the M = r.mask(...) section is identical. The code will be much cleaner if it starts with a section for creating r, going into M = r.mask(...), and then some conversion from labels to mask_labels if mask_labels is a list.

Due to how large this PR is, I'll place this in a separate issue and mention this PR when I’ve created it.

15

I think one can do reindex directly to sonar_index without creating the joint_index first?

Not sure why, but reindexing directly to sonar_index led to key not found errors, I did a preliminary search, but didn’t find anything that would cause this discrepancy other than an index reset after .loc[...]. Do you want me to investigate this further, or just leave it?

16

remove print("")?

Probably a good idea to catch the invalid method and limit_area explicitly before the try-catch block, and use this block to catch other unexpected things. There's sort of a mismatch here since the function only allows method and limit_area but interpolate allows many more arguments, so the error string should also be revised. Probably better to just directly raise the error from interpolate.

Just changed this code. Can be found in latest push.

17

For cloud compatibility (see #69) we would want this changed to something that is more comprehensive. The same validate function in echopype probably can just be grabbed directly without much change to work here. Could you please make a note in the #69 comment so that we do not forget about this?

Just made a note!

18

Also I think the content of this function should be cleaned up (the logic and also to check filename extension properly with pathlib or properly extracting the .XXX part instead of endwith; pathlib does not work for cloud path last time I checked so maybe some simple re use or similar). Could you put this as a separate issue?

I will place this in a separate issue and mention this PR when I’ve created it.

19

One last thing is the location of read_evr and read_evl. Could you move them out of utils and put them into core.py at root level? (echoregions/echoregions/core.py) This way we can easily find them, and init is easy. We may decide to move them later to another module, but for now let's keep it simple.

Just moved them. Can be found in latest push.

@leewujung
Copy link
Member

I think one can do reindex directly to sonar_index without creating the joint_index first?

Not sure why, but reindexing directly to sonar_index led to key not found errors, I did a preliminary search, but didn’t find anything that would cause this discrepancy other than an index reset after .loc[...]. Do you want me to investigate this further, or just leave it?

Could you put in a TODO comment in the code to say what you have here, and we can potentially investigate later? It runs through fine so no reason for this to be a blocker. Thanks!

Comment on lines 160 to 166
Arguments:
sonar - Sonar object containing DataArray of shape (ping_time, depth).
method - String containing interpolation method.
limit_area - String for determining filling restriction for NA values.

Returns:
bottom_mask - xarray with dimensions: (ping_time, depth) with bottom: False, otherwise: True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also change this to numpy docstring format to be consistent.

Copy link
Member

@leewujung leewujung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ctuguinay : Thanks for making the changes and adding issues for the various pieces.

I only have a minor comment about docstring in one of the mask functions, but other than that, feel free to merge this PR!

* revised lines mask docstring to match numpy docstring format
* add todo to investigate reindex/loc error
@ctuguinay ctuguinay merged commit cdc75c1 into OSOceanAcoustics:main Jul 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants