issues with EOBS data #9

Open
larsbuntemeyer opened this issue Nov 21, 2024 · 11 comments
Labels
help wanted Extra attention is needed

Comments

@larsbuntemeyer
Contributor

larsbuntemeyer commented Nov 21, 2024

There are some known issues with EOBS data that we have to deal with. For example, some regions have only limited observations and might have to be skipped for monthly and seasonal means. We have some experience with this. Pinging @paindeer, since you have worked a lot with EOBS data to evaluate REMO ERA5 output. I could not find any details about this in https://doi.org/10.5194/gmd-7-1297-2014.

@larsbuntemeyer
Contributor Author

Here is the fraction of missing EOBS values per grid cell for the years 1980 to 2020:

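import xarray as xr
from dask.distributed import Client

# Start a local Dask cluster; the dashboard is served on port 8787.
client = Client(dashboard_address="localhost:8787")

store = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/pangeo-forge/EOBS-feedstock/eobs-tg-tn-tx-rr-hu-pp.zarr"
# Open lazily (chunks={}) and restrict to the period of interest.
ds = xr.open_dataset(store, engine="zarr", chunks={}).sel(time=slice("1980", "2020"))


def sum_nan(da):
    # Fraction of time steps with a missing value at each grid cell.
    return da.isnull().sum(dim="time") / da.time.size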

%time tg_nan = sum_nan(ds.tg).compute()
%time pr_nan = sum_nan(ds.pp).compute()
CPU times: user 4.18 s, sys: 671 ms, total: 4.86 s
Wall time: 27.8 s
CPU times: user 2.33 s, sys: 205 ms, total: 2.54 s
Wall time: 9.47 s
tg_nan.plot()

[Image: map of tg_nan, the fraction of missing tg values]

pr_nan.plot()

[Image: map of pr_nan, the fraction of missing values in ds.pp]

There are definitely some regions to take care of when computing monthly or seasonal means.
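One possible way to take care of such regions when aggregating is to require a minimum share of valid daily values per month before accepting the monthly mean. The following is only a minimal sketch on synthetic data: the helper `monthly_mean` and the 0.8 cutoff in `min_valid_fraction` are assumptions for illustration, not something agreed on in this thread.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series covering January (31 days) and February (29 days) 2000.
time = pd.date_range("2000-01-01", "2000-02-29", freq="D")
values = np.full(time.size, 2.0)
values[:20] = np.nan  # 20 of the 31 January days are missing
da = xr.DataArray(values, coords={"time": time}, dims="time", name="tg")


def monthly_mean(da, min_valid_fraction=0.8):
    """Monthly mean, set to NaN where too few daily values are present.

    min_valid_fraction is a hypothetical cutoff chosen for this sketch.
    """
    # Share of non-missing days in each calendar month.
    valid_fraction = da.notnull().astype(float).resample(time="MS").mean()
    # The mean skips NaN by default, so the mask is applied afterwards.
    return da.resample(time="MS").mean().where(valid_fraction >= min_valid_fraction)


result = monthly_mean(da)
print(result.values)
```

With a gridded variable such as `ds.tg`, the same function would apply per grid cell, since `resample` only acts on the time dimension.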

@larsbuntemeyer larsbuntemeyer changed the title issues with EOBS datat issues with EOBS data Nov 21, 2024
@larsbuntemeyer larsbuntemeyer moved this to Backlog in Joint Evaluation Nov 21, 2024
@larsbuntemeyer
Contributor Author

larsbuntemeyer commented Nov 21, 2024

This is the issue showing up in seasonal means of surface temperature:
Image

@larsbuntemeyer larsbuntemeyer pinned this issue Nov 22, 2024
@larsbuntemeyer larsbuntemeyer added the help wanted Extra attention is needed label Dec 8, 2024
@larsbuntemeyer larsbuntemeyer unpinned this issue Dec 8, 2024
@larsbuntemeyer larsbuntemeyer pinned this issue Dec 8, 2024
@JavierDiezSierra
Collaborator

@larsbuntemeyer Just in case it might be helpful: for the Copernicus Atlas [1], we generated a mask for E-OBS to exclude areas with few stations or where the stations were not continuous over time. It was created manually, based on simple visual inspection...

[1] https://atlas.climate.copernicus.eu/atlas/gkOVTgS5

@larsbuntemeyer
Contributor Author

Thanks @JavierDiezSierra, that looks reasonable. I think, if we want to look at 1980-2020, we could be a little more relaxed. I would imagine a criterion where the time range of interest should not contain more than 10% (or any other threshold we can define) of missing values; otherwise, the grid cell is masked out.
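As a rough illustration of such a criterion, the sketch below masks every grid cell whose fraction of missing time steps exceeds a threshold. It runs on a small synthetic array rather than the actual E-OBS store, and the helper name `mask_sparse_cells` and its `max_missing` parameter are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for an E-OBS variable: 100 daily steps on a 2 x 2 grid.
time = pd.date_range("2000-01-01", periods=100, freq="D")
data = np.ones((100, 2, 2))
data[:5, 0, 1] = np.nan   # 5% missing: below the threshold, kept
data[:50, 1, 0] = np.nan  # 50% missing: masked out
data[:, 1, 1] = np.nan    # 100% missing: masked out
da = xr.DataArray(
    data,
    coords={"time": time, "latitude": [50.0, 50.1], "longitude": [10.0, 10.1]},
    dims=("time", "latitude", "longitude"),
    name="tg",
)


def mask_sparse_cells(da, max_missing=0.1):
    """Set whole grid cells to NaN where the missing fraction exceeds max_missing."""
    missing_fraction = da.isnull().mean(dim="time")
    # The 2D condition is broadcast along the time dimension.
    return da.where(missing_fraction <= max_missing)


masked = mask_sparse_cells(da, max_missing=0.1)
valid_cells = masked.notnull().any(dim="time")
print(valid_cells.values)
```

Cells that pass the test keep their original gaps; only cells above the threshold are removed entirely.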

@larsbuntemeyer
Contributor Author

Here is the issue showing up in the tas bias:
[Image: map of the tas bias]

@JavierDiezSierra
Collaborator

@larsbuntemeyer Yes, I agree that following a threshold criterion is less problematic. I think eliminating grid cells with more than 10% missing values is appropriate :)

I will try to include CORDEX-CMIP5 spatial map biases with respect to eobs in the coming days.

@jesusff
Contributor

jesusff commented Jan 14, 2025

Yes, we definitely need to apply a mask. We can start with the 10% threshold. The changes are quite sharp, so the only difference when moving the threshold will be (for pr) whether we include the circle around the eastern Mediterranean or not.
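A quick way to check how sensitive the mask is to the threshold is to count the retained grid cells for a few candidate values. This is only a sketch: the array below holds made-up missing fractions, and `retained_cells` is a hypothetical helper. With real data, a field such as `tg_nan` from the earlier comment could take the place of the synthetic array.

```python
import numpy as np
import xarray as xr

# Made-up per-cell missing fractions with a sharp jump between 0.08 and 0.6.
missing_fraction = xr.DataArray(
    np.array([[0.0, 0.02, 0.05], [0.08, 0.6, 0.9]]),
    dims=("latitude", "longitude"),
)


def retained_cells(missing_fraction, thresholds):
    """Number of grid cells kept for each candidate threshold."""
    return {t: int((missing_fraction <= t).sum()) for t in thresholds}


counts = retained_cells(missing_fraction, [0.05, 0.1, 0.2, 0.5])
print(counts)
```

When the missing fraction changes sharply in space, as in this toy array, the count stays the same over a wide range of thresholds.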
image

@gnikulin

The mask from the Copernicus atlas looks better (more consistent) to me, at least visually. I would only add all missing parts of the EU countries, e.g. southern Greece, Rhodes, Cyprus, Sicily, with some clarifications in the paper. Northern Africa is not in the focus and can be excluded. The eastern Mediterranean is also not in the focus, although this region can be left in.

By the way, what version of E-OBS is used here?

@JavierDiezSierra
Collaborator

@gnikulin If I'm right, we are using version 23.1 at 0.1° (https://catalog.leap.columbia.edu/feedstock/eobs-dataset), but version 30.e is already available and has been downloaded to the JSC server (/mnt/CORDEX_CMIP6_tmp/aux_data/eobs). @larsbuntemeyer, why don't we use the latest version of E-OBS?

In Kotlarski et al. (2014), they regridded EUR-11 to the 0.22° E-OBS grid for the period 1989-2008, but we are regridding both the projections and E-OBS to the rotated reference grid for EUR-11.

This is an example of the bias for CORDEX-CMIP5 models for the period 1989-2008 on the EUR-11 mesh. It reproduces Figure 2 from Kotlarski et al. (2014). The results are very similar, with slight differences.

@jesusff

CMIP5_eobs_tas-4.pdf

@larsbuntemeyer
Contributor Author

Yes, agreed, it's not the latest version, but it is easily accessible without having to download more data to the filesystem. I'll update the workflow to use the latest version.

@larsbuntemeyer
Contributor Author

larsbuntemeyer commented Jan 28, 2025

As far as I can see, the latest EOBS is no longer available on the 0.22° grid, so using the EUR-11 grid for the comparison seems reasonable (Lambert conformal models are also regridded to the EUR-11 rotated grid).
