Merging of the station records at each site including historical stations #246

Closed
BaptisteVandecrux opened this issue May 24, 2024 · 1 comment · Fixed by #252

@BaptisteVandecrux
Member

The goal is to have, in a level_4 folder, one merged record for each site, combining historical, v2 and v3 stations as well as moved stations (e.g. THU_U replaced by THU_U2). The implementation is ongoing in https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/join_l4/src/pypromice/process/join_l4.py with some updates in other files (main...join_l4).

It uses a dictionary with the latest stations as keys and, as values, the lists of old stations in reverse chronological order:

old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'CP1': ['CrawfordPoint1'],
    'DY2': ['DYE-2'],
    'JAR': ['JAR1'],
    'HUM': ['Humboldt'],
    'NAU': ['NASA-U'],
    'NAE': ['NASA-E'],
    'NEM': ['NEEM'],
    'NSE': ['NASA-SE'],
    'EGP': ['EastGRIP'],
    'SDL': ['Saddle'],
    'SDM': ['SouthDome'],
    'SWC': ['SwissCamp', 'SwissCamp10m'],
    'TUN': ['Tunu-N'],
    'QAS_Uv3': ['QAS_U'],
    'QAS_Mv3': ['QAS_M'],
    'QAS_Lv3': ['QAS_L'],
    'KAN_Lv3': ['KAN_L'],
    'KPC_Uv3': ['KPC_U'],
    'KPC_Lv3': ['KPC_L'],
    'NUK_Uv3': ['NUK_U'],
    'THU_U2': ['THU_U'],
}

At the moment, join_l4 is called on the same list of stations as join_l3, i.e. the sites for which new transmissions, new raw files or new flags have recently been added:
https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/blob/b0d52ecf9427b204460f21f110ef0e049d0c49c4/l3_processor.sh#L173-L185

If a station is listed in old_name.values() (the names in brackets in old_name), it is not processed by join_l4, because its data is appended to the record of another AWS. If a station is not in old_name.keys(), there is no historical data to append and its file is copied as-is to the level_4 folder.
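The three cases above can be sketched as a small decision function. This is only an illustration: `l4_action` is a hypothetical helper (not part of pypromice), `old_name` is an excerpt of the mapping in this issue, and `KAN_U` stands in for a station that does not appear in `old_name` at all.

```python
# Excerpt of the old_name mapping from this issue (latest station -> older stations).
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'QAS_Lv3': ['QAS_L'],
    'THU_U2': ['THU_U'],
}


def l4_action(station, old_name):
    """Return 'skip', 'merge' or 'copy' for a station (hypothetical helper)."""
    # All stations that have been superseded by a newer one.
    superseded = {old for olds in old_name.values() for old in olds}
    if station in superseded:
        return 'skip'   # its data is appended to the record of a newer station
    if station in old_name:
        return 'merge'  # latest station with historical data to prepend
    return 'copy'       # nothing to append: level_3 file copied as-is


for stid in ['CEN1', 'CEN2', 'KAN_U']:
    print(stid, l4_action(stid, old_name))
```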

For the historical GC-Net stations, the variable aliases are defined in an external file, src/pypromice/process/variable_aliases_GC-Net.csv, which is also declared as package data.

The merging is done by time slices:

ds1 = xr.concat(
    (
        ds2.sel(time=slice(ds2.time.isel(time=0), ds1.time.isel(time=0))),
        ds1,
    ),
    dim='time',
)

where ds1 is the current AWS data and ds2 is the historical AWS data being appended before the start of ds1.
Gap-filling during the overlapping period is currently not implemented.
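To make the slicing logic concrete without needing xarray, here is a minimal plain-Python sketch of the same idea on lists of `(timestamp, value)` tuples. `prepend_history` and the sample data are hypothetical and only illustrate the principle. One point it makes visible: label-based slices in xarray include both endpoints, so if the historical record contains a sample at exactly the first timestamp of the current record, that timestamp would appear twice after concatenation. The sketch uses a strict cut, so the current record wins at the boundary.

```python
from datetime import datetime, timedelta


def prepend_history(current, historical):
    """Prepend the historical samples that predate the first current sample.

    Both arguments are lists of (timestamp, value) tuples sorted by time.
    Hypothetical helper, for illustration only.
    """
    if not current:
        return list(historical)
    start = current[0][0]
    # Strict '<': a timestamp present in both records is taken from `current`.
    older = [(t, v) for t, v in historical if t < start]
    return older + list(current)


t0 = datetime(2020, 1, 1)
hist = [(t0 + timedelta(hours=h), 'old') for h in range(0, 4)]  # hours 0..3
curr = [(t0 + timedelta(hours=h), 'new') for h in range(2, 6)]  # hours 2..5

merged = prepend_history(curr, hist)
print([v for _, v in merged])
```

During the overlap (hours 2 and 3 here) only the current record is kept, which matches the statement that gap-filling in the overlapping period is not implemented.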

The resulting files have the same format and the same variables as the level_3 files.

Instead of stid, there are now site_id and list_station_id attributes, defined as:

site_id = n1.replace('v3','').replace('CEN2','CEN')
for l in [l3_h, l3_d, l3_m]:
    l.attrs['site_id'] = site_id
    l.attrs['station_id'] = site_id
    if n1 in old_name.keys():
        l.attrs['list_station_id'] = '('+n1+', '+', '.join(old_name[n1])+')'
    else:
        l.attrs['list_station_id'] = '('+n1+')'

meaning that we drop the v3 suffix and the 2 in CEN2 (and potentially in other stations).
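As a worked example of the attribute logic above, the following sketch computes the same three attributes for a few station names. `site_attrs` is a hypothetical wrapper written for this illustration, and `old_name` is an excerpt of the mapping in this issue. Note that with the rule as written, THU_U2 keeps its trailing 2.

```python
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'QAS_Uv3': ['QAS_U'],
    'THU_U2': ['THU_U'],
}


def site_attrs(n1, old_name):
    """Return the site-level attributes for the latest station name n1."""
    # Same renaming rule as in the issue: drop 'v3' and map CEN2 to CEN.
    site_id = n1.replace('v3', '').replace('CEN2', 'CEN')
    # Latest station first, then the older ones in reverse chronological order.
    stations = [n1] + old_name.get(n1, [])
    return {
        'site_id': site_id,
        'station_id': site_id,
        'list_station_id': '(' + ', '.join(stations) + ')',
    }


for name in ['CEN2', 'QAS_Uv3', 'THU_U2', 'KAN_U']:
    print(name, site_attrs(name, old_name))
```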

Right now, because of the parallel call to join_l4, join_l4 cannot know that it needs to re-append a given site (e.g. CEN) if the older station data (e.g. CEN1) is updated but not the latest station (e.g. CEN2).
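One possible way around this, shown here only as a sketch and not as a description of the eventual fix, would be to translate the list of updated stations into the list of latest stations before calling join_l4, using a reverse lookup built from `old_name`. The helper `sites_to_rejoin` is hypothetical.

```python
old_name = {
    'CEN2': ['CEN1', 'GITS'],
    'THU_U2': ['THU_U'],
}


def sites_to_rejoin(updated, old_name):
    """Map each updated station to the latest station of its site."""
    # Reverse lookup: old station -> latest station that replaces it.
    latest_of = {old: new for new, olds in old_name.items() for old in olds}
    # Stations that are already the latest (or have no history) map to themselves.
    return sorted({latest_of.get(s, s) for s in updated})


# CEN1 was updated but CEN2 was not: CEN2 still needs to be re-joined.
print(sites_to_rejoin(['CEN1', 'KAN_U'], old_name))
```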

@BaptisteVandecrux BaptisteVandecrux linked a pull request Jun 4, 2024 that will close this issue
@BaptisteVandecrux
Member Author

fixed in #294
