Tidi #253
base: develop
Conversation
Thanks @landsito! I'll take a look this week. |
I can take a look myself this upcoming week. Thanks @landsito |
Thanks for the pull request! I was able to get through the listed test code. Thanks for the example. Note I tweaked the example code to increase portability and make it easier for everyone to run. I have some questions and comments to help improve details.
```python
if inst_id == 'ncar':
    if tag == 'vector':
        data, meta = cdw.load(fnames, tag, inst_id,
                              pandas_format=True)
```
```diff
-                              pandas_format=True)
+                              pandas_format=True,
+                              meta_translation=meta_translation)
```
@jklenzing I think we may need a meta_kwargs pass-through in cdw.load to supply the labels dict to meta.
meta_translation is also not currently supported by the pandas_format option, so my suggestion may be premature.
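For reference, a meta_translation mapping might look like the sketch below. The keys and values here are illustrative stand-ins, not taken from the TIDI files; the actual file-side attribute names would need to match the CDF attributes in the data.

```python
# Hypothetical sketch of a meta_translation dict: each key is a metadata
# label as it appears in the file, mapped to the list of pysat metadata
# labels it should populate. These names are assumptions for illustration.
meta_translation = {
    'CATDESC': ['desc'],
    'FILLVAL': ['fill'],
    'UNITS': ['units'],
    'VALIDMIN': ['value_min'],
    'VALIDMAX': ['value_max'],
}

# Each file-side label may feed one or more pysat labels, hence the lists.
```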
Example of metadata from this data set:

```
In [39]: tidi.meta['Epochcold']
Out[39]:
FIELDNAM          Epoch cold
VAR_TYPE        support_data
DEPEND_0                 NaN
DEPEND_1                 NaN
LABL_PTR_1               NaN
DISPLAY_TYPE             NaN
FORMAT                   NaN
LABLAXIS                 NaN
SCALETYP                 NaN
units                    NaN
long_name          Epochcold
notes              Epochcold
desc            Default time
value_min    62798371200000.0
value_max    63460972800000.0
fill                     NaN
Name: Epochcold, dtype: object
```
This could be improved. I used a generic metadata dict and I'm not sure how it matches the fields within the loaded TIDI files. I would need assistance with that.
No problem. Posted this to help with our pysat meeting conversation on the code. Thanks again for the pull!
```python
if tag == 'vector':
    data, meta = cdw.load(fnames, tag, inst_id,
                          pandas_format=True)
    data = data.to_xarray()
```
Is the switch to xarray for data type consistency with other inst_id/tags?
Thanks for looking into this. I tested it also for the NCAR tag and it worked fine, but hopefully it could be improved.
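For context, the pandas-to-xarray conversion in the snippet above is a one-liner; a minimal sketch with made-up values (requires xarray to be installed):

```python
import pandas as pd

# Illustration of DataFrame.to_xarray(): the index becomes the dimension
# coordinate of the resulting Dataset. Values here are invented.
df = pd.DataFrame({"wind": [1.0, 2.0]},
                  index=pd.Index([0, 1], name="time"))
ds = df.to_xarray()  # xarray.Dataset with a 'time' dimension
```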
```python
        drop_meta_labels='FILLVAL',
        )
    data.append(idata)
```
I'm having some issues with the 'vector' tag. I'm using a new Python install so the issue may be on my end. I can't seem to access data at the Instrument level. @jklenzing
```
In [16]: tidi['ut_date']
Out[16]:
<xarray.Dataset> Size: 0B
Dimensions:  ()
Data variables:
    *empty*

In [17]: tidi.data['ut_date']
Out[17]:
<xarray.DataArray 'ut_date' (time: 1317)> Size: 11kB
array([b'2019001', b'2019001', b'2019001', ..., b'2019002', b'2019002',
       b'2019002'], dtype=object)
Coordinates:
    time     (time) datetime64[ns] 11kB 2019-01-01T00:03:37 ... 2019-01-02T00...
```
Given the data distribution (loading data for Jan 1 includes Jan 2), the multi_file_day flag should be set for each data set.
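The suggestion above might look like this sketch; the exact placement of the flag in the instrument support module is an assumption here, so check the pysat documentation for the supported mechanism:

```python
# Hypothetical sketch: in the instrument support module (e.g.
# timed_tidi.py), flag that a single day's data spans multiple files,
# so pysat also loads neighboring files when building a day.
multi_file_day = True
```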
Yes, that would be a better solution. I did notice that some vector files contain part of the next day for the different types, i.e., cold/warm.
```python
      for t in data]
ee = xr.concat(ee, 'time')
data = xr.merge([ff, ee, hh[0]])
```
For the 'profile' and 'los' tag I'm getting an empty meta.data. 'profile' example shown.
```
In [25]: tidi.meta.data
Out[25]:
Empty DataFrame
Columns: [units, long_name, notes, desc, value_min, value_max, fill, plot, axis, scale]
Index: []

In [26]: tidi.data
Out[26]:
<xarray.Dataset> Size: 1MB
Dimensions:        (time: 1880, alt: 21)
Coordinates:
    time           (time) datetime64[ns] 15kB 2019-01-01T00:08:01 ... 2019-0...
    alt            (alt) float32 84B 70.0 72.5 75.0 77.5 ... 115.0 117.5 120.0
Data variables: (12/46)
    ms_time        (time) float32 8kB 695.0 280.0 925.0 ... 190.0 845.0 500.0
    ut_date        (time) object 15kB b'2019001' b'2019001' ... b'2019001'
    ut_time        (time) float64 15kB 4.66e+05 4.69e+05 ... 8.638e+07
    rec_index      (time) float64 15kB 3.0 1.0 2.0 ... 1.88e+03 1.879e+03
```
Of course, if the files themselves have no metadata, that is OK.
```python
    'min_val': ('Valid_Min', np.float64),
    'max_val': ('Valid_Max', np.float64),
    'fill_val': ('fill', np.float64)}
```
@aburrell There are a few repeated type-mismatch issues related to metadata and labels. One of the variables is unlike the others...
```
/Users/russellstoneback/Code/pysat/pysat/_meta.py:440: UserWarning: Metadata with type <class 'str'> does not match expected type <class 'numpy.float64'>. Dropping input for 'ut_date' with key 'Valid_Min'
  warnings.warn(''.join((
/Users/russellstoneback/Code/pysat/pysat/_meta.py:440: UserWarning: Metadata with type <class 'str'> does not match expected type <class 'numpy.float64'>. Dropping input for 'ut_date' with key 'Valid_Max'
  warnings.warn(''.join((
```
Is there a way to address these warnings?
I think that the recent versions should auto-identify the types. Try running without the labels.
```python
elif tag == 'los':
    for i, idata in enumerate(data):
        idata = idata.assign_coords(time=idata.time)
        data[i] = idata.rename(nlos='time')
```
This line, and similar, produces a warning on my system:
```
/Users/russellstoneback/Code/pysatNASA/pysatNASA/instruments/timed_tidi.py:205: UserWarning: rename 'nlos' to 'time' does not create an index anymore. Try using swap_dims instead or use set_index after rename to create an indexed coordinate.
  data[i] = idata.rename(nlos='time')
```
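The swap_dims alternative the warning suggests can be sketched like this; the 'nlos'/'time' names mirror the PR, but the data values are made up:

```python
import numpy as np
import xarray as xr

# Build a tiny Dataset where 'time' is a coordinate along the 'nlos'
# dimension, as in the TIDI line-of-sight files (values invented).
ds = xr.Dataset(
    {"wind": ("nlos", np.arange(3.0))},
    coords={"time": ("nlos", np.array([0.0, 1.0, 2.0]))},
)

# rename(nlos='time') would not create an index; swap_dims promotes the
# existing 'time' coordinate to the dimension, with an index, so that
# label-based selection (e.g. .sel) keeps working.
ds = ds.swap_dims(nlos="time")
```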
```diff
@@ -18,11 +18,13 @@
 """

 import datetime as dt
+import gzip
```
Is this in the standard library?
It is included in the Python standard library, as far as I understand: https://docs.python.org/3/library/gzip.html
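It is; a quick standard-library round-trip confirms no third-party dependency is involved:

```python
import gzip

# gzip ships with CPython: compress and decompress a byte payload
# in memory, no external packages required.
payload = b"TIDI test payload"
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)
```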
```diff
@@ -645,6 +647,11 @@ def _get_file(remote_file, data_path, fname, temp_path=None, zip_method=None):
     if zip_method == 'zip':
         with zipfile.ZipFile(dl_fname, 'r') as open_zip:
             open_zip.extractall(data_path)
+    elif zip_method == 'gz':
```
Are the TIDI files zipped? I don't see a 'zip_method' assigned to the instrument.
I found the zip method assignment.
Yes, TIDI files are gzip-compressed (zlib-based).
```diff
@@ -645,6 +647,11 @@ def _get_file(remote_file, data_path, fname, temp_path=None, zip_method=None):
     if zip_method == 'zip':
         with zipfile.ZipFile(dl_fname, 'r') as open_zip:
             open_zip.extractall(data_path)
+    elif zip_method == 'gz':
+        dest = os.path.join(data_path, fname.replace('.gz', ''))
```
What is the functionality or issue that gzip is handling that zipfile package does not?
Yes, I tried using the zipfile package to unzip the gzip files, but I wasn't able to, since gzip uses a different compression container. If you could take a look, that would be best.
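That matches the libraries' scopes: zipfile handles the ZIP archive container, while .gz files are single gzip streams and need the gzip module. A self-contained sketch of the 'gz' branch above, with temporary stand-ins for dl_fname/data_path/fname from the PR:

```python
import gzip
import os
import shutil
import tempfile

# Stand-in paths; in the PR these come from _get_file's arguments.
data_path = tempfile.mkdtemp()
fname = "tidi_sample.txt.gz"
dl_fname = os.path.join(data_path, fname)

# Create a fake gzip "download" to decompress.
with gzip.open(dl_fname, "wb") as fout:
    fout.write(b"TIDI file contents")

# Stream-decompress into data_path, dropping the '.gz' suffix,
# mirroring the elif zip_method == 'gz' branch.
dest = os.path.join(data_path, fname.replace(".gz", ""))
with gzip.open(dl_fname, "rb") as fin, open(dest, "wb") as fout:
    shutil.copyfileobj(fin, fout)
```

Using shutil.copyfileobj keeps memory use bounded even for large files, since the stream is copied in chunks.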
Co-authored-by: Russell Stoneback <[email protected]>
Description
Addresses #(issue)
Adding support for TIMED-TIDI data.
Datasets available: Michigan and NCAR sets, both available at CDAWeb/SPDF.
Data levels available: Levels 1, 2, and 3 for the Michigan dataset, and Level 3 for the NCAR set.
Added support for compressed files using gzip compression, needed for these datasets.
How Has This Been Tested?

Test Configuration:

Checklist:
- Make sure you are merging into the develop (not main) branch
- Add a note to CHANGELOG.md, summarizing the changes

If this is a release PR, replace the first item of the above checklist with the release checklist on the wiki: https://github.com/pysat/pysat/wiki/Checklist-for-Release