WIP: revise top-level package description #2430
@@ -2,19 +2,33 @@ xarray: N-D labeled arrays and datasets in Python
=================================================

**xarray** (formerly **xray**) is an open source project and Python package
that aims to bring the labeled data power of pandas_ to the physical sciences,
by providing N-dimensional variants of the core pandas data structures.

Our goal is to provide a pandas-like and pandas-compatible toolkit for
analytics on multi-dimensional arrays, rather than the tabular data for which
pandas excels. Our approach adopts the `Common Data Model`_ for self-
describing scientific data in widespread use in the Earth sciences:
``xarray.Dataset`` is an in-memory representation of a netCDF file.

that makes working with labelled multi-dimensional arrays simple,
efficient, and fun!

Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called "tensors")
are an essential part of computational science.
They are encountered in a wide range of fields, including physics, astronomy,
geoscience, bioinformatics, engineering, finance, and deep learning.
In Python, numpy_ provides the fundamental data structure and API for
working with raw ND arrays.
However, real-world datasets are usually more than just raw numbers;
they have "labels" which encode information about how the array values map
Not sure we need
removed
to locations in space, time, etc.
By introducing the concepts of *dimensions*, *coordinates*, and *attributes*
on top of raw numpy-like arrays,
xarray is able to understand these labels and use them to provide a
more intuitive, more concise, and less error-prone experience.
Xarray also provides a large and growing library of functions for advanced
analytics and visualization with these data structures.
Xarray was inspired by and borrows heavily from pandas_, a highly popular data
analysis package focused on labelled tabular data.
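As a rough illustration of the dimensions, coordinates, and attributes described in the proposed text, the sketch below wraps a small NumPy array in an ``xarray.DataArray``. The dimension names (``time``, ``city``), the coordinate labels, and the ``units`` attribute are assumptions made up for this example and are not part of the PR.

```python
import numpy as np
import xarray as xr

# Raw NumPy: a 2 x 3 array whose axes carry no meaning of their own.
raw = np.array([[10.0, 11.0, 12.0],
                [20.0, 21.0, 22.0]])

# The same values with dimension names, coordinate labels, and attributes.
# All names and labels here are illustrative only.
temps = xr.DataArray(
    raw,
    dims=("time", "city"),
    coords={
        "time": ["2018-01-01", "2018-01-02"],
        "city": ["Oslo", "Lima", "Cairo"],
    },
    attrs={"units": "degC"},
)

# Positional indexing: the reader must remember that axis 1 is "city".
by_position = raw[:, 1]

# Label-based selection: the intent is spelled out in the call.
by_label = temps.sel(city="Lima")

# Reductions can name the dimension instead of passing an axis number.
mean_over_time = temps.mean(dim="time")
```

Both selections return the same numbers; the labelled version simply does not depend on remembering the axis order.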
It would be nice to still see the words "netCDF" somewhere (or maybe that's implicit in our mentioning of the "Common Data Model"?). Roughly speaking we have three audiences here:
I removed it in response to @alexamici's comments. But in retrospect I agree that it belongs there. (I personally had never heard of CDM before xarray.)
I would prioritize mentioning netCDF over the CDM and maybe drop CDM entirely from the brief intro. I don't think many people know what the "common data model" refers to, and worse it seems to be a heavily overloaded term, even in technical contexts (e.g., the top hit from Google is something unrelated from Microsoft).
This seems key enough that I might even put this somewhere in the docs? and
This might be good stuff to add to the “Why xarray” page.
Xarray can read and write data from most common labeled ND-array storage
formats and is particularly tailored to working with netCDF_ files, whose
data model is nearly identical to xarray's.
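To make the comparison with netCDF more concrete, here is a minimal sketch of an ``xarray.Dataset`` holding named dimensions, data variables, and global attributes, the same kinds of objects a netCDF file contains. The variable names, coordinate values, and the ``title`` attribute are hypothetical, and no file is read or written here.

```python
import numpy as np
import xarray as xr

# A small in-memory Dataset; every name below is illustrative only.
ds = xr.Dataset(
    data_vars={
        "temperature": (("time", "station"), np.zeros((2, 3))),
        "pressure": (("time", "station"), np.ones((2, 3))),
    },
    coords={"time": [0, 1], "station": ["a", "b", "c"]},
    attrs={"title": "illustrative dataset"},
)

# The pieces that correspond to netCDF dimensions, variables,
# and global attributes, respectively.
dimension_sizes = dict(ds.sizes)
variable_names = sorted(ds.data_vars)
global_attributes = dict(ds.attrs)
```

Writing such an object to disk would normally go through ``Dataset.to_netcdf`` and reading it back through ``xarray.open_dataset``, both of which rely on an optional backend such as netCDF4, h5netcdf, or scipy being installed.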

.. _numpy: http://www.numpy.org/
.. _pandas: http://pandas.pydata.org
.. _Common Data Model: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf
.. _OPeNDAP: http://www.opendap.org/

Documentation
-------------

@shoyer can we drop the reference to xray? The set of people that know the old xray and don't know the new xarray name is probably next to empty.
Sadly, just today in the twitter thread under discussion, someone referenced xray and linked to the v0.2 documentation. 🤦♂️