WIP: revise top-level package description #2430

Merged
merged 9 commits into from
Jan 6, 2019
Changes from 7 commits
36 changes: 25 additions & 11 deletions doc/index.rst
@@ -2,19 +2,33 @@ xarray: N-D labeled arrays and datasets in Python
=================================================

**xarray** (formerly **xray**) is an open source project and Python package
Collaborator

@shoyer can we drop the reference to xray? The set of people that know the old xray and don't know the new xarray name is probably next to empty.

Contributor Author

Sadly, just today in the twitter thread under discussion, someone referenced xray and linked to the v0.2 documentation. 🤦‍♂️

that aims to bring the labeled data power of pandas_ to the physical sciences,
by providing N-dimensional variants of the core pandas data structures.

Our goal is to provide a pandas-like and pandas-compatible toolkit for
analytics on multi-dimensional arrays, rather than the tabular data for which
pandas excels. Our approach adopts the `Common Data Model`_ for self-
describing scientific data in widespread use in the Earth sciences:
``xarray.Dataset`` is an in-memory representation of a netCDF file.

that makes working with labelled multi-dimensional arrays simple,
efficient, and fun!

Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called "tensors")
Member

somtimes -> sometimes

are an essential part of computational science.
They are encountered in a wide range of fields, including physics, astronomy,
geoscience, bioinformatics, engineering, finance, and deep learning.
In Python, numpy_ provides the fundamental data structure and API for
Member

nit: numpy -> NumPy

working with raw ND arrays.
However, real-world datasets are usually more than just raw numbers;
they have "labels" which encode information about how the array values map
Collaborator

Not sure we need " around labels?

Contributor Author

removed

to locations in space, time, etc.
By introducing the concepts of *dimensions*, *coordinates*, and *attributes*
on top of raw numpy-like arrays,
xarray is able to understand these labels and use them to provide a
more intuitive, more concise, and less error-prone experience.
Xarray also provides a large and growing library of functions for advanced
analytics and visualization with these data structures.
Xarray was inspired by and borrows heavily from pandas_, a highly popular data
analysis package focused on labelled tabular data.
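To make the *dimensions*, *coordinates*, and *attributes* concepts above a little more concrete, here is a minimal sketch using the public ``xarray.DataArray`` constructor. The variable names and values are made up for illustration, and it assumes xarray and NumPy are installed.

```python
import numpy as np
import xarray as xr

# Illustrative data only: 2 time steps x 3 positions along "x".
values = np.array([[10.0, 12.0, 14.0],
                   [11.0, 13.0, 15.0]])

temperature = xr.DataArray(
    values,
    dims=("time", "x"),                             # dimension names
    coords={"time": [0, 1], "x": [100, 200, 300]},  # coordinate labels
    attrs={"units": "degC"},                        # arbitrary metadata
)

# Select by coordinate label rather than by integer position.
at_200 = temperature.sel(x=200)

# Reduce over a dimension by name rather than by axis number.
time_mean = temperature.mean(dim="time")
```

Because the reduction refers to ``"time"`` by name, it gives the same result even if the underlying array is stored with its axes in a different order (e.g. after ``temperature.transpose("x", "time")``), which is one way labels make code less error-prone than positional axis numbers.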
Member

It would be nice to still see the words "netCDF" somewhere (or maybe that's implicit in our mentioning of the "Common Data Model"?).

Roughly speaking we have three audiences here:

  • NumPy users who want labels
  • pandas users who want to work with higher-dimensional data
  • netCDF users who want good in-memory data-structures

Contributor Author

I removed it in response to @alexamici's comments. But in retrospect I agree that it belongs there. (I personally had never heard of CDM before xarray.)

Member

I would prioritize mentioning netCDF over the CDM and maybe drop CDM entirely from the brief intro. I don't think many people know what the "common data model" refers to, and worse it seems to be a heavily overloaded term, even in technical contexts (e.g., the top hit from Google is something unrelated from Microsoft).

Collaborator

Roughly speaking we have three audiences here:

* NumPy users who want labels

* pandas users who want to work with higher-dimensional data

* netCDF users who want good in-memory data-structures

This seems key enough that I might even put this somewhere in the docs?

and

* pandas users who want to work with higher-dimensional data
->
* pandas users who want to work with higher-dimensional data and an explicit, production-capable API

Member

This might be good stuff to add to the “Why xarray” page.

Xarray can read and write data from most common labeled ND-array storage
formats and is particularly tailored to working with netCDF_ files, whose
data model is nearly identical to xarray's.
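As a rough illustration of how closely the two data models line up, here is a sketch of an ``xarray.Dataset`` laid out the way a small netCDF file would be: named dimensions shared by several variables, per-variable attributes, and global attributes. All names and values are hypothetical, and it assumes xarray and NumPy are installed.

```python
import numpy as np
import xarray as xr

# Hypothetical example: two variables sharing the same ("time", "x") dimensions,
# much like two variables defined over shared dimensions in a netCDF file.
ds = xr.Dataset(
    data_vars={
        # (dims, data, per-variable attributes)
        "temperature": (("time", "x"),
                        np.array([[10.0, 12.0], [11.0, 13.0]]),
                        {"units": "degC"}),
        "pressure": (("time", "x"),
                     np.array([[1000.0, 1002.0], [1001.0, 1003.0]]),
                     {"units": "hPa"}),
    },
    coords={"time": [0, 1], "x": [100, 200]},  # coordinate variables
    attrs={"title": "illustrative example"},   # global attributes
)
```

Writing such a dataset to disk with ``ds.to_netcdf(path)`` and reading it back with ``xr.open_dataset(path)`` additionally requires a netCDF backend (for example netCDF4 or scipy) to be installed.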

.. _numpy: http://www.numpy.org/
.. _pandas: http://pandas.pydata.org
.. _Common Data Model: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf
.. _OPeNDAP: http://www.opendap.org/

Documentation
-------------