Incorporate changes to workshop materials from old repo #493
**docs/tutorial/1_astropy_and_sunpy.md**

Using the `.to()` method on a `u.Quantity` object lets you convert a quantity to a different unit:

```{code-cell} python
speed.to(u.km/u.h)
```

### Equivalencies

Some conversions are not done by a simple conversion factor, as between miles and kilometers; for example, converting between wavelength and frequency:

```{code-cell} python
---
tags: [raises-exception]
---
(656.281 * u.nm).to(u.Hz) # Fails because they are not compatible
```

However, we can make use of the spectral *equivalency* to indicate the link between the units:

```{code-cell} python
(656.281 * u.nm).to(u.Hz, equivalencies=u.spectral())
```
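The spectral equivalency encodes the physical relation $\nu = c / \lambda$. As a plain-Python sketch (no astropy; the constant and function name below are our own, purely for illustration):

```{code-cell} python
# Plain-Python sketch of the relation behind u.spectral(): nu = c / lambda.
# The constant and function name are illustrative, not part of astropy.
c_m_per_s = 2.99792458e8  # speed of light in m/s

def wavelength_nm_to_frequency_hz(wavelength_nm):
    """Convert a wavelength in nanometres to a frequency in Hz."""
    return c_m_per_s / (wavelength_nm * 1e-9)

wavelength_nm_to_frequency_hz(656.281)  # H-alpha, about 4.57e14 Hz
```

The equivalency lets astropy apply exactly this kind of relation while keeping full unit bookkeeping.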

### Constants

The `astropy.constants` sub-package provides a set of physical constants which are compatible with the units/quantities framework:

```{code-cell} python
from astropy.constants import M_sun, c
```
```{code-cell} python
E = M_sun * c ** 2
E.to(u.J)
```
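As a back-of-the-envelope cross-check, the same energy can be computed with plain floats (the numerical values here are approximate and hand-typed; `astropy.constants` carries the authoritative values together with their units and provenance):

```{code-cell} python
# Rough check of E = M_sun * c**2 with plain floats.
# Approximate hand-typed values; astropy.constants is the authoritative source.
M_sun_kg = 1.989e30       # solar mass in kg (approximate)
c_m_per_s = 2.99792458e8  # speed of light in m/s

E_joules = M_sun_kg * c_m_per_s ** 2
E_joules  # roughly 1.8e47 J
```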

## Coordinates

The Astropy coordinates submodule {obj}`astropy.coordinates` provides classes to represent physical coordinates with all their associated metadata, and transform them between different coordinate systems.
There are a few things to notice about the difference between these two `SkyCoord` objects:

### Spectral Coordinates

{obj}`astropy.coordinates.SpectralCoord` is a `Quantity`-like object which also holds information about the observer and target coordinates and relative velocities.

```{note}
Use of `SpectralCoord` with solar data is still experimental so not all features may work, or be accurate.
```{code-cell} python
spc = SpectralCoord(586.3 * u.nm, target=hpc2, observer=get_earth(time=hpc2.obstime))
spc
```

If you're viewing this document in Jupyter notebook form, you may have to work around a [bug in astropy](https://github.com/astropy/astropy/issues/14758) to display `spc` properly:
```{code-cell} python
print(repr(spc))
```
```{code-cell} python
wcs.axis_correlation_matrix
```

This correlation matrix has the world dimensions as rows, and the pixel dimensions as columns.
Here we have a 2D image, with two pixel axes and two world axes, which are coupled together.
This means that to calculate either latitude or longitude you need both pixel coordinates.
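To illustrate (with the axis ordering assumed for the sake of the sketch), the coupled celestial case and an uncoupled counterpart might look like:

```{code-cell} python
# Illustrative sketch only: correlation matrices as nested booleans.
# Rows = world axes, columns = pixel axes (ordering assumed for illustration).
coupled = [
    [True, True],   # longitude depends on both pixel axes
    [True, True],   # latitude depends on both pixel axes
]
# Contrast with e.g. a slit spectrograph, where wavelength and slit position
# would each depend on only one pixel axis:
uncoupled = [
    [True, False],
    [False, True],
]
all(all(row) for row in coupled)  # True for a celestial image
```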
**docs/tutorial/2_search_and_asdf_download.md**
(dkist:tutorial:search-and-download)=
# Searching for DKIST Datasets

In this session we will cover how to search for DKIST datasets available at the DKIST Data Center.
In DKIST data parlance, a "dataset" is the smallest unit of data that is searchable from the data center, and represents a single observation from a single instrument in a single passband.

Each dataset comprises a number of different files:
* An ASDF file containing all the metadata, and no data.
* A quality report PDF.
* An mp4 preview movie.
* A (large) number of FITS files, each containing a "calibrated exposure".

The ASDF, quality report and preview movie can all be downloaded without authenticating; the FITS files require the use of Globus, which is covered in {ref}`dkist:tutorial:downloading-data`.

All of these files apart from the FITS files containing the data can be downloaded irrespective of embargo status.

For each of these "datasets" the DKIST Data Center keeps a "dataset inventory record", a limited set of metadata about the dataset on which you can search, either through the web portal or the `dkist` Python package.


## Using `Fido.search`

The search interface we are going to use is {obj}`sunpy.net.Fido`.
`Fido` supports many different sources of data, some built into `sunpy`, like the VSO, and some provided by plugins, like `dkist` or `sunpy-soar`.
With `Fido` you can search for DKIST datasets and download their corresponding ASDF files.
To register the DKIST search with `Fido` we must also import `dkist.net`.

```{code-cell} python
import astropy.units as u
from sunpy.net import Fido, attrs as a

import dkist.net
```

`Fido` searches are built up from "attrs", which we imported above as `a`.
These attrs are combined together with either logical AND or logical OR operations to make complex queries.
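Under the hood this combining works through Python operator overloading; a minimal sketch of the idea (not the real sunpy implementation, just the general pattern) looks like:

```{code-cell} python
# Minimal sketch of how attrs could overload & and | to build a query tree.
# This is NOT the real sunpy implementation, only an illustration of the pattern.
class Attr:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __and__(self, other):
        # a & b builds an AND node combining both attrs
        return ("AND", self, other)

    def __or__(self, other):
        # a | b builds an OR node combining both attrs
        return ("OR", self, other)

query = Attr("Instrument", "VBI") | Attr("Instrument", "VISP")
query[0]  # "OR"
```

`Fido` walks the resulting tree to decide which clients to query and with what parameters.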
Let's start simple and search for all the DKIST datasets which are not embargoed:

```{code-cell} python
---
tags: [output_scroll]
---
Fido.search(a.dkist.Embargoed(False))
```

Because we only specified one attr, and it was unique to the dkist client (it starts with `a.dkist`), only the DKIST client was used.
If we only want VBI datasets, that are unembargoed, between a specific time range we can use multiple attrs:

```{code-cell} python
Fido.search(a.Time("2022-06-02 17:00:00", "2022-06-02 18:00:00") & a.Instrument.vbi & a.dkist.Embargoed(False))
```

Note how the `a.Time` and `a.Instrument` attrs are not prefixed with `dkist`; these are general attrs which can be used to search multiple sources of data.

So far all returned results have had to match all the attrs provided, because we have used the `&` (logical AND) operator to join them.
If we want results that match either one of multiple options we can use the `|` operator.
```{code-cell} python
---
tags: [output_scroll]
---
Fido.search((a.Instrument.vbi | a.Instrument.visp) & a.dkist.Embargoed(False))
```

As you can see this has returned two separate tables, one for VBI and one for VISP.

Because `Fido` can search other clients as well as the DKIST client, you can make a more complex query which will search for VISP data and context images from AIA at the same time:

```{code-cell} python
time = a.Time("2022-06-02 17:00:00", "2022-06-02 18:00:00")
aia = a.Instrument.aia & a.Wavelength(17.1 * u.nm) & a.Sample(30 * u.min)
visp = a.Instrument.visp & a.dkist.Embargoed(False)

Fido.search(time, aia | visp)
```

Here we have used a couple of different attrs.
`a.Sample` limits the results to one per time window given, and `a.Wavelength` searches for specific wavelengths of data.
Also, we passed our attrs as positional arguments to `Fido.search`.
This is a little bit of sugar to avoid having to specify a lot of brackets; all positional arguments have the logical AND (`&`) operator applied to them.

## Working with Results Tables

A Fido search returns a {obj}`sunpy.net.fido_factory.UnifiedResponse` object, which contains all the search results from all the different clients and requests made to the servers.
```{code-cell} python
res = Fido.search((a.Instrument.vbi | a.Instrument.visp) & a.dkist.Embargoed(False))
type(res)
```

The `UnifiedResponse` object provides a couple of different ways to select the results you are interested in.
It's possible to select just the results returned by a specific client by name; in this case all the results are from the DKIST client, so this changes nothing.
```{code-cell} python
---
tags: [output_scroll]
---
res["dkist"]
```

This object is similar to a list of tables, where each response can also be selected by numerical index:
```{code-cell} python
---
tags: [output_scroll]
---
vbi = res[0]
vbi
```

Now that we have selected a single set of results from the `UnifiedResponse` object, we can see that we have a `DKISTQueryResponseTable` object:
```{code-cell} python
type(vbi)
```
This is a subclass of {obj}`astropy.table.QTable`, which means we can do operations such as sorting and filtering with this table.

We can display only some columns:
```{code-cell} python
---
tags: [output_scroll]
---
vbi["Dataset ID", "Start Time", "Average Fried Parameter", "Embargoed"]
```

or sort based on a column, and pick the top 5 rows:
```{code-cell} python
vbi.sort("Average Fried Parameter")
vbi[:5]
```
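What this sort-then-slice pattern does can be sketched with a plain list of dicts standing in for the table (the rows and dataset IDs below are invented):

```{code-cell} python
# Plain-Python sketch of sort-then-slice; the rows and IDs are invented.
rows = [
    {"Dataset ID": "AAAA", "Average Fried Parameter": 0.08},
    {"Dataset ID": "BBBB", "Average Fried Parameter": 0.03},
    {"Dataset ID": "CCCC", "Average Fried Parameter": 0.12},
]
rows.sort(key=lambda r: r["Average Fried Parameter"])  # in-place, ascending
[r["Dataset ID"] for r in rows[:2]]  # the two lowest-r0 rows
```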

Once we have selected the rows we are interested in, we can move on to downloading the ASDF files.
To download the FITS files containing the data, see the downloading data tutorial ({ref}`dkist:tutorial:downloading-data`).

To download files with `Fido` we pass the search results to `Fido.fetch`.

If we want to download the first VBI dataset we searched for earlier we can do so like this:
```{code-cell} python
Fido.fetch(vbi[0])
```

This will download the ASDF file to the sunpy default data directory `~/sunpy/data`; we can customise this with the `path=` keyword argument.
Note that you can also pass more than one result to be downloaded.

A simple example of specifying the path is:

```{code-cell} python
---
tags: [skip-execution]
---
Fido.fetch(vbi[0], path="data/mypath")
```

This will download the ASDF file as `data/mypath/filename.asdf`.

Since a DKIST dataset consists of a large number of files (many FITS files plus the ASDF file), we probably want to keep each dataset in its own folder.
`Fido` makes this easy by allowing you to provide a path template rather than a specific path.
To see the list of parameters we can use in these path templates we can run:
```{code-cell} python
vbi.path_format_keys()
```
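A path template expands by substituting each dataset's metadata into the placeholders, essentially Python string formatting (the metadata values below are invented examples):

```{code-cell} python
# Sketch of how a path template expands; the metadata values are invented.
template = "~/sunpy/data/{instrument}/{dataset_id}/"
metadata = {"instrument": "VBI", "dataset_id": "BLKGA"}
template.format(**metadata)  # '~/sunpy/data/VBI/BLKGA/'
```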

So if we want to put each of our ASDF files in a directory named with the Dataset ID and Instrument we can do:

```{code-cell} python
Fido.fetch(vbi[:5], path="~/sunpy/data/{instrument}/{dataset_id}/")
```

Storing each dataset in its own folder may seem unnecessary right now, since we are only downloading a single ASDF file for each one. However, this extra level of sorting will become useful later on when we start to download the FITS files.