Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add design overview #730

Merged
merged 12 commits into from
Feb 11, 2020
Merged

Add design overview #730

merged 12 commits into from
Feb 11, 2020

Conversation

perrygreenfield
Copy link
Contributor

This adds an initial design review document aimed at those trying to understand the internal workings of ASDF. The focus on this initial version is what happens when an ASDF file is read.

This addresses [#729]

CHANGES.rst Outdated
----------------

- Add a developer overview document to help understand how ASDF works
internally. Still a work in progress. [#729]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR number is 730.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I put the the issue number (a bit of chicken and egg thing since one can't be sure of the PR number until it is submitted, can one?)

**Resolver:** Tools to map URIs and tags into actual locations of schema files,
**which may be local directories (the usual approach) or an actual URL for
**retrieval over the network. This is more complicated that it may seem for
**reasons explained later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the ** from lines 38-40.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These appeared when wrapping the text (editor was too clever). I thought I had caught all of these, but obviously not.

@nden nden added the Docs label Jan 2, 2020
@nden nden added this to the 2.6 milestone Jan 2, 2020
@nden nden requested review from eslavich, jdavies-st and mcara January 2, 2020 21:11
@codecov
Copy link

codecov bot commented Jan 2, 2020

Codecov Report

Merging #730 into master will not change coverage.
The diff coverage is 99.29%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #730   +/-   ##
=======================================
  Coverage   93.94%   93.94%           
=======================================
  Files          43       43           
  Lines        4889     4889           
=======================================
  Hits         4593     4593           
  Misses        296      296
Impacted Files Coverage Δ
asdf/generic_io.py 94.36% <100%> (ø) ⬆️
asdf/commands/info.py 100% <100%> (ø) ⬆️
asdf/types.py 96.46% <100%> (ø) ⬆️
asdf/extension.py 91.12% <100%> (ø) ⬆️
asdf/tags/core/__init__.py 100% <100%> (ø) ⬆️
asdf/schema.py 93.48% <100%> (ø) ⬆️
asdf/commands/__init__.py 100% <100%> (ø) ⬆️
asdf/exceptions.py 100% <100%> (ø) ⬆️
asdf/_convenience.py 100% <100%> (ø) ⬆️
asdf/asdf.py 94.11% <100%> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 349e20f...8e8444a. Read the comment docs.

The design is somewhat complicated by the fact that in using pyyaml and
jsonschema, one uses them by supplying code for these libraries to use, and
without becoming familiar with how these packages work, the operation of the
asdf internals isn't always very clear. This will try to provide a small amount
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
asdf internals isn't always very clear. This will try to provide a small amount
asdf internals isn't always very clear. This document will try to provide a small amount

expected to grow organically so at the moment it should not be considered
complete or comprehensive.

The design is somewhat complicated by the fact that in using pyyaml and
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already scary :) I just started reading this document and already discovered that I was supposed to know pyyaml and jsonschema and that these packages were used in asdf. I think reformulating this might help.

Also in using ..., one uses them by supplying code for these libraries to use This is very confusing formulation for me and verb "use" used three times in a single sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcara Not sure what you mean by the first (regarding reformulating). As to the second, I blame a trancelike mode resulting in stream of gibberish text.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design is somewhat complicated by the fact that in using pyyaml..." sounds to me like I am suddenly presented with this new package pyyaml. Maybe the sentence should be changed to (if one assumes that pyyaml is a prerequisite) something that introduces pyyaml in a more smooth way. Also, if it is a prerequisite, then state so at the beginning. If not - introduce it first.

Another thing that is not clear to me: "The design is somewhat..." Design of what? Of schemas, tags, or this document?

Copy link
Contributor Author

@perrygreenfield perrygreenfield Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to write this more clearly in the version I just pushed.

docs/index.rst Outdated
Currently a work in progress. Intended to give an overview of how the various
parts of ASDF interact and which modules do what and how.

.. toctree:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a double colon, hopefully this will fix the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was seeing single when I should have been seeing double!

@mcara
Copy link

mcara commented Jan 2, 2020

What are the prerequisites assumed that reader knows before reading this doc? Maybe there should be a paragraph saying this from the beginning. Also, is it assumed that users know what schema and tags are?

@perrygreenfield
Copy link
Contributor Author

@mcara That's a good question, and one I'm not terribly clear on myself. Being too specific about this may unduly scare people off. I figure that if they are familiar with using these features that they probably have dealt with schemas and tags already, and if they haven't they probably should play with them. So perhaps a paragraph that says that it assumes they have some familiarity with the use of both? But overall, it was targeted at my level of ignorance (i.e., moderately high)

the pyyaml and jsonschema libraries use. Understanding what is going on
thus means having some understanding of the relevant parts of the
internals of both of those libraries. This overview will try to provide
a small amount of context for these package to illuminate how the code
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
a small amount of context for these package to illuminate how the code
a small amount of context for these packages to illuminate how the code

@embray
Copy link
Contributor

embray commented Jan 3, 2020

Perhaps a somewhat related issue, but a lot of the YAML-specific hacks in ASDF are not entirely ASDF-specific and this might be worth emphasizing where true.

In particular, I've had a handful of unrelated discussions in recent months where I've told people about our YAML Schema which might have been of interest to them. But the only implementation I know of YAML Schema is the one buried in pyasdf, even though it could (and I think should) easily stand on its own.

@perrygreenfield
Copy link
Contributor Author

@embray That's a good point. How much interest do you think there is in such a package? I haven't given any thought to how much work would be involved in splitting that out. I guess mostly in the validation.py and yamlutil.py areas.

retrieval over the network. This is more complicated that it may seem for
reasons explained later.

**Validation:** Tool to confirm that the YAML conforms to the schemas that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be called Validator (a tool) or alternatively Validation: The process ...

parser converts the raw YAML into a custom Python structure. It is that
structure that is validated. Then if no errors are found, the tree is
converted into a tree where tagged nodes get converted into special Python
objects (usually), e.g., WCS object or numpy arrays (well, not quite that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why usually? When is it not converted? If you mean that it is not converted to Python objects when it fails validation, then usually should be removed as you said that happens when there are no errors.

Instead of get converted into special Python objects is it clearer to say something like get converted into the corresponding Python objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nden Does the latest version address this? The context is when an option is supplied to prevent the conversion (_force_raw_types).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. I was not aware of that flag. And it's not documented.


Most of resolver tools and code is in ``resolver.py`` (but not all)

Most of the validation code is in ``schema.py``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add period at the end of the above two sentences.

The yaml parsing phase described below normally returns a "tagged_tree". That is
(somewhat simplified), it returns the data structure that yaml would normally
return without any object conversion (i.e., all nodes are either dicts, lists,
or scalar values), except that they are objects that now sport a tag attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sport --> support

up the af object intrinsically). The function then passes the callback walker to
treeutil.walk_and_modify() where the tree will be traversed recursively applying
the tag code associated with the tag to the more primative tree representation
replacing such nodes wtih Python objects. The tree travsersal starts from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wtih --> with

it doesn't appear to vary except for that. Admittedly, this is only created at
the top level. This is called by ``get_validator``.

**class OrderedLoader:** Hnherits from the ``_yaml_base_loader``, but otherwise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hnherits --> Inherits

@embray
Copy link
Contributor

embray commented Jan 7, 2020

@perrygreenfield

I guess mostly in the validation.py and yamlutil.py areas.

Yep (I'm not sure what validation.py is; maybe you meant schema.py?). Looking at them again, there's really very little about it that's specific to ASDF. Some of the naming (e.g. AsdfLoader) makes it seem that way, but very little of it is particular to ASDF so much as to YAML Schema itself, including things like the "tagged" object wrappers. This stuff I think is broadly applicable and useful and I have wanted something like this for my own use.

It's hard to judge exactly how much demand there is for this more broadly-speaking, beyond some discussions I've seen on Twitter. But from what I can tell, although it has evolved very little there is still a lot of interest in and demand for YAML, and better tools for manipulating and validating YAML documents, and I think there's a lot of richness in what we've done here.

@perrygreenfield
Copy link
Contributor Author

Yeah, schema.py is what I meant. In any case, it can't hurt to make it available (except for work on our end, of course).

@eslavich
Copy link
Contributor

@perrygreenfield is this ready to merge?

@perrygreenfield
Copy link
Contributor Author

I think so, aside from the CHANGES.rst item, but needs reviewer approval doesn't it?

@eslavich
Copy link
Contributor

Oh, ha, it's waiting on me isn't it. I'll take a look now.

Copy link
Contributor

@eslavich eslavich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the write-up, I found it helpful! I pointed out some typos but other than that it looks good. I'm planning to add a section on versioning once this is merged.

docs/asdf/developer_overview.rst Outdated Show resolved Hide resolved
docs/asdf/developer_overview.rst Outdated Show resolved Hide resolved
docs/asdf/developer_overview.rst Outdated Show resolved Hide resolved
top, but the objects are created from the bottom up due to recursion.

The result is what af.tree is set to, after doing another tree travseral looking
for special type hooks for each node. It isn't cleaf if there is yet any use of that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaf --> clear

docs/asdf/developer_overview.rst Outdated Show resolved Hide resolved
(This is the major reason that the tree isn't directly converted to an object
tree since jsonschema would not be able to use the final object tree for
validation, besides issues relate to the fact that things that don't validate
may not be converable to the designated object.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converable --> convertable

docs/asdf/developer_overview.rst Outdated Show resolved Hide resolved
docs/asdf/developer_overview.rst Outdated Show resolved Hide resolved
@perrygreenfield perrygreenfield merged commit 29033ac into asdf-format:master Feb 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants