Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of DEP for static-ish typechecking for Django #65

Closed
wants to merge 5 commits into from

Conversation

mkurnikov
Copy link

@mkurnikov mkurnikov commented Oct 6, 2019

@carltongibson
Here's what I know about the possible solutions and limitations of implementing typechecking for Django codebase. Based on experience from https://github.com/typeddjango/django-stubs

Let me know on how I can improve and expand it.

@mkurnikov mkurnikov changed the title Very very first draft of DEP for static typechecking of Django First draft of DEP for static-ish typechecking of Django Oct 6, 2019
@mkurnikov mkurnikov changed the title First draft of DEP for static-ish typechecking of Django First draft of DEP for static-ish typechecking for Django Oct 6, 2019
Copy link
Member

@carltongibson carltongibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mkurnikov. Thank you for this. I will ask on the mailing list now for folks to join in.

I have a few initial comments:

  • Can you re-phrase the two ways we could do it section to, We're going to do it this way? i.e. We could have stubs but we want to inline it. (That is what we want to do yes?) Bottom line here is that the DEP, when agreed, should be normative. Do this!.

  • From use of language where type annotations are more established, it seems to me that a lot of the problems (horribleness/clutter) are alleviated by aggressive use of type aliases.

    For example any time we want to have a user in Django it's probably something like Union[AbstractBaseUser, AnonymousUser], which I don't want to write or see (being honest). But if we define a type alias, I can just put User, which is fine, good even.

    • It's keeps the annotations cleaner
    • and, we can refine the type aliases to be more precise over time, without changing the annotated code.

    There are plenty of cases where it's much worse than Union[AbstractBaseUser, AnonymousUser], I think. Can we give some examples to show how we could encapsulate and manage this complexity of argument type?

  • DISCUSSION POINT: The @overload section makes me shudder. (Perhaps I just need to get used to it but...) Can we (should we?) explicitly add that we're not going to add explicit typing in SOME cases — i.e. those where it's too invasive? (Define this.) (For me, the tooling needs to catch up here, rather than us enforce too much on Python/our code...)

  • There's a template for DEPs. Can you use that. I'm happy to be shephard if you like.

Personal view: to the extent that it's not horrible™ (see above) I think hints are beneficial and we should push on, so broadly +1 (modulo WIP/Draft status).

Either way though, thanks for the effort to push this forwards!

@django django temporarily blocked zaidipoetra754 Oct 8, 2019
@django django deleted a comment from zaidipoetra754 Oct 8, 2019
@django django deleted a comment from zaidipoetra754 Oct 8, 2019
@rik
Copy link

rik commented Oct 8, 2019

  • From use of language where type annotations are more established, it seems to me that a lot of the problems (horribleness/clutter) are alleviated by aggressive use of type aliases.
    For example any time we want to have a user in Django it's probably something like Union[AbstractBaseUser, AnonymousUser], which I don't want to write or see (being honest). But if we define a type alias, I can just put User, which is fine, good even.

    • It's keeps the annotations cleaner
    • and, we can refine the type aliases to be more precise over time, without changing the annotated code.

    There are plenty of cases where it's much worse than Union[AbstractBaseUser, AnonymousUser], I think. Can we give some examples to show how we could encapsulate and manage this complexity of argument type?

For types that don't currently have a type hierarchy, you can give them a type hierarchy through Abstract Base Classes or use Protocols(PEP 544, mypy docs).

@MarkusH
Copy link
Member

MarkusH commented Oct 8, 2019

Thank you for writing this down, @mkurnikov.

In the past I've objected to the use of type hints generally as I thought they clutter the code base with needless information in a dynamic scripting language like Python.

Now that I've used type hints at work I've seen some benefit in having them.

But the benefits stop and instead become annoying and confusing when type hints become more complex. Indicating a parameter as int or str is useful to me, or even unions of such, but when when it comes to the complex type definitions and overloads as you've drawn out here, my eyes start to hurt. To me it's code bloat that's hard to read and maintain without adding much value (Django worked without them for 14 years).

Which brings me to the next topic: I think that any form of type hints which is not part of the code base (i.e. in .pyi files) will becomes stale sooner rather than later and will increase the complexity of contributions. If I, as a maintainer or a reviewer, need to check that each .py file touched has the corresponding .pyi file updated, I'd probably throw in the towel rather soon and not review the PR.

My suggestion would be to start to add hints to functions and ignore the complexity of models etc. I.e. a lot of stuff in django.utils could probably receive type hints. But stuff in django.db probably wouldn't.

@charettes
Copy link
Member

I tend to agree with @MarkusH here.

I think a plan to start with adding basic support for type hints to clearly defined interfaces and see how much of a maintenance burden it is. By setting up mypy on CI we could then gradually target more modules if need be.

@adamchainz
Copy link
Member

I think a gradual approach makes sense too. It's hard to properly test ongoing maintenance burden without this being in Django itself, but I wouldn't want to enforce it across the whole codebase without more knowledge.

@felixxm
Copy link
Member

felixxm commented Oct 8, 2019

@mkurnikov
Copy link
Author

mkurnikov commented Oct 8, 2019

@carltongibson

For example any time we want to have a user in Django it's probably something like Union[AbstractBaseUser, AnonymousUser], which I don't want to write or see (being honest). But if we define a type alias, I can just put User, which is fine, good even.

It's keeps the annotations cleaner
and, we can refine the type aliases to be more precise over time, without changing the annotated code.
There are plenty of cases where it's much worse than Union[AbstractBaseUser, AnonymousUser], I think. Can we give some examples to show how we could encapsulate and manage this complexity of argument type?

There's a problem where we need differentiate between django.contrib.auth.models.User and User as a type alias. We can use a specific notation (I try to use underscored names for type aliases in django-stubs, like _User). Other approaches, like UserType are too misleading in my opinion.

Or they could be internal to a Django codebase. (Then underscore notation is even more appropriate then).

The @overload section makes me shudder. (Perhaps I just need to get used to it but...) Can we (should we?) explicitly add that we're not going to add explicit typing in SOME cases — i.e. those where it's too invasive? (Define this.) (For me, the tooling needs to catch up here, rather than us enforce too much on Python/our code...)

For Field specificaly, those are valid overloads, there's no way to express the behavious of if model class => return Field, else return underlying type. It could always be managed inside a plugin though, but it dimishes the value of annotated codebase, as no typechecker besides mypy would be able to use it.

How do I update the DEP based on discussion? Should it be in UPD section, on just update text and keep history in the git commits?

@MarkusH

Which brings me to the next topic: I think that any form of type hints which is not part of the code base (i.e. in .pyi files) will becomes stale sooner rather than later and will increase the complexity of contributions. If I, as a maintainer or a reviewer, need to check that each .py file touched has the corresponding .pyi file updated, I'd probably throw in the towel rather soon and not review the PR.

There are some utilities to merge .pyi stubs into the codebase to typecheck it and make them in sync
https://github.com/ambv/retype
https://github.com/google/pytype/tree/master/pytype/tools/merge_pyi

I tried retype way in the past, it didn't work well. I tried merge-pyi recently, and except for some quirks like not recognizing @overload clauses, it does the job well. I'm sure it can be improved to the point of smooth merging/typechecking.
So, for .pyi to be synced, we could just add a tool like this in the CI, and keep the codebase annotations-free.

@carltongibson
Copy link
Member

@mkurnikov So, do you prefer stub files? (I'd always thought that was just a work around...?) Perhaps I've misread you there?

From early comments, a phased introduction, keeping it nice and readable, to try out would be broadly accepted I think.

Some (pseudo-)examples on handling more complex cases would be good — I think that's where we're more unsure.

How do I update the DEP based on discussion? Should it be in UPD section, on just update text and keep history in the git commits?

Edit to make it correct. The commit history is still there, at least whilst it's in discussion.

@felipe-lee
Copy link

I personally think adding type hinting would be a great idea. I do agree with starting with simpler parts, but I think we should eventually type hint even the more complex bits.

There's a problem where we need differentiate between django.contrib.auth.models.User and User as a type alias. We can use a specific notation (I try to use underscored names for type aliases in django-stubs, like _User). Other approaches, like UserType are too misleading in my opinion.

Or they could be internal to a Django codebase. (Then underscore notation is even more appropriate then).

I think some type aliases might be appropriate to be just internal, but there are others that would be beneficial to have available for anyone using django. For example, this User type (whatever it is called in the end) would probably be helpful to just be able to import the alias from django instead of having to recreate it within projects.

There are some utilities to merge .pyi stubs into the codebase to typecheck it and make them in sync
https://github.com/ambv/retype
https://github.com/google/pytype/tree/master/pytype/tools/merge_pyi

I tried retype way in the past, it didn't work well. I tried merge-pyi recently, and except for some quirks like not recognizing @overload clauses, it does the job well. I'm sure it can be improved to the point of smooth merging/typechecking.
So, for .pyi to be synced, we could just add a tool like this in the CI, and keep the codebase annotations-free.

I don't have any experience with .pyi files, but based on reading the READMEs for those two repos it seems like you can run the tools to apply type hints from .pyi files to the source directly, or output the combined source/type files into another directory. The first makes it seem like the annotations would also live in the source code and the latter seems like we would need to change the workflow of what gets put out. I could be completely misunderstanding this though. I think it'd be helpful to outline the workflow for using stubs, mainly because I doubt I'm the only one without any stubs experience.

@mkurnikov
Copy link
Author

mkurnikov commented Oct 8, 2019

@carltongibson I've updated the DEP with the concrete migration path, and aligned it with the DEP template.

@mkurnikov
Copy link
Author

There's also a question about mypy plugin packaged with django-stubs. Without it, most of the QuerySet and Model annotations will just be Any, which decreases the value of typechecking for user codebases.
It works more or less ok now, but there's a big "but" here: it uses runtime information to gather Model metadata via _meta API. So, for users to be able to typecheck their codebase, they would need to actually run django.setup() step inside mypy.

  1. It breaks dmypy, mypy daemon for typechecking, which makes typechecking much faster.
    Mypy folks suggested to add some features to allow dmypy to work with runtime reruns of django.setup(), but haven't yet finished them.

  2. If user's django.setup() has side-effects, it could lead to some problems.

What do you think about it?

@sobolevn
Copy link

sobolevn commented Oct 8, 2019

Hi, @mkurnikov! Glad that this DEP is created!

I would also like to raise an important question about other plugins integration into django-stubs. As we have seen some problems with django-configurations and django-split-settings. I am pretty sure that there will be other examples.

And huge frameworks like django-rest-framework also do some tricky parts.

I had some experience in working with both .py and .pyi files in the same project. It significantly increases a maintenance cost. Because, with this structure we have to edit two files at once. So, my vote goes to applying stubs into the codebase.


## Abstract

Add mypy (and other type checker) support for Django.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about other type checkers. They are not tested, plugins are mypy specific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless someone has very specific input here mypy can be the assumed path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it clear by mentioning this explicitly in the DEP. But I'd like to see a list of alternatives and a one-sentence explanation on why we are not using one of them.

It's desirable that this package would be usable alongside the Django type annotations migration.


### Incremental migration path:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include our internal milestones here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was about how does the development of django-stubs itself affects the integration process.

@Dmitri-Sintsov
Copy link

In my opinion static-like typechecking is unpytonic because Python is a duck-typing language. It would bloat the code base and would probably make mocking harder. It's better to have high coverage of tests and do not use non-pytonic approach at all.

@mkurnikov
Copy link
Author

@Dmitri-Sintsov

It would bloat the code base and would probably make mocking harder.

Bloat was already discussed here, it's a valid concern. But how does it make mocking harder? Type hints are mostly compile-time concept, they have minimal influence on the runtime.

Copy link
Member

@carltongibson carltongibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mkurnikov. Thanks for the updates.

Next comment... 🙂

It seems to me we're missing an important Motivation section. From the template:

This section should explain why this DEP is needed. The motivation is critical for DEPs that want to add substantial new features or materially refactor existing ones. ... DEP submissions without sufficient motivation may be rejected outright.

tl;dr: What does type hinting give us?

We need this because without it, folks will look at something like:

def deconstruct(self) -> Tuple[bool, str, None, Tuple, Dict[str, int]]:

or

def get_ordering(self, request: WSGIRequest, queryset: QuerySet) -> List[Union[OrderBy, Combinable, str]]:

And they'll just think "madness" — code will became harder to maintain, harder to contribute, less readable etc. Or so it goes... — I personally quite like a sprinkling of annotations, but I'm sympathetic to this concern too.

So, at this stage, a good Motivation section should at least lay out the advantages, so there's something on the table to weigh doubts against.

Make sense?

(Thanks for your efforts on this!)

@mkurnikov
Copy link
Author

@carltongibson First annotation is incorrect and just an artifact of using MonkeyType to generate initial stubs=) deconstruct() methods should just be annotated with Any as a return type.

Yes, I got the idea, I'll update it in a day or two.

@carltongibson
Copy link
Member

carltongibson commented Oct 9, 2019

Hi @mkurnikov — It's the idea that matters at this point. Thanks 👍

Update: it might be worth adding something like, "Any will be used where ..." explaining this kind of case.

In the plugin `django-stubs` dynamically marks `name` and `surname` as `CharField[Optional[Union[str, int, Combinable]], Optional[str]]`. We cannot use (as far as I know),

```python
class CharField(Field[Union[str, int, Combinable], str]):
Copy link

@syastrov syastrov Oct 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I have a solution to this particular problem, which is only possible since the latest mypy version, 0.730 released a few weeks ago (due to python/mypy#7188).
If we overload the __new__ method and use Literal type to match when null is True or False, then we can alter which Generic parameters are passed into Field, when it is constructed.

I will give a complete working example:

file a.pyi:

from typing import (
    Generic,
    Optional,
    TypeVar,
    Union,
    overload,
)

from typing_extensions import Literal

class Combinable: ...

class Model: ...


_T = TypeVar("_T", bound="Field")
_ST = TypeVar("_ST")
_GT = TypeVar("_GT")

class Field(Generic[_ST, _GT]):
    # Here I tried to do the following to avoid duplicating this in each subclass, 
    # but I can't find a way to refer to the correct subclass (rather than Field[_ST, _GT] 
    # it should return _T[_ST, _GT], where _T is the subclass, e.g. CharField) :
    # @overload
    # def __new__(cls, null: Literal[False] = False, *args, **kwargs) -> Field[_ST, _GT]: ...
    # @overload
    # def __new__(cls, null: Literal[True], *args, **kwargs) -> Field[Optional[_ST], Optional[_GT]]: ...

    @overload
    def __get__(self: _T, instance: None, owner) -> _T: ...
    # Model instance access
    @overload
    def __get__(self, instance: Model, owner) -> _GT: ...
    # non-Model instances
    @overload
    def __get__(self: _T, instance, owner) -> _T: ...

CharField_ST = Union[str, int, Combinable] 
CharField_GT = str
class CharField(Field[_ST, _GT]):
    @overload
    def __new__(cls, null: Literal[False] = False, *args, **kwargs) -> CharField[CharField_ST, CharField_GT]: ...
    @overload
    def __new__(cls, null: Literal[True], *args, **kwargs) -> CharField[Optional[CharField_ST], Optional[CharField_GT]]: ...

file b.py:

from a import CharField, Model

class MyModel(Model):
    f1 = CharField()
    f2 = CharField(null=True)
    f3 = CharField(null=False)

reveal_type(CharField())
reveal_type(CharField(null=True))
reveal_type(CharField(null=False))
reveal_type(MyModel().f1)
reveal_type(MyModel().f2)
reveal_type(MyModel().f3)

Then, you can see that mypy type-checks applies the correct types!

$ mypy b.py
b.py:8: note: Revealed type is 'a.CharField[Union[builtins.str, builtins.int, a.Combinable], builtins.str]'
b.py:9: note: Revealed type is 'a.CharField[Union[builtins.str, builtins.int, a.Combinable, None], Union[builtins.str, None]]'
b.py:10: note: Revealed type is 'a.CharField[Union[builtins.str, builtins.int, a.Combinable], builtins.str]'
b.py:11: note: Revealed type is 'builtins.str*'
b.py:12: note: Revealed type is 'Union[builtins.str, None]'
b.py:13: note: Revealed type is 'builtins.str*'


### How django-stubs currently implemented

`django-stubs` uses a mix of static analysis provided by mypy, and runtime type inference from Django own introspection facilities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to distinguish between which type-checking features are provided by the stubs and which are provided from the plugin.
Maybe make it clear that the part that uses runtime type inference is the mypy plugin currently located in the django-stubs project, and that most of this section refers to features that the mypy plugin provides beyond what is provided in the stubs.

@davidfstr
Copy link

FWIW, I am +1 on adding __class_getitem__ to QuerySet. It appears to be low risk and unlocks a lot of potential gain.

(For those who don't know me, I'm generally a fan of using typechecking in the context of Django and have been doing so in production for the primary mission-critical app at my own company for the last 1-2 years. I am also an infrequent Django contributor.)

((I would like to give more feedback on the overall typechecking DEP here - which has a lot of design considerations to think through - but simply do not have the bandwidth at the moment.))

@adamchainz
Copy link
Member

+1 on the __class_getitem__, and I agree with Carlton on deferring anything too ambitious.

@sobolevn
Copy link

sobolevn commented Feb 1, 2020

Ok then! Do I need to create a ticket first? As it was said in django/django#11849

Or sending a PR without a ticket is good enough?

@sobolevn
Copy link

sobolevn commented Feb 2, 2020

I have submitted a new ticket: https://code.djangoproject.com/ticket/31223#ticket
PR is on its way 🎉

Update: PR is done django/django#12405

@kylebebak
Copy link

kylebebak commented Feb 6, 2020

Long-time Django, mypy, and TypeScript user here.

I use all three in open source projects and at work. FWIW, type hints make me much more productive, both as a maintainer and consumer of libraries.

I think having type hints in the core Django library would be a huge win for developers using Django. It might even prove to be a win for Django core devs.

There's an up-front cost to adding type hints, but IMO the investment pays off quickly. Code, especially in a big library like Django, is read much more than it's written. Type hints make reading and consuming code much easier.

They also make refactoring a big code base a lot less scary. When you make a change, you have a type checker that runs in 10 seconds telling you exactly where things are broken. Without type hints, assuming you're on a good team that writes lots of tests, you wait 10 minutes while the tests run and then you have a rough idea of what's broken.

It's no accident that most of the new Python frameworks (Starlette, FastAPI, etc) are built with type hints.

@dmwyatt
Copy link

dmwyatt commented Apr 11, 2020

It looks like the mypy issues preventing BaseManager.from_queryset(), QuerySet.as_manager() were fixed/closed.

@antonagestam
Copy link

What's the status of this DEP? Maybe it would benefit to move to a decision from the technical board? @carltongibson

@carltongibson
Copy link
Member

I will be posting the response from the Technical Board later today.

@carltongibson
Copy link
Member

Statement from the Technical Board is here: https://groups.google.com/d/topic/django-developers/C_Phs05kL1Q/discussion

@dmwyatt
Copy link

dmwyatt commented Apr 15, 2020

Unfortunate but understandable.

Only question I have is what do they mean by "competing technologies"? My assumption is that MyPy-based typing is like 95% of all the typing that is being done.

@davidfstr
Copy link

Beyond MyPy there are at least 2 other standalone typecheckers using PEP 484 (Type Hints). Pyre is one of them. I forget the other. Pyre has the advantage of speed on very large codebases I believe.

My understanding is that PyCharm also has a typechecking system that aims to be compliant with PEP 484.

@dmwyatt
Copy link

dmwyatt commented Apr 15, 2020

The only thing I can think that they might when they say "competing technologies" is "what if we integrate all these types into Django and then everyone else ends up using some other type checker?"

I believe the type checkers you mention all implement 484, so that obviates this concern much like worrying about which markup language to use is obviated by using HTML because it is supported by many renderers.

@ljodal
Copy link

ljodal commented Apr 16, 2020

Due to the dynamic nature of Django simple type annotations will not work in a lot of cases (models, querysets, forms, etc). While the django-stubs project does provide basic type annotations for many of Django’s functions much of the magic there is in a Mypy specific plugin. My guess is that’s what they are referring to as “competing technologies”.

Even parts of the standard library depend on plugins for proper type checking (eg dataclasses). I don’t know about pyre or pycharm, but in both Mypy and Google’s pytype this is implemented internally as plugins, which are completely different and only Mypy supports custom plugins at the moment.

@MarkusH
Copy link
Member

MarkusH commented Aug 2, 2021 via email

@nickmccullum
Copy link

Please implement this!

@WisdomPill
Copy link

@carltongibson two years has passed after the technical board statement, has anything changed? was the wait enough to make the decision change?

@carltongibson
Copy link
Member

@WisdomPill Hi. As far as I am aware the situation hasn't changed significantly.

There were recent(ish) discussions on Django Chat with Nikita and Adam touching on this. It might be worth giving those a listen. (Nikita was sceptical we'd do it; Adam pointing to some possible ways forward, small steps at least.)

But I don't imagine a sudden consensus to implement inline type annotations if it were re-raised, no.

@thanksyouall
Copy link

We really want to see an annotation of the types.

@senseysensor
Copy link

this would be great feature. Please consider to merge the projects.

@Macktireh
Copy link

Adding type annotations to Django would be a great enhancement. It would greatly improve code clarity, maintainability, and enable better static analysis. Please consider implementing this feature. It would be highly appreciated by the community.

@jp-larose
Copy link

Incorporating type hints in Python projects is slowly becoming the norm, or at the very least, a best practice. Looking at django-stubs, most of the work has been done. It would make more sense to have the stubs be part of the official Django project, so that the type hints are updated right along the implementation.

@ysmolski
Copy link

ysmolski commented Jun 6, 2023

I vote for this effort to be integrared into the main project.

@django django locked as resolved and limited conversation to collaborators Jun 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.