Trying out some Architecture: Repository and the Unit of Work collaborators

2021-07-29 / exchange,server,python / Sam

Clean Architecture talked a little about how a framework is merely a development detail and should be deferred just like any other detail in your system. On first reading I found this quite confusing and unhelpful. I understood the sentiment, but as someone who has just spent the pandemic year working on a "Django application", I couldn't see how one could possibly engineer applications and leverage a framework without making a decision early in the process, and conforming to that framework before it was too late.

The reason I bought Architecture Patterns with Python ("Cosmic Python") was for its appendix showing how you might integrate your freestanding application into Django. This was particularly helpful as Django is the only tool I've used to build large service-like applications, so I could see the boundaries of the example application and Django's responsibilities in much clearer terms. Seeing is believing and here was proof that a framework really can live on the "outside" of your application. Still, this raised the question: what's the point in all these lovely frameworks if you're going to write a bunch of code to keep them at bay?

Architecture Patterns with Python also introduced me to the Repository and closely related Unit of Work patterns. I won't go into detail here (because every person and their dog seems to have their own personal interpretation of each pattern), but my brief interpretation at this time is:

  • A Repository offers an interface for an application to manipulate a collection of objects (eg. add, get) while hiding how and where the data is stored, effectively keeping your application ignorant of how data is persisted (eg. in memory, a database, a file)
  • A Unit of Work (UoW) offers a context in which objects that have changed are noted, and those changes can be persisted (or discarded) as part of a transaction in your application
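One way to pin those two interpretations down is as abstract base classes. This is only a sketch: the class names `AbstractRepository` and `AbstractUnitOfWork`, the `key` parameter and the toy `DictRepository` are assumptions made for illustration, not something taken from the book or the project.

```python
from abc import ABC, abstractmethod


class AbstractRepository(ABC):
    """Collection-like interface; says nothing about where objects live."""

    @abstractmethod
    def add(self, key, obj):
        ...

    @abstractmethod
    def get(self, key):
        ...


class AbstractUnitOfWork(ABC):
    """Boundary around a set of changes that are kept or discarded together."""

    @abstractmethod
    def commit(self):
        ...

    @abstractmethod
    def rollback(self):
        ...


class DictRepository(AbstractRepository):
    """Hypothetical concrete Repository that happens to use a dict."""

    def __init__(self):
        self._items = {}

    def add(self, key, obj):
        self._items[key] = obj

    def get(self, key):
        return self._items[key]


repo = DictRepository()
repo.add("STEX", {"price": 10})
```

The application only ever calls `add` and `get`, so it never learns that a dict is involved.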

I set about building a Repository and UoW to hold the Clients and Stocks in memory, just like in my first program, but instead of interacting with a Python data structure directly, the application would have to interact with the Repository. I based my first Repository and UoW on the mock testing repo from the Cosmic Python book, but with extra flair; rather than merely mocking a Repository and holding a temporary list, I defined a class which held a dictionary named _objects as a class attribute, such that any instantiation of the Repository would be able to interact with the _objects stored inside. As suggested by the book, I made the UoW a Python context manager. A context manager requires an __enter__ dunder method to set up some context (my UoW just returns itself) and an __exit__ dunder method to specify what happens when you leave the context (my UoW calls its rollback function to discard uncommitted changes). I'd never written one of these before, but it seems perfect for this case of starting and ending a "session".
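A minimal sketch of that first version might look like the following. Only `_objects`, `__enter__`, `__exit__`, `commit` and `rollback` come from the description above; the class names, the `users` attribute and the `rollback_calls` counter are assumptions added so the behaviour is visible.

```python
class MemoryRepository:
    # Class attribute: a single dict shared by every instance of this class.
    _objects = {}

    def add(self, key, obj):
        # Mutates the shared dict in place (does not rebind the attribute).
        self._objects[key] = obj

    def get(self, key):
        return self._objects[key]


class MemoryUoW:
    def __init__(self):
        self.users = MemoryRepository()
        self.rollback_calls = 0  # illustration only

    def __enter__(self):
        # Setting up the context: just hand back the UoW itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the context always rolls back whatever was not committed.
        self.rollback()
        # Returning None lets any exception propagate to the caller.

    def commit(self):
        pass  # nothing is staged in this simplified sketch

    def rollback(self):
        self.rollback_calls += 1


with MemoryUoW() as first:
    first.users.add("sam", {"cash": 100})
    first.commit()

# A completely separate UoW (and Repository instance) sees the same data,
# because _objects lives on the class rather than on the instance.
with MemoryUoW() as second:
    found = second.users.get("sam")
```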

I felt a bit dirty about my Repository, as class attributes shared across all past, present and future instantiations of a class as a means of persisting data felt a bit weird -- it's easy to accidentally create an instance and shadow or overwrite the class variable. This wasn't helped by internet searches wherein I found conflicting examples of writing a Repository and UoW. I became a little frustrated with trying to do "the right thing" first time, which caused some procrastination.
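The shadowing hazard is easy to reproduce. In this small sketch (the class name `Repo` is made up), mutating the dict through an instance affects everyone, while assigning to the attribute through an instance silently creates a private copy:

```python
class Repo:
    _objects = {}  # class attribute, shared


a = Repo()
b = Repo()

# Mutation goes through to the shared class-level dict.
a._objects["x"] = 1
seen_by_b = dict(b._objects)

# Assignment creates a new *instance* attribute on `a` that shadows the
# class attribute; `a` has now quietly stopped sharing state.
a._objects = {}
a._objects["y"] = 2
```

After this runs, `b` and `Repo` still hold the entry for `"x"`, but `a` only holds `"y"`, and nothing warned us about the split.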

Persevering, I took the example from the Cosmic Python book much further. I gave the Repository an instance variable dictionary called _staged_objects to keep track of objects that needed to be committed. I felt like I was really in the swing of things now. I added an _object_versions class dict, and _staged_versions instance dict too. If you were to imagine a process to update a user's holdings, my Repository and UoW worked like so:

  • A change in the system such as a bought or sold stock triggers a call to the Exchange's update_user service function
  • The Exchange service update_user method "enters" a context (using Python's with statement), instantiating a UoW that has access to the Repository for handling the users. A variable uow is in scope for dealing with the unit of work and is the only way to access the user Repository
  • update_user uses the context of the UoW and queries the user repository with uow.users.get
  • The Repository's get checks for the user object in its _objects class dictionary, copies (copy.deepcopy) it to its _staged_objects instance dictionary (and also copies the _object_versions[user_id] to _staged_versions[user_id]) and returns the staged object
  • The update_user method makes a change to the domain object and calls uow.commit
  • The UoW passes through the request to commit to the Repository:
    • The _staged_versions[user_id] is checked against _object_versions[user_id] to ensure the _objects dictionary has not been updated for this user since get was called
    • The _staged_objects[user_id] overwrites the _objects[user_id] and _object_versions[user_id] is incremented
  • update_user exits the with block, closing the UoW context (calling uow.rollback automatically, but there is nothing to roll back)
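The steps above can be sketched roughly as follows. The attribute names follow the post; everything else is an assumption for illustration: the class names, `StaleObjectError`, users modelled as plain dicts, the `update_user` signature, and the choice to check every staged version before writing anything.

```python
import copy


class StaleObjectError(Exception):
    """Hypothetical error: the stored object changed after get() was called."""


class VersionedMemoryRepository:
    _objects = {}          # shared, "persisted" objects
    _object_versions = {}  # shared version counter per key

    def __init__(self):
        self._staged_objects = {}   # per-instance working copies
        self._staged_versions = {}  # version seen when each copy was taken

    def add(self, key, obj):
        self._staged_objects[key] = obj
        self._staged_versions[key] = self._object_versions.get(key, 0)

    def get(self, key):
        self._staged_objects[key] = copy.deepcopy(self._objects[key])
        self._staged_versions[key] = self._object_versions[key]
        return self._staged_objects[key]

    def commit(self):
        # Check every staged version first so a stale entry blocks all writes.
        for key, version in self._staged_versions.items():
            if self._object_versions.get(key, 0) != version:
                raise StaleObjectError(key)
        for key, obj in self._staged_objects.items():
            self._objects[key] = copy.deepcopy(obj)
            self._object_versions[key] = self._staged_versions[key] + 1
        self.rollback()  # staged state has been applied, so clear it

    def rollback(self):
        self._staged_objects.clear()
        self._staged_versions.clear()


class UserUoW:
    def __enter__(self):
        self.users = VersionedMemoryRepository()
        return self

    def __exit__(self, *exc_info):
        self.rollback()

    def commit(self):
        self.users.commit()

    def rollback(self):
        self.users.rollback()


def update_user(user_id, symbol, quantity):
    with UserUoW() as uow:
        user = uow.users.get(user_id)
        user["holdings"][symbol] = user["holdings"].get(symbol, 0) + quantity
        uow.commit()


# Seed one user, then run the service function against it.
with UserUoW() as uow:
    uow.users.add("u1", {"holdings": {}})
    uow.commit()

update_user("u1", "STEX", 5)
```

If two units of work fetch the same user and both try to commit, the second commit sees a version mismatch and raises instead of overwriting the first.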

It took some refining but it did indeed work! When the uow is instantiated by a service, it creates a new GenericMemoryRepository and specifies a prefix to be added to all the keys (for the _objects dict and so on), meaning the GenericMemoryRepository can be used by any model in our domain (Stocks and Users) without worrying about key clashes. This Repository is overkill as we'll likely migrate to some other means of persisting data, but it was important for me to see how a Repository and UoW would work, even if just to abstract a Python list out of main.py or the Exchange class to an interface. I struggled with the idea of not "assigning" some memory in main.py or the service layer.
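The prefixing idea can be sketched like this. The `prefix:id` key format and the `_key` helper are assumptions; the post only says that a prefix is added to every key.

```python
class GenericMemoryRepository:
    _objects = {}  # one shared dict for every model in the domain

    def __init__(self, prefix):
        self._prefix = prefix

    def _key(self, object_id):
        # Hypothetical key format: "<prefix>:<id>".
        return f"{self._prefix}:{object_id}"

    def add(self, object_id, obj):
        self._objects[self._key(object_id)] = obj

    def get(self, object_id):
        return self._objects[self._key(object_id)]


users = GenericMemoryRepository("user")
stocks = GenericMemoryRepository("stock")

# Both models use the id 1, but the prefix keeps the entries apart.
users.add(1, "Sam")
stocks.add(1, "STEX")
```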

While this worked, it felt like a lot of effort to manage a dictionary, and I could certainly see the appeal of using a framework that takes all this work off you instead. I decided the way to test whether this was a worthwhile endeavour was to immediately write a new Repository and UoW to access an sqlite database and see how badly the Exchange was impacted.

Half a day or so later, I'd made some changes:

  • adapters/stex_sqlite.py defines the SQLAlchemy boilerplate to set up a database table and "map" it to the domain object (to allow the ORM to commit domain objects and return domain objects from queries without writing any code)
  • The stex_sqlite adapter also provides StexSqliteSessionFactory, which is used to create SQLAlchemy sessions. It also controls the instantiation of a Singleton _engine that represents the database and is required for session making. I'm not sure it belongs here at the moment...
  • In my io/persistence.py, a StockSqliteRepository and StockSqliteUoW implement the required functions to add/get and commit/rollback (respectively), using the SQLAlchemy session
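To show the shape of such a pair without pulling in SQLAlchemy, here is a sketch that swaps in the standard-library `sqlite3` module and hand-written SQL, so it is not the mapping-based approach described above. The `Stock` dataclass, the `stocks` table layout and the constructor arguments are all assumptions for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Stock:  # hypothetical domain object
    symbol: str
    price: float


class StockSqliteRepository:
    def __init__(self, connection):
        self._connection = connection

    def add(self, stock):
        self._connection.execute(
            "INSERT OR REPLACE INTO stocks (symbol, price) VALUES (?, ?)",
            (stock.symbol, stock.price),
        )

    def get(self, symbol):
        row = self._connection.execute(
            "SELECT symbol, price FROM stocks WHERE symbol = ?", (symbol,)
        ).fetchone()
        return Stock(*row) if row else None


class StockSqliteUoW:
    def __init__(self, connection):
        self._connection = connection

    def __enter__(self):
        self.stocks = StockSqliteRepository(self._connection)
        return self

    def __exit__(self, *exc_info):
        self.rollback()  # discards anything not committed

    def commit(self):
        self._connection.commit()

    def rollback(self):
        self._connection.rollback()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE stocks (symbol TEXT PRIMARY KEY, price REAL)")
connection.commit()

with StockSqliteUoW(connection) as uow:
    uow.stocks.add(Stock("STEX", 10.0))
    uow.commit()

with StockSqliteUoW(connection) as uow:
    uow.stocks.add(Stock("LOST", 1.0))  # never committed, so rolled back on exit
```

The surface is the same add/get and commit/rollback as the in-memory version; only the inside of each method knows a database is involved.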

What about for the service? I was pretty stunned to discover the pattern really does work! All I needed to change was the StockMemoryUoW to StockSqliteUoW!

Suddenly a few things clicked for me. We can take advantage of the benefits of incredible software like SQLAlchemy, but we don't have to let it into the inner circle of our application. In this case, the Repository (and UoW) allows us to take advantage of the bits of SQLAlchemy that we want, without letting our application in on the secret. Indeed, as far as the Exchange was concerned, the objects could still be in a class attribute in our GenericMemoryRepository. Incredibly, I could pick one or the other at start-up and the application just worked either way.
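Picking a backend at start-up could be as simple as passing a different UoW class to the service. In this sketch the UoW names match the post, but everything else is assumed: the `UOW_BACKENDS` table, the `set_price` and `get_price` service functions, the write-through memory stand-in, and the use of `sqlite3` instead of SQLAlchemy.

```python
import sqlite3


class MemoryStocks:
    _prices = {}  # class attribute shared by all instances

    def add(self, symbol, price):
        self._prices[symbol] = price

    def get(self, symbol):
        return self._prices.get(symbol)


class StockMemoryUoW:
    def __enter__(self):
        self.stocks = MemoryStocks()
        return self

    def __exit__(self, *exc_info):
        pass  # simplification: this stand-in writes straight through

    def commit(self):
        pass


_connection = sqlite3.connect(":memory:")
_connection.execute("CREATE TABLE stocks (symbol TEXT PRIMARY KEY, price REAL)")


class SqliteStocks:
    def add(self, symbol, price):
        _connection.execute(
            "INSERT OR REPLACE INTO stocks VALUES (?, ?)", (symbol, price)
        )

    def get(self, symbol):
        row = _connection.execute(
            "SELECT price FROM stocks WHERE symbol = ?", (symbol,)
        ).fetchone()
        return row[0] if row else None


class StockSqliteUoW:
    def __enter__(self):
        self.stocks = SqliteStocks()
        return self

    def __exit__(self, *exc_info):
        _connection.rollback()

    def commit(self):
        _connection.commit()


# The only place that knows which persistence mechanism is in play.
UOW_BACKENDS = {"memory": StockMemoryUoW, "sqlite": StockSqliteUoW}


def set_price(uow_factory, symbol, price):
    with uow_factory() as uow:
        uow.stocks.add(symbol, price)
        uow.commit()


def get_price(uow_factory, symbol):
    with uow_factory() as uow:
        return uow.stocks.get(symbol)


results = {}
for name, uow_factory in UOW_BACKENDS.items():
    set_price(uow_factory, "STEX", 12.5)
    results[name] = get_price(uow_factory, "STEX")
```

The service functions are identical for both backends; only the entry in `UOW_BACKENDS` changes.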

This might seem like a long post to say "hey design patterns are pretty good you know", but it's a little more than that to me. Throughout my programming career so far, I'd always thought "less code is clean code". I don't see interfaces so much in Python code (compared to say, Java), and abstractions about storage and the like are quite "enterprise-y" and rarely seen in scientific programming paradigms. All this boilerplate to hold SQLAlchemy (or indeed, a simple Python dict) at arm's length is completely contrary to how I've worked in the past, and yet, the STEX2 server could swap to another ORM or persistence layer tomorrow with nothing more than a bit of grunt work to set up an appropriate Repository, and the application is none the wiser. It seems that more code, not less code, is the recipe for an adaptable system.