Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request help with implementing file upload field #240

Open
mjmare opened this issue Feb 19, 2017 · 9 comments
Open

Request help with implementing file upload field #240

mjmare opened this issue Feb 19, 2017 · 9 comments

Comments

@mjmare
Copy link

mjmare commented Feb 19, 2017

Forgive me for creating this issue, but I couldn't find a mailinglist or forum to discuss things like this.

Since Neo4j is not good at storing file blobs I thought it would be useful to implement a FileUploadProperty backed by the nice package FileDepot (https://pypi.python.org/pypi/filedepot). FD lets you store files in various backends and only a returned file_id is stored in your database.
What I would like to create is a Neomodel property that would accept a file object (or FD FileIntent). The property would store the file object in FD and store the file_id in Neo4j. When accessing the property it would retrieve the file from FD and return a FD FileStorage (a sort of file descriptor).

I thought it would be easy to implement this with a NeoModel property by overriding the inflate/deflate methods. It turns out this did not work since inflate/deflate only handles the (de)serialisation to the graph node. It seems I needed to intercept the getting and setting the property.

This code works:

import io
import uuid
from datetime import datetime

from depot.io.utils import FileIntent
from depot.manager import DepotManager
from neomodel import DateTimeProperty, IntegerProperty, Property, StringProperty, StructuredNode, UniqueIdProperty, \
    config, db, install_all_labels
from neomodel.properties import validator

DepotManager.configure('default', {'depot.storage_path': './files'})


class UploadedFileProperty(Property):
    """
    Attempt at implementing Neomodel property that stores an uploaded file, backed by FileDepot. In fact by using
    FileDepot only a file_id is stored in the Neo4j database.
    """
    def __init__(self, **kwargs):
        for item in ['required', 'unique_index', 'index', 'default']:
            if item in kwargs:
                raise ValueError('{} argument ignored by {}'.format(item, self.__class__.__name__))

        # kwargs['unique_index'] = True
        super(UploadedFileProperty, self).__init__(**kwargs)

    @validator
    def inflate(self, value):
        uuid.UUID(value)
        return value

    @validator
    def deflate(self, value):
        uuid.UUID(value)
        return value


config.AUTO_INSTALL_LABELS = False
config.ENCRYPTED_CONNECTION = False
db.set_connection('bolt://neo:neo@localhost:8687')


class Person(StructuredNode):
    uuid = UniqueIdProperty()
    name = StringProperty(unique_index=True)
    age = IntegerProperty(index=True, default=0)
    born = DateTimeProperty()
    photo_ = UploadedFileProperty(db_property='photo')

    @property
    def photo(self):
        depot = DepotManager.get()
        return depot.get(self.photo_)

    @photo.setter
    def photo(self, value):
        depot = DepotManager.get()

        # delete if None
        if (value is None) and self.photo_:
            depot.delete(self.photo_)
            self.photo_ = None

        # set the value
        if isinstance(value, str):
            try:
                uuid.UUID(value)
                self.photo_ = uuid.UUID(value)
            except ValueError:
                raise TypeError(f'Value "{value}" is not a uuid')
        elif isinstance(value, (FileIntent, io.IOBase)):
            self.photo_ = depot.create(value)
        else:
            TypeError(f'Value "{value}" should be of type file or FileIntent')


install_all_labels()

# p=Person(name='justme2', age=20, born=datetime.now(), photo=open('/Users/mjm/Desktop/iu.jpeg', 'rb')).save()

p = Person.nodes.get(name='justme2')
print(p)

p.age = 13
p.photo = FileIntent(open('/Users/mjm/Desktop/iu.jpeg', 'rb'), 'ui.jpg', 'image/jpeg')
# p.photo = 'test'
p.save()

print(p)

But handling the setting/getting in the StructuredNode is not very elegant. Good enough for a one-off program, but not very reusable. What I would like to see is that the specialised getting and setting of the property could be implemented in the UploadedFileProperty so that it is contained and can be easily reused.

I couldn't figure out how to set the getter/setter in the property implementation. Any pointers would be appreciated.

While we are at it, thinking ahead. It would be very convenient if the Property could tie in with the delete hook and transaction manager. In that way, when the Structured node is deleted, the corresponding FD file could be deleted automatically. The same with the transaction manager: if the transaction is aborted, the corresponding FD file could be deleted as well.

I hope that you agree that a FileProperty could be a useful addition and that you could give it some thought. TIA

Marcel

@robinedwards
Copy link
Collaborator

Hey,

I think you may want to try using the hooks pre_save / post_save / pre_delete. You could write some kind of a mixin that can be shared for this. You should still have access to the full FD object here.

Hope this helps.

@mjmare
Copy link
Author

mjmare commented Feb 25, 2017

I agree, that's what I am doing now as a workaround in my application. However this is not a good reusable solution IMO. It requires the implementation of and/or coordination with preexisting pre_save / post_save / pre_delete hooks in the Node implementation, not in the Property. I cannot "just" add one or more, say, FileProperty's and have it just work, like any other property.
Ideally I'd have a dedicated, standalone property for a file (backed by FD). For this to happen, I think (but could be mistaken) I need to be able to override the getter/setter and to be able to hook into the transaction management at the property level, not the Node level. This could probably be done with some metaclass voodoo but that is beyond me.
Hope this makes any sense.

@mjmare
Copy link
Author

mjmare commented Feb 25, 2017

BTW FileDepot is just one backing store. Any other would probably fine as well.

@robinedwards
Copy link
Collaborator

Ok the thing is you don't want to override the property getter or setter. As setting and getting doesn't actually touch the database.

So what your saying is you don't want to have to have a mixin or base class to handle this special kind of property it would be nice if there was some way of hooking the save on to the property its self.

I will mention now that deflate is actually used (abused) in other places than saving, it's also used for making queries with filter(...) etc (i think).

I guess the ideal solution would be to support some kind of on_save hook for the property its self?

@mjmare
Copy link
Author

mjmare commented Feb 25, 2017

The attachment reflects my (possibly muddled) thinking on the deflate/inflate/getter/setter stuff.
Neomodel FileProperty.graffle.pdf
I hope it illustrates why I think the getter/setter is necessery.

Note that FD's file_id is a hex uuid string. FileIntent represents the file stuff going into FD (returning a file_id), FileStorage is what comes out of FD given a file_id.

@aanastasiou
Copy link
Collaborator

@mjmare Any further progress on this? I wonder if it could be marked as a future enhancement (?)

Let me see if I got this right: You are proposing a File Property which is supposed to be working like:

MyClassWithSomeFileAttribute.file_attribute="/mnt/project_data/dataset1.csv"

MyClassWithSomeFileAttribute.save()

And then execution is blocked until the "property" makes sure that the file has been uploaded before it can substitute the file for its file id (?).

I think that from the point of view of an OGM there are two things:

  1. Managing the Property (type)
  2. Managing the Content (value)

So, the property, could have things like where to accept files from, what sort of filename structure (maybe through regexp) would be valid, restrictions on file size, acceptable mime types, etc.

And the content would abstract the value itself. That is, the file. In this way, accessing the property would return a file that is almost ready for consumption, rather than a string filename that still requires some work until its content can be accessed. And it is at this level that different file storage backends could be abstracted too.

This is not exactly visible in neomodel but when you implement a Property, you implement the "handler" for a value, not the value itself.

It also sounds like something that might require special treatment for Batch Operations. For example, first create all entities with some file attached and then attach the files as a separate step (or make uploads asynchronous).

What do you think?

@mjmare
Copy link
Author

mjmare commented Apr 29, 2019

Sorry for the late reply.

Indeed in an ideal world I'd be able to add an "attachment" property to a Neomodel class. E.g.:

class SomeClass(StructuredNode):
    my_attachment = AttachmentProperty(...)

some_node = SomeClass()
some_node.my_attachment = open('my face.png')  // or some file like object

this would store some reference (uuid) on the node and store the file contents+metadata in some store (I used File Depot).

the_attachment = some_node.my_attachment 

print(the_attachment.filename)
print(the_attachment.content_type)
:

would return some object with the file contents and some metadata (called a StoredFile in File Depot).

Hope this clears it up a bit.

@aanastasiou
Copy link
Collaborator

@mjmare No worries, please see below:

this would store some reference (uuid) on the node and store the file contents+metadata in some store (I used File Depot).

A few notes which we can take into account further in the design of this property:

  1. Why uuid? Wouldn't the filename with respect to the root of the "store" be enough? In this way we "adopt" a file-system constraint rather than trying to create another one.

  2. This might need a "hidden" attribute that concatenates the file-store and location in order for it to be indexable by the existing algorithms. Otherwise, we have to make sure that these properties cannot be indexable.

  3. The bit that might end up as a nightmare is handling errors raised by the file-store. I guess that an additional condition would be required here to upload the file successfully first (or handle the potential exceptions this part might raise) and proceed with creating the node after this if everything is in place.

  4. Is there a package that abstracts a number of different file stores already? If not, we might have to provide that ourselves but leave it up to an implementer to populate for a service that would not yet be covered.

But yeah, seems like a useful thing to have.

@mjmare
Copy link
Author

mjmare commented May 8, 2019

  1. Why uuid? Wouldn't the filename with respect to the root of the "store" be enough? In this way we "adopt" a file-system constraint rather than trying to create another one.

I think the filename is part of the metadata. Using a uuid makes file(name) manipulations easier. Also when storage is not on the file system (eg database), the filename may not ideal. And how would one handle uploads with identical filenames (but different file contents)?

  1. This might need a "hidden" attribute that concatenates the file-store and location in order for it to be indexable by the existing algorithms. Otherwise, we have to make sure that these properties cannot be indexable.
  2. The bit that might end up as a nightmare is handling errors raised by the file-store. I guess that an additional condition would be required here to upload the file successfully first (or handle the potential exceptions this part might raise) and proceed with creating the node after this if everything is in place.

I think Neomodel should assume the file is available on the server before setting a Node's attribute.
OTOH it would be very convenient if the FileProperty could tie in with the delete hook and transaction manager. In that way, when the Structured node is deleted, the corresponding file could be deleted automatically. The same with the transaction manager: if the transaction is aborted, the corresponding file could be deleted as well.

  1. Is there a package that abstracts a number of different file stores already? If not, we might have to provide that ourselves but leave it up to an implementer to populate for a service that would not yet be covered.

The package FileDepot (https://pypi.python.org/pypi/filedepot) does a lot already but is independent of neo4j/NEomodel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants