Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: File improvements #7088

Closed
wants to merge 8 commits into from
Closed

Conversation

dblythy
Copy link
Member

@dblythy dblythy commented Dec 20, 2020

New Pull Request Checklist

Issue Description

Works towards adding:
-ACL for Parse.File using temporary tokens
-Allows beforeSave(Parse.File) and afterSave(Parse.File)
-Creates beforeFind(Parse.File) and afterFind(Parse.File) to allow editing the file contents depending on the request (you can return a different Parse.File if you'd like). Uses temporary tokens to associate req.user.
-Creates _File class for references -> adds and removes references on object save. Eventually will be used to cleanup "orphan" files
-Created _File schema to allow preventing file upload via schema

Draft PR to see your thoughts, and might need collaboration in fixing the schema stuff that is commented (@Moumouls if you're interested in helping 🎉)

Related issue: #6780

TODOs before merging

  • get file schema going as expected
  • JS SDK method for file.setACL
  • work out how this is going to work with S3
  • fix postgres
  • Add test cases
  • increase coverage
  • Add entry to changelog
  • Add changes to documentation (guides, repository pages, in-code descriptions)

@codecov
Copy link

codecov bot commented Dec 21, 2020

Codecov Report

Merging #7088 (32c60bb) into master (41a052c) will decrease coverage by 9.74%.
The diff coverage is 84.26%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7088      +/-   ##
==========================================
- Coverage   93.66%   83.92%   -9.75%     
==========================================
  Files         169      169              
  Lines       12499    12656     +157     
==========================================
- Hits        11707    10621    -1086     
- Misses        792     2035    +1243     
Impacted Files Coverage Δ
...dapters/Storage/Postgres/PostgresStorageAdapter.js 2.42% <0.00%> (-93.52%) ⬇️
src/rest.js 98.86% <ø> (ø)
src/Controllers/FilesController.js 84.02% <80.20%> (-9.98%) ⬇️
src/Routers/FilesRouter.js 87.36% <87.87%> (-0.92%) ⬇️
src/Controllers/SchemaController.js 97.14% <100.00%> (-0.19%) ⬇️
src/RestQuery.js 95.54% <100.00%> (+0.01%) ⬆️
src/RestWrite.js 93.37% <100.00%> (-0.31%) ⬇️
src/Routers/UsersRouter.js 93.75% <100.00%> (-0.63%) ⬇️
src/cloud-code/Parse.Cloud.js 98.63% <100.00%> (+0.03%) ⬆️
src/triggers.js 94.85% <100.00%> (+0.25%) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41a052c...1e116a5. Read the comment docs.

src/Controllers/FilesController.js Outdated Show resolved Hide resolved
Comment on lines +662 to +665
classLevelPermissions: {
find: {
'*': true,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we should disable the find permission for security reasons because not every developers will take the time to ensure that the parent object acls are transfered on the file ACL.
Current configuration will be less secure than the current one if no ACL configured. May be we should add a default behavior: Parse.File automatically add ACL from parent objet to itself.

This way Parse.File will be only reachable based on the parent object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

Comment on lines 178 to 182
const references = fileObject.references || [];
if (!references.includes({ objectId: object.objectId, className })) {
references.push({ objectId: object.objectId, className });
toSave = true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i understand, reference is an Array of Pointer. Here it could be a bottle neck if a file has too many references :)
(Like a admin upload an avatar, and all the user base use it). May be reference could be only a simple native DB counter (transaction activated) to keep how many objects use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also references update should be ACID, otherwise we could start to have invalid reference count/array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this on the latest revision, pls let me know if I understood you correctly. Thank you so much for you feedback

Comment on lines 164 to 177
if (url.length > 1) {
appendToken = `${url[1]}&`;
}
const token = randomString(25);
appendToken += `token=${token}`;
file.url = `${url[0]}?${appendToken}`;
const expiry = new Date(new Date().getTime() + 30 * 60000);
tokens.push({
token,
expiry,
user: auth.user,
});
toSave = true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure to understand why we have tokens here, are you open to create a quick mind map on how you see the new File system implementation business logic ?

Free download and free to use, useful for code architecture: https://www.xmind.net/fr/xmind2020/ 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use token may be we should prefer a stateless system like jsonwebtoken with a derived hash of the master key as the secret.
(I use this kind of system for the webauthn JWT), it will avoid token management into our code.

const getJwtSecret = config => {
  const hash = crypto.createHash('sha512');
  hash.update(config.masterKey, 'utf-8');
  // Security:
  // sha512 return 128 chars, we can keep only 64 chars since it represent 6,61E98 combinations
  // using the hash allow to reduce risk of compromising the master key
  // if brute force is attempted on the JWT
  return hash.digest().toString('hex').slice(64);
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I am thinking:

If a Parse.Object is fetched with a Parse.File in it, we want to:

  1. Check if the req.user has correct ACL to view File
  2. If they do, attach a "token" to the req.file.url. Create a DB reference that links req.user with the token, with an expiry time.
  3. When there is a GET request on the file, lookup the Token. If no token exists, throw file not found, as no Parse.Object has been inflated with a Parse.File.
  4. Call beforeFind on Parse.File using token.user (saved from step 2, so that the file trigger can know the auth)
  5. If beforeFind throws, return an error. Otherwise, res.end with the data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the req.user has correct ACL to view File
When there is a GET request on the file

Here i see 2 things:

  • Security during Parse.Query operation, managed by classic database controller permission check
  • Before returning the file url, create a JWT with real file url signed by a derived masterkey, replace the real url with JWT url
  • The browser will use the temporary jwt file name ex: https://my-parse.com/file/HereTheJWT
  • Parse decode the JWT, get the real file URL, then stream the real file content into response

Here we do not need to re-check the user permission on browser GET request since the JWT could be super short lived (1-2min), and will be always generated ONLY if user has permission to get the object file.

Here a limitation of token system and also jwt file url: It will disable the file browser cache.
But currently it's not suitable to force User to set up a Parse Session Token cookie.

Note: If a file object is PUBLICLY READABLE, do not generate the JWT and return the url directly. It will allow developers to get file without initiating a query. (ex: avatar account choice at login time).

What do you think about that ? @dblythy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem with that is attaching request.user in the new beforeFind(Parse.File) trigger. I understand what you mean by the JWT token, but if we're going to have a beforeFind(Parse.File) trigger run before the GET request, it would be nice to attach which user / masterKey the token is associated with, even for 'public' files (req.user == null)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a broader explanation in the comments below, let me know any thoughts

src/Controllers/FilesController.js Outdated Show resolved Hide resolved
src/Controllers/FilesController.js Outdated Show resolved Hide resolved
@dblythy
Copy link
Member Author

dblythy commented Dec 21, 2020

Ok so I hope this helps @Moumouls. Here is a conceptual explanation of the code. There are two fundamental features:

File Tokens, and File references.

Let's begin with File Tokens.

There is no way to know who performed a get request on a URL. How this code gets around that:

  1. On file save, create an internal _File object. This will be used to track references and ACLs.
  2. Set the _File class acl in accordance with Parse.File.setACL()

Ok, so now there is a link with a database store and a file object.

Now, after an object is queried and a file is found, we need to provide a token to connect the user who made the query with the user who will ultimately use that object to perform a GET request on file.url.

Let's assume that object.get('image') is a file.

  1. User runs const object = await query.first()
  2. Code finds a file, great! Check if req.user can access file according to _File ACL.
  3. If they can access, save a _FileToken object, with 'user' as req.user, a random token, and a token expiry. Append token to object.image.file._url.
  4. return the object
  5. Now when user GETs object.get('image').url():
    5.1. Router will lookup token. if the token hasn't expired, it will consider the user who made the get request to be token.get('user') (as they were the one authorized to generate the token URL)
    5.2. Parse.Cloud.beforeFind(Parse.File) will run using the auth from the token
    5.3. GET request will return content based on file or cloud function.

Creating tokens happens for "public" files too, so cloud code can have full control of which users called what via Parse.Cloud.beforeFind(Parse.File).

File References

As pointers have to all be of the same class, I have created _FileReferences. Simply, when an object is mutated or saved, the code loops through and looks for files in original or object. If object.key has a file and original.key doesn't, add a reference to the _File relation. If original.key has a file and object.key doesn't, the file has been removed, so remove the reference from the _File relation.

If query a Parse.Object with a file but the _File references relation doesn't contain the reference to the Parse.Object, add it in. This is so over time the references grow with existing datasets.

@Moumouls
Copy link
Member

Moumouls commented Dec 21, 2020

Thanks @dblythy for the explanation.

Here we are sure of one thing: we need token in file url.
But think beforeFind should not be triggered when the browser ask the file here why:

If we use token (DB or JWT) with temporary access (1-2min) generated at query response time (like: const object = await query.first()) we can consider that access is now granted through the token for a given time. beforeFind should also be executed at Query time to handle business logic, and may throwing an error that prevent token generation. This way we can run safely all triggers without needing additional custom implementation for File class.

Then on the file GET route (triggered by the browser) we only need to validate the token (since permission was already checked and beforeFind triggered at query time)

Also i'm pretty sure that having token for public files will lead to some issue for developers who transmits URLs to other services (documents, avatars, etc...). Public file should be always accessible without a token to avoid blocking developers on their usage.

Here a quick xmind on how i see the implementation currently:

Capture d’écran 2020-12-21 à 18 17 49

Tell me @dblythy if you see something strange or if i miss something 😉

@dblythy
Copy link
Member Author

dblythy commented Dec 21, 2020

So the beforeFind(Parse.File) should be called when a query object contains a file? and if the Cloud trigger throws, don't attach a valid JWT token to the url (otherwise proceed and attach a JWT token)? You could also return a different Parse.File here too.

I like this approach. We could also attach req.object so perhaps there could be some context to the object that the file is associated with.

The only caveats:

  • if you threw an error in beforeFind(Parse.File), the error text would just show as "file not found".
  • if you returned a new Parse.File or returned different data from the beforeFind, it would have to be saved so the get handler knows what to fetch.

@Moumouls
Copy link
Member

Moumouls commented Dec 21, 2020

In the new implementation since we will use Pointer system for Parse.FIle in Parse.Object; Parse server will trigger automatically beforeFind/afterFind trigger when file is requested (.include()) from the client: here code that execute include:

function includePath(config, auth, response, path, restOptions = {}) {

You could also return a different Parse.File here too.
For sure, this approach also allow a developer to modify the output with afterFind classic trigger. And get the benefit and all options of a standard Class.

Here i'm sure that we need to avoid a too complex implementation and use all systems already present in Parse Server for all other classes as much as possible 😃

The only custom thing that we need to implement correctly is the token system. 💯

if you threw an error in beforeFind(Parse.File), the error text would just show as "file not found".
Do you have the code reference about this ? i think that an error in beforeFind just return the error to the client

Not sure to totally understand this one

if you returned a new Parse.File or returned different data from the beforeFind, it would have to be saved so the get handler knows what to fetch.

Do you have a quick example ?

@dblythy dblythy closed this Feb 11, 2021
@dblythy
Copy link
Member Author

dblythy commented Feb 11, 2021

Closing for now but will come back around to build this feature in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants