Storage Space semantics in the CS3APIs #3693

Open
labkode opened this issue May 5, 2022 · 7 comments

@labkode
Member

labkode commented May 5, 2022

This writeup comes from a deep investigation performed by @ishank011 @gmgigi96 @glpatcern as part of our activity to run OCIS edge branch at CERN.

Examples on edge branch running ocisfs

4c510ada-c86b-4815-8820-42cdf82c3d51 -> / is the home space ID
254d0b60-20e4-4340-9eae-6fa9103ae7d7 -> /app-try
7321538e-15da-4352-8dd7-d59b3319e7ef -> /app-try/app-new-try.txt

  • A
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::resource_id::opaque_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::path (TYPE_STRING) =>

"id": {
      "opaqueId": "4c510ada-c86b-4815-8820-42cdf82c3d51",
      "storageId": "4c510ada-c86b-4815-8820-42cdf82c3d51"
    },
"path": "/",
  • B
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) =>
ref::resource_id::opaque_id (TYPE_STRING) =>
ref::path (TYPE_STRING) => /

{
  "status": {
    "code": "CODE_NOT_FOUND",
    "message": "gateway could not find space for ref=resource_id:\u003c\u003e path:\"/\" ",
    "trace": "909ed9213ca8a16c59d3cb0f545e0767"
  }
}
  • C
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::resource_id::opaque_id (TYPE_STRING) => 254d0b60-20e4-4340-9eae-6fa9103ae7d7
ref::path (TYPE_STRING) =>

"id": {
      "opaqueId": "254d0b60-20e4-4340-9eae-6fa9103ae7d7",
      "storageId": "4c510ada-c86b-4815-8820-42cdf82c3d51"
    },
"path": "/app-try",
  • D
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::resource_id::opaque_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::path (TYPE_STRING) => /app-try

"id": {
      "opaqueId": "254d0b60-20e4-4340-9eae-6fa9103ae7d7",
      "storageId": "4c510ada-c86b-4815-8820-42cdf82c3d51"
    },
"path": "/app-try"
  • E
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::resource_id::opaque_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::path (TYPE_STRING) => /app-try/app-new-try.txt

"id": {
      "opaqueId": "7321538e-15da-4352-8dd7-d59b3319e7ef",
      "storageId": "4c510ada-c86b-4815-8820-42cdf82c3d51"
    },
"path": "/app-try/app-new-try.txt",
  • F
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::resource_id::opaque_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::path (TYPE_STRING) => ./app-try/app-new-try.txt

"id": {
      "opaqueId": "7321538e-15da-4352-8dd7-d59b3319e7ef",
      "storageId": "4c510ada-c86b-4815-8820-42cdf82c3d51"
    },
"path": "app-new-try.txt",
  • G
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) =>
ref::resource_id::opaque_id (TYPE_STRING) =>
ref::path (TYPE_STRING) => /app-try/app-new-try.txt
<repeated> arbitrary_metadata_keys (TYPE_STRING) =>
{
  "status": {
    "code": "CODE_NOT_FOUND",
    "message": "gateway could not find space for ref=resource_id:\u003c\u003e path:\"/app-try/app-new-try.txt\" ",
    "trace": "c2ed541c740faa4ba9362366368d8479"
  }
}
  • H
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::resource_id::opaque_id (TYPE_STRING) => 7321538e-15da-4352-8dd7-d59b3319e7ef
ref::path (TYPE_STRING) => .

"id": {
      "opaqueId": "7321538e-15da-4352-8dd7-d59b3319e7ef",
      "storageId": "4c510ada-c86b-4815-8820-42cdf82c3d51"
    },
"path": "app-new-try.txt",
  • I
cs3.gateway.v1beta1.GatewayAPI@localhost:19000> call Stat
<repeated> opaque::map::key (TYPE_STRING) =>
ref::resource_id::storage_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::resource_id::opaque_id (TYPE_STRING) => 4c510ada-c86b-4815-8820-42cdf82c3d51
ref::path (TYPE_STRING) => .

"id": {
      "opaqueId": "4c510ada-c86b-4815-8820-42cdf82c3d51",
      "storageId": "4c510ada-c86b-4815-8820-42cdf82c3d51"
    },

Expected desired behaviour

Request:                                    Response
    res_id = nil                                res_id = {storage_id, opaque_id_c}
    path   = /top/a/b/c                         path   = /top/a/b/c
                                                parent_id = {storage_id, opaque_id_b}

eos file info /a/b/c                                                
                                                
------------------------------------------------------------------------------------

Request:                                    Response
    res_id = {storage_id, opaque_id_c}          res_id = {storage_id, opaque_id_c}
    path   = nil                                path   = /top/a/b/c       # edge is returning /c, needs to be fixed
                                                parent_id = {storage_id, opaque_id_b}

eos file info inode:opaque_id_c                                                
                                                
------------------------------------------------------------------------------------

Request:                                    Response
    res_id = {storage_id, opaque_id_a}          res_id = {storage_id, opaque_id_c}
    path   = ./b/c                              path   = /top/a/b/c       # edge is returning c, needs to be fixed
                                                parent_id = {storage_id, opaque_id_b}

path_a = eos file info inode:opaque_id_a                                                
eos file info path_a/b/c                                                

------------------------------------------------------------------------------------

Request:                                    Response
    res_id = {storage_id, opaque_id_c}          res_id = {storage_id, opaque_id_c}
    path   = .                                  path   = /top/a/b/c      # edge is returning c, needs to be fixed
                                                parent_id = {storage_id, opaque_id_b}
eos file info inode:opaque_id_c 

------------------------------------------------------------------------------------

Request:                                    Response    # eosfs will always return HTTP 404 here
    res_id = {storage_id, storage_id}           res_id = {storage_id, storage_id}
    path   = nil                                path   = nil   # the "name" of the space root node is not exposed
                                                parent_id = nil?
--
In the Request, nil or "." or "" paths SHALL be considered synonyms

The comment about semantics of path field will need to be amended:
https://cs3org.github.io/cs3apis/#cs3.storage.provider.v1beta1.ResourceInfo
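
The synonym rule above could be applied once, before a request is routed, so that the rest of the code only ever sees one spelling. A minimal Python sketch, assuming a hypothetical helper name `normalize_request_path` (it is not part of the CS3APIs or reva):

```python
from typing import Optional


def normalize_request_path(path: Optional[str]) -> str:
    """Map the three synonymous spellings (None, "" and ".") to "."."""
    if path is None or path in ("", "."):
        return "."
    # Any other path (absolute or relative) is passed through unchanged.
    return path


# The first three inputs address the resource named by the resource id itself.
print([normalize_request_path(p) for p in (None, "", ".", "./b/c")])
```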

Conclusion

Firstly, examples A and G show a defective implementation that doesn't honour the CS3APIs. Paths are a first class citizen of the API and are expected to work.

Secondly, the current implementation in ocisfs is poorly designed with potential nefarious consequences for clients.
Simply put, clients will need to have memory, and traversing trees will not be performant.

For example, given the tree /a/b/c/d, a client that works with paths can query /a, /a/b or /a/b/c directly.

A client that works with ids and that knows about d will need to query the parent of d, which is c, then the parent of c which is b and then the parent of b which is a. Performance will be a function of tree depth. A workaround you may suggest is to "remember" node information in the client, however this pushes complexity to any client using the API (desktop sync, mobile) and basically anyone willing to integrate with OCIS in an optimal way.

To illustrate this behaviour in a UNIX/POSIX environment, suppose we're inside the folder /var/tmp/data/log/2022/05/03/proxy.

In your case, to go to the /var folder you need to do:

cd ..
cd ..
cd ..
cd ..
cd ..

By path you do:

cd /var/
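
The cost described above can be made concrete with a small simulation. This is only a sketch: the `NODES` table and the `ancestors_by_id` helper are hypothetical stand-ins for a storage provider in which each lookup would be one Stat round trip returning a basename and a parent id.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for a storage provider:
# node id -> (basename, parent id); the root has no parent.
NODES: Dict[str, Tuple[str, Optional[str]]] = {
    "id-root": ("", None),
    "id-a": ("a", "id-root"),
    "id-b": ("b", "id-a"),
    "id-c": ("c", "id-b"),
    "id-d": ("d", "id-c"),
}


def ancestors_by_id(
    nodes: Dict[str, Tuple[str, Optional[str]]], start_id: str
) -> Tuple[List[str], int]:
    """Walk parent ids up to the root, counting one lookup per hop."""
    names: List[str] = []
    lookups = 0
    current: Optional[str] = start_id
    while current is not None:
        name, parent = nodes[current]  # stands in for one Stat round trip
        lookups += 1
        if name:
            names.append(name)
        current = parent
    return list(reversed(names)), lookups


names, lookups = ancestors_by_id(NODES, "id-d")
print("/" + "/".join(names), "needed", lookups, "lookups")
```

The number of lookups grows with the depth of the starting node, which is the concern raised here for clients that only hold ids.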

@butonic @micbar @dragotin

@micbar
Contributor

micbar commented May 5, 2022

Comments from ocis

Both CERN and ownCloud decided together the following scheme:

A resourceID contains either:

  • I: only 1) an absolute path (starting with a leading slash) and 2) no IDs
  • II: or 1) an ID and 2) no path
  • III: or 1) an ID and 2) a relative path (always starting with a .)

II was returning an absolute path in reva master. This has changed in the edge branch. A stat, like on a filesystem, has no knowledge about the path, only the file basename. We know and return the parentID. The full path can be queried with GetPath().
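
As an illustration of the three forms, a client or gateway could classify a reference before sending or routing it. This is a sketch only: the `Reference` dataclass and `classify` function are hypothetical and merely mirror the `storage_id`, `opaque_id` and `path` fields seen in the Stat calls above. It assumes a strict reading in which a relative path is either "." or starts with "./", and it rejects any other combination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    storage_id: Optional[str] = None
    opaque_id: Optional[str] = None
    path: Optional[str] = None


def classify(ref: Reference) -> str:
    """Return "I", "II" or "III" for the three agreed reference forms."""
    has_id = bool(ref.storage_id and ref.opaque_id)
    path = ref.path or ""
    if not has_id and path.startswith("/"):
        return "I"  # absolute path, no ids
    if has_id and path == "":
        return "II"  # id only
    if has_id and (path == "." or path.startswith("./")):
        return "III"  # id plus relative path
    raise ValueError(f"reference matches none of the three forms: {ref!r}")


print(classify(Reference(path="/app-try")))
print(classify(Reference("space-id", "node-id")))
print(classify(Reference("space-id", "space-id", "./app-try/app-new-try.txt")))
```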

Examples from @labkode

  • A is a valid request and a valid response
  • B and G: these requests cannot work. If you don't send an ID, we cannot route the request. Clients with path-based access need to call ListStorageSpaces with the path to look up the spaceID.

Comments on the desired request/responses

Request:                                    Response
    res_id = nil                                res_id = {storage_id, opaque_id_c}
    path   = /top/a/b/c                         path   = /top/a/b/c
                                                parent_id = {storage_id, opaque_id_b}

# path without a res_id cannot work
eos file info /a/b/c                                                
                                                
------------------------------------------------------------------------------------

Request:                                    Response
    res_id = {storage_id, opaque_id_c}          res_id = {storage_id, opaque_id_c}
    path   = nil                                path   = /top/a/b/c       # edge is returning /c, needs to be fixed
                                                parent_id = {storage_id, opaque_id_b}

# this request is valid, it behaves like a filesystem stat. `/c` is only the basename of the res_id. We cannot expect a path to the parent in this case. To get the path, call `GetPath`

eos file info inode:opaque_id_c                                                
                                                
------------------------------------------------------------------------------------

Request:                                    Response
    res_id = {storage_id, opaque_id_a}          res_id = {storage_id, opaque_id_c}
    path   = ./b/c                              path   = /top/a/b/c       # edge is returning c, needs to be fixed
                                                parent_id = {storage_id, opaque_id_b}

# same as above, c is the basename, we cannot expect that the path to the root is returned
path_a = eos file info inode:opaque_id_a                                                
eos file info path_a/b/c                                                

------------------------------------------------------------------------------------

Request:                                    Response
    res_id = {storage_id, opaque_id_c}          res_id = {storage_id, opaque_id_c}
    path   = .                                  path   = /top/a/b/c      # edge is returning c, needs to be fixed
                                                parent_id = {storage_id, opaque_id_b}
# same as above
eos file info inode:opaque_id_c 

------------------------------------------------------------------------------------

Request:                                    Response    # eosfs will always return HTTP 404 here
    res_id = {storage_id, storage_id}           res_id = {storage_id, storage_id}
    path   = nil                                path   = nil   # the "name" of the space root node is not exposed
                                                parent_id = nil?

# path is not set when it is the root folder. 
--
In the Request, nil or "." or "" paths SHALL be considered synonyms

> Firstly, examples A and G show a defective implementation that doesn't honour the CS3APIs. Paths are a first class citizen of the API and are expected to work.

We thought that we had a mutual agreement that the spaces are first class citizens and all paths are relative to a space.

> For example, given the tree /a/b/c/d, a client that works with paths can query /a, /a/b or /a/b/c directly.
>
> A client that works with ids and that knows about d will need to query the parent of d, which is c, then the parent of c which is b and then the parent of b which is a. Performance will be a function of tree depth. A workaround you may suggest is to "remember" node information in the client, however this pushes complexity to any client using the API (desktop sync, mobile) and basically anyone willing to integrate with OCIS in an optimal way.
>
> To illustrate this behaviour from UNIX/POSIX environment, we're inside folder /var/tmp/data/log/2022/05/03/proxy.

Let us elaborate on this.

You know the id of /var/tmp/data/log/2022/05/03/proxy
You do a stat() with the ID, the response will give you the parentID and the stat information.
If you need the path, you do a getPath(), which will give you the root of the space and e.g. ./data/log/2022/05/03/proxy. So in this case /var/tmp/ is the space mountpoint.

Now, after two requests you are able to do all kinds of operations within the whole depth of this tree. We never encounter cases where clients need to do the following:

cd ..
cd ..
cd ..
cd ..
cd ..
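
To sketch why no parent-by-parent walk is needed once the space root and the space-relative path are known: every ancestor can be addressed as a relative reference against the space root, with no further request. The helper name `ancestor_references` and the id `space-root-id` are hypothetical; in a real client the relative path would come from the GetPath response.

```python
from typing import List, Tuple


def ancestor_references(space_root_id: str, relative_path: str) -> List[Tuple[str, str]]:
    """Build (resource id, relative path) pairs for the space root and every
    ancestor of relative_path, purely on the client side."""
    parts = [p for p in relative_path.split("/") if p not in ("", ".")]
    refs = [(space_root_id, ".")]  # the space root itself
    for depth in range(1, len(parts)):
        refs.append((space_root_id, "./" + "/".join(parts[:depth])))
    return refs


for ref in ancestor_references("space-root-id", "./data/log/2022/05/03/proxy"):
    print(ref)
```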

Summary

  • Spaces are the new first class citizens
  • Spaces always require an indirection and need to be discovered before clients can interact with files.
  • The spaces indirection moves logic to the clients, that is explicitly desired and core of the spaces concept. The server should never assume a data layout on the client because the server has no knowledge of the client use case. E.g. a Desktop Client always works path-based because it creates a local POSIX tree. Web or mobile Clients on the contrary are not bound to these filesystem semantics and can present spaces in more sophisticated ways.
  • Spaces are not considered to come from one instance only. So spaceIDs are globally unique and clients can connect to federated storage spaces without changing the scheme.

@dragotin Please also add your thoughts, because you were always insisting on that level of indirection with the clients in mind.

@micbar micbar changed the title from "Harden and fix access to storage by path" to "Storage Space semantics in the CS3Api" May 5, 2022
@dragotin
Contributor

dragotin commented May 5, 2022

I agree with what is written in the summary above. That is IMHO a good summary of what spaces are about. It is important to note that, from a client's POV, it is always necessary to query the list of available spaces first to learn the space IDs. That is needed because clients should be able to organize the spaces according to the abilities of the client's platform. For example, a desktop client will put the different types of spaces in different places than the web client.

With the spaces, there is no absolute file tree of all files any more by default.

@labkode what is the actual problem that you need to solve? Can we look into it together?

@butonic
Member

butonic commented May 6, 2022

IMO there is no way around making clients smarter, i.e. teaching them how to discover spaces so they can work with a truly federated storage API. For that, the cleanest approach is to always require a relative CS3 Reference in requests. That being said, we obviously need backwards compatibility:

  • To allow path based access, the ocdav service currently implements the space discovery. This should move to the sdk so other clients can reuse that logic easily and benefit from distributed spaces.
  • To allow old or dumb CS3 clients to work with path only requests we should IMO implement a pathstorageprovider that does the space discovery and then acts as a client.

We could define levels for CS3 clients to indicate that a client

  • Level 0 works path based, sending path only references to the gateway
  • Level 1 uses spaces discovery and then sends relative references to the gateway
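
The step from a Level 0 (path-only) request to a Level 1 (relative) reference can be sketched as a longest-prefix match of the absolute path against known space mount points. Everything here is an assumption made for illustration: the `mounts` table (mount path to space id) is hypothetical and would have to be built from space discovery, and `to_relative_reference` is not a function of reva or any SDK.

```python
import posixpath
from typing import Dict, Tuple


def to_relative_reference(mounts: Dict[str, str], absolute_path: str) -> Tuple[str, str]:
    """Translate an absolute path into (space id, relative path)."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path")
    path = posixpath.normpath(absolute_path)
    best = None
    for mount, space_id in mounts.items():
        m = mount.rstrip("/") or "/"
        inside = path == m or path.startswith(m.rstrip("/") + "/")
        # Keep the longest mount point that contains the path.
        if inside and (best is None or len(m) > len(best[0])):
            best = (m, space_id)
    if best is None:
        raise LookupError(f"no space found for {absolute_path!r}")
    mount, space_id = best
    rest = path[len(mount):].lstrip("/")
    return space_id, ("./" + rest if rest else ".")


# Hypothetical mount table; the ids are made up.
MOUNTS = {"/var/tmp": "space-1", "/home/alice": "space-2"}
print(to_relative_reference(MOUNTS, "/var/tmp/data/log/2022/05/03/proxy"))
```

Whether such a translation lives in ocdav, an SDK or a dedicated path storage provider is exactly the design question raised above; the sketch only shows the lookup itself.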

@labkode
Member Author

labkode commented May 6, 2022

@dragotin

The problem we're trying to solve is to implement the expected behaviour of spaces on the edge branch for EOS, and I have to admit that we're struggling.
We find contradicting behaviours, and the only up-to-date source of knowledge appears to be the codebase, where we have to reverse-engineer the ocisfs implementation and navigate through the code.

Would it be possible to express the semantics of how spaces should behave in the CS3APIs documentation? Currently the information is spread across the CS3APIs, the codebase, GH issues and ADRs.

@micbar Thank you for the explanation, that is indeed better than our initial assumptions, however we still have some doubts.

  • The first one is the mount point: there is no mention of it in the CS3APIs [1]. Is the mount point the "name" field in the space info?

  • The second one is the GetPath(SpaceID) operation: you can get the same path for different inputs, i.e., I can have a /photos folder under space A and also under space B, so purely from the response you don't know which space this path belongs to, and it is left to the client to link the path to the input provided.

  • The third one is: how can you go from a leaf node to the space root? How is this implemented in ocisfs? Do you traverse all the parents until reaching a root node (meaning one without a parent)?

The more I think about it the more I think that having a path field in responses is misleading for Spaces.
What do we break if we change the path field to basename and we enforce that it doesn't contain slashes (/)?
Is there any codebase that uses path internally? virtual share folders?
A path would still be needed in requests to specify a relative path to the root space resource id, but not needed at all in the responses.

[1] https://cs3org.github.io/cs3apis/#cs3.storage.provider.v1beta1.StorageSpace
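
The constraint proposed above (a basename that never contains slashes) is easy to state as a check. A minimal sketch, assuming a hypothetical validator name `is_valid_basename`; also rejecting "", "." and ".." is an extra assumption that is not part of the proposal itself:

```python
def is_valid_basename(name: str) -> bool:
    """True if name could serve as a slash-free basename in a response."""
    return name not in ("", ".", "..") and "/" not in name


for candidate in ("app-new-try.txt", "/app-try/app-new-try.txt", "app-try/"):
    print(candidate, is_valid_basename(candidate))
```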

@micbar
Contributor

micbar commented May 6, 2022

@labkode thanks for the answer!

> The problem we're trying to solve is to implement the expected behaviour of spaces on the edge branch for EOS and I have to admit that we're struggling.
> We find contradicting behaviours and the only source of knowledge up to date looks like is the codebase, where we have to reverse-engineer the ocisfs implementation and navigate through the code.

That is unfortunate; I offer any help possible. After the first beta next week, our resources will not be so stretched anymore.

> Would be possible to express the semantics of how spaces should behave on the CS3APIs documentation? Currently the information is spread across CS3APIS, codebase, GH issues and ADRs.

Yes! But especially after the discussion in this ticket, I have more and more the feeling that we are already 90% aligned and need to overcome the last 10% to get cernbox running and performant on the edge branch. That is still one of our most important missions.

To make things better, we will provide a proposal for the cs3 api changes in the form of a PR next week.

Another project from @wkloucek and me is the https://github.com/owncloud/cs3api-validator/ project, which is now also available as a docker image. We have it running on reva and ocis CI already. The API spec, documentation and this test suite can provide a good start for a better developer experience.

> The more I think about it the more I think that having a path field in responses is misleading for Spaces.
> What do we break if we change the path field to basename and we enforce that it doesn't contain slashes (/)?
> Is there any codebase that uses path internally? virtual share folders?
> A path would still be needed in requests to specify a relative path to the root space resource id, but not needed at all in the responses.

Yes let us follow up on that. It is already on our agenda during the beta phase.

@labkode labkode changed the title from "Storage Space semantics in the CS3Api" to "Storage Space semantics in the CS3APIs" May 6, 2022
@micbar micbar added Priority:p2-high Escalation, on top of current planning, release blocker Topic:Documentation Topic:API and removed Type:Discussion labels Jul 4, 2022
@stale

stale bot commented Sep 4, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status:Stale label Sep 4, 2022
@stale stale bot closed this as completed Sep 15, 2022
@micbar micbar reopened this Dec 13, 2022
@micbar
Contributor

micbar commented Dec 13, 2022

Needs documentation and tests about final outcome.
