-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Google Cloud storage as a DataStore #106
Conversation
Wow, this is truly amazing and impressive, especially since I didn't expect it 👍 Thank you a lot for this. Unfortunately, I am currently a bit busy due to private circumstances and won't be able to fully review it in the next two weeks but I will do so afterwards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for the delay on my side but I had a deep dive into your code and it looks very good. Yet, I found a few topic worth talking about. As a last wish, would you be so kind a run go fmt
since I found a few style inconsistencies?
gcsstore/gcsservice.go
Outdated
|
||
import ( | ||
"io" | ||
"context" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still claiming support for Go 1.5 and 1.6 which to not include the context package. Would it be possible to use golang.org/x/net/context instead?
gcsstore/gcsstore.go
Outdated
split := strings.Split(name, "_") | ||
idx, err := strconv.Atoi(split[len(split)-1]) | ||
if err != nil { | ||
return -1, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should return 0 here as that's the default value for int64s and also matches the remaining returned values.
@@ -0,0 +1,154 @@ | |||
package gcsstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proper documentation for all exported types and functions in this file would be really appreciated. :)
gcsstore/gcsservice.go
Outdated
} | ||
|
||
m[i] = objAttrs.Name | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to collect the names directly in a slice using https://godoc.org/google.golang.org/api/iterator#PageInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of this interface is that it is not possible to convert directly between an iterator and a slice. This is because, depending on the number of objects returned by the query, additional API calls might be made to fetch more objects. That is why Next()
and Done
are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. However, it might be more efficient to just create a slice (with a start capacity of 10 or so) and let it grow while you append
the new object names. Since the capacity of the slice is doubled if it has to be reallocated (see https://play.golang.org/p/10KXfDndw4), we could get rid of the intermediate map.
@@ -0,0 +1,224 @@ | |||
package gcsstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation for the GCSStore struct and New function will be really nice. In addition, information about what access rights are required for this should be included in a similar fashion as it's done for the s3store package: https://github.com/tus/tusd/blob/master/s3store/s3store.go#L3
gcsstore/gcsstore.go
Outdated
} | ||
|
||
info.Offset = info.Offset + n | ||
err = store.WriteInfo(id, info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the call to WriteInfo fails (for whatever reason), you can end up in an inconsistent state where you were able to upload the chunk to the bucket but didn't update the offset. In order to avoid this issue, the s3store does not update the offset after writing a chunk but instead calculates the offset for each call to GetInfo (see https://github.com/tus/tusd/blob/master/s3store/s3store.go#L309-L335). It would be nice to have a similar approach here.
gcsstore/gcsstore.go
Outdated
} | ||
|
||
var objectParams GCSObjectParams | ||
for _, name := range names { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to reuse the same functionality from Terminate here as they both to pretty much the same?
gcsstore/gcsstore.go
Outdated
Sources: names, | ||
} | ||
|
||
err = store.Service.ComposeObjects(composeParams) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may also be helpful to attach the metadata to the composed object (compare with https://github.com/tus/tusd/blob/master/s3store/s3store.go#L203)
@@ -0,0 +1,337 @@ | |||
package gcsstore_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be beneficial to also add tests for the GCSApi as that is the part which is directly communicating with the GCS.
Prefix: prefix, | ||
} | ||
|
||
names, err := store.Service.FilterObjects(filterParams) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it guaranteed that the object names are returned in the correct order? I believe this could have issues if you have more than 10 chunks uploaded. For example, you have following objects: foo_0, foo_1, ..., foo_9, foo_10. Will the ordering in this case be foo_0, foo_1, ..., foo_9, foo_10 (as wanted) or foo_0, foo_1, foo_10, foo_2, ..., foo_9 (sorted lexicographically)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, good catch. It seems query responses are sorted lexicographically and therefore are not composed correctly. I also just realized that Google limits compositions to a max of 32 objects. I will make changes to fix these two issues.
@Acconut I've made the following changes to this PR:
I have updated the |
This is looking really good! Question: should we write a new blog post up on tus.io that we have gcs support once all's said & done? @tab1293 I think it would be great Tom if you or a teammate could take the lead in that as author. If you don't have the time however, we could also figure something out with the Transloadit crew, as I believe it will be good for tus to get the word out on this |
@kvz yes good idea. I can have a blog post written about this probably in the next week or so. Is this a blocker for merge? |
Definitely not, we can add that later! This was just a slightly related idea I had, I'll leave the actual code review to Marius as Go is not my strong suit |
Let's phrase it like following: I could sleep better if this PR already included the necessary tests ;) Besides that, your changes are looking very good! I will have another in-depth look this weekend. |
@tab1293 This PR is looking amazing, in particular after your latest round of updates! A few more things (besides test), please:
|
…instructions for AppVeyor
That's a good idea but I was also talking about tests for the |
Hey @Acconut, I've updated this and added the vendoring dependencies so that the travis tests pass for Go 1.6-1.8. However, the underlying grpc package that is used by the google api dropped support for Go1.5 (grpc/grpc-go@0a20758), and uses |
Perfect 👍
Yes, that would be nice. Also, what's the status on the additional tests (see #106 (comment))? I'm sorry that I am so picky about them but I like my code tested :) |
@Acconut I'll commit a change tomorrow to skip the tests in Go 1.5, and hopefully later this week I'll add in some tests for |
@tab1293 I am still waiting on the tests for |
@Acconut I can't seem to think of a way to test |
Either that or you create your own mock server using httptest.Server (https://golang.org/pkg/net/http/httptest/#Server) and then point the GCSService to it. It allows you to assert the incoming requests and also tests how well the client handles errors. |
Also, thinking about this, would it be possible to not export GCSService and reduce the exposed API making changes in the future easier? |
@Acconut I still don't see how creating a mock http server will enable us to create tests for the GCSService. If you look at the implemented interface methods such as |
I disagree. I would not consider functions, such as compose (https://github.com/tus/tusd/pull/106/files#diff-c920e8ef8f3f85a7b94ee8d23483c11aR197) and recursiveCompose (https://github.com/tus/tusd/pull/106/files#diff-c920e8ef8f3f85a7b94ee8d23483c11aR255), thin wrappers. Instead they contain a lot of logic which needs tests to ensure they keep working in the future. I know that writing tests is not fun but if software hits the production environment, tests are mandatory.
As I pointed out some time ago, Go has a built-in HTTP server for testing (https://golang.org/pkg/net/http/httptest/#Server). Using it is relatively easy: You start it, point the GCSService at the server's URL and assert that the proper requests are coming in. Here is also an article which explains this concept a bit more: https://mkaz.tech/geek/testing-clients-to-an-http-api-in-go/. If you want to, I could also write an example test. |
* Refactor as a struct of clojures for testing * Refactor to support tests * Add docs * Struct members cannot fufill the interface, adding wrapper functions
Hey @Acconut, I've updated this PR with 2 tests for Unfortunately the Google Storage library cannot be easily ran with a mock HTTP test server since the Google Cloud Storage library assumes you are interacting with their infrastructure, meaning that you cannot pass in the mock server's address. Due to this, I opted to refactor some of the code to support closures as members of the |
Are you sure? AFAIK, the SDK has a WithEndpoint (https://godoc.org/google.golang.org/api/option#WithEndpoint) option which can be used in a similar fashion to you do it here: https://github.com/tus/tusd/pull/106/files#diff-c920e8ef8f3f85a7b94ee8d23483c11aR90 Also, thank you for your efforts :) |
Yep, I tried the The internal tests to Google Storage do utilize the |
@Acconut ^ |
@peixian I apologize for not responding here but I am very busy due to private reasons and cannot take a look at the current situation until next week. I hope you understand :) |
@peixian I deeply apologize for my delay, other projects occupied my time and I lost a bit track of your work here. This provided me the time to attempt writing tests on my own. As you correctly said the Below you can find an example test for the package gcsstore_test
import (
"context"
"testing"
"gopkg.in/h2non/gock.v1"
"cloud.google.com/go/storage"
. "github.com/tus/tusd/gcsstore"
"google.golang.org/api/option"
)
func TestGetObjectSize(t *testing.T) {
defer gock.Off()
gock.New("https://www.googleapis.com").
Get("/storage/v1/b/test-bucket/o/test-name").
MatchParam("key", "foo").
Reply(200).
JSON(map[string]string{"size": "54321"})
ctx := context.Background()
client, err := storage.NewClient(ctx, option.WithAPIKey("foo"))
if err != nil {
t.Fatal(err)
}
service := &GCSService{
Client: client,
Ctx: ctx,
}
size, err := service.GetObjectSize(GCSObjectParams{
Bucket: "test-bucket",
ID: "test-name",
})
if err != nil {
t.Errorf("Error: %v", err)
}
if size != 54321 {
t.Errorf("Error: Did not match given size")
}
} What do you think of this approach? |
@Acconut I think Gock is a fine solution to this problem, I've updated the PR with Gock tests for each function in |
@peixian Those tests look very good, nice work 👍 Since we found a way to mock the Google Cloud Storage API, I believe these functions properties (6eb4efe#diff-c920e8ef8f3f85a7b94ee8d23483c11aR78) in GCSService can be removed, can't they? |
Just merged your PR. I would like to thank you very much for your efforts and especially your patience with my requests. It is greatly appreciated :) |
🎉 🎉 |
This pull request allows tusd users to specify Google Cloud storage as their datastore backend. By passing the
--gcs-bucket [bucket-name]
flag and setting theGCS_SERVICE_ACCOUNT_FILE
environment variable to point to their Google service account file, tusd can now upload to GCS. @Acconut please let me know what you think of this feature and if there are any changes you'd like me to make.cc @vayam