Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] MFS improvements #4758

Closed
wants to merge 6 commits into from
Closed

[WIP] MFS improvements #4758

wants to merge 6 commits into from

Conversation

schomatis
Copy link
Contributor

@schomatis schomatis commented Mar 2, 2018

This is a work in progress, not ready to be merged.

This PR is a (rebased) copy of #4517, as discussed, the idea is to add the fix for #4519 and get it ready to merge.

  • Avoid syncing/flushing after closing an fd.
  • Avoid reading/writing after closing an fd.
  • Use a rw lock to guard the file's node (allows concurrent size checks).
  • Improve flags and mode. Make file open options an extensible struct. Unix uses an int because it's easier to pass in a syscall and it has a fixed size. We don't care about that.
  • Fix MFS: Race when reading node and taking read/write lock #4514 (take the read/write lock before reading the file's node).
  • Avoid syncing/flushing when we don't need to.
  • Fixes Truncating an MFS file 175 times makes it unwritable. #4519: dagTruncate now preserves the Type of the node being truncated.

@schomatis
Copy link
Contributor Author

@Stebalien I added the fix for #4519 and now the tests are passing, now I have to resolve some merge conficts.

Regarding your commit WIP add test case., is that a work in progress (something more should be added to the tests)? Or is it OK as it is (now that the test are passing)? If the second, what should be its final commit message?

@Stebalien
Copy link
Member

🕺 Nice!

Regarding your commit WIP add test case., is that a work in progress (something more should be added to the tests)? Or is it OK as it is (now that the test are passing)? If the second, what should be its final commit message?

That was the test case that failed due to #4519. If possible, please move your commit before it and amend that commit message to remove the WIP and the "it fails" part.

@schomatis
Copy link
Contributor Author

Attempting to rebase this PR I found several merge conflicts, and even after resolving those there are still some modifications that need to be made to be able to build the program.

So if I rebase this I still need to add an extra commit (or sneak in several extra modifications in the last resolution of the rebase), and the commits of this branch will be unified, meaning that a roll back to a commit of the middle of this series won't be possible (it won't build). I think it will be cleaner to do a merge here, fixing everything in one extra commit, preserving your original commits and making clear that this series of commits belong to a different time frame of the code base. Another option could be to cherry pick one commit at a time into master and tidy them up so they all work.

@Stebalien Could you guide me through this, I'm not a git expert, WDYT?

@Stebalien
Copy link
Member

Another option could be to cherry pick one commit at a time into master and tidy them up so they all work.

This is usually how I do it. I can take a stab at it if you'd like.

@schomatis
Copy link
Contributor Author

@Stebalien Thanks, I'll give it a try myself and let you know if I run into trouble.

@schomatis schomatis closed this Mar 3, 2018
@schomatis
Copy link
Contributor Author

The PR was automatically closed when the branch got emptied.

@schomatis schomatis reopened this Mar 3, 2018
* Ensure we don't flush unnecessarily.
* Make it easier to work with modes/flags.
* Better handle closed files.

License: MIT
Signed-off-by: Steven Allen <[email protected]>
@schomatis
Copy link
Contributor Author

@Stebalien Regarding the first commit of the series (ensure we don't flush unnecessarily), while running the full tests on it (before I was just checking the problematic TestTruncateAndWrite and TestConcurrentWrites), I've noticed the TestFlushing hangs indefinitely.

This hang happens when flushing a file with the FlushPath function, particularly in the function WaitPub, because the Republisher never receives the Publish signal so it doesn't probe pubnowch (that WaitPub is waiting to write on).

The Update function that signals the Publish channel is not called because (correctly) the modified function flushUp doesn't call closeChild (that will eventually trigger the Update), the test file being flushed wasn't modified, so according to the new logic there is nothing to do.

The problem is that WaitPub finds an inconsistency, a CID that hasn't been published, which is actually the only CID which was set (setVal) during the test (because up to now there wasn't any modification of content), the MFS root.

In my (very limited) understanding the issue isn't the modification to flushUp (it makes sense to avoid a flush when there isn't anything to flush), nor the hang in WaitPub (although it is a bit suspicious that the flush operation has to wait for this publish, I'm not familiar with the publishing system) but the fact that the MFS root is marked to be published but it isn't until another (different) file (node) is flushed, handling it in an indirect fashion.

@Kubuxu Kubuxu added the status/WIP This a Work In Progress label Mar 9, 2018
@Stebalien
Copy link
Member

Stebalien commented Mar 21, 2018

although it is a bit suspicious that the flush operation has to wait for this publish, I'm not familiar with the publishing system

That's probably because flush is called by the user and, after flushing, the user expects updates to be published.

the fact that the MFS root is marked to be published but it isn't until another (different) file (node) is flushed, handling it in an indirect fashion.

I'm pretty sure your fix is correct. We should either not be calling setVal or should be setting lastpub as well. Calls to setVal should always be paired with setting the Publish flag.

(And sorry for being so slow to respond. Please repeatedly poke me on IRC when I don't respond in a timely manner; the more annoying the better.)

…a publish

Otherwise, WaitPub waits forever.

License: MIT
Signed-off-by: Steven Allen <[email protected]>
License: MIT
Signed-off-by: Steven Allen <[email protected]>
@schomatis schomatis requested a review from Kubuxu as a code owner March 22, 2018 20:49
@ghost ghost assigned Stebalien Mar 22, 2018
@ghost ghost added the status/in-progress In progress label Mar 22, 2018
1. DO NOT USE GOTOS FOR LOOPS.
2. Use a single timer.
3. Reuse the timer (avoids allocating and throwing away a bunch of timer
channels).

License: MIT
Signed-off-by: Steven Allen <[email protected]>
If we flip-flop between two values, we'll trigger a useless publish.

Also, fix the TestRepublisher test case:

1. Use actual, independent CIDs.
2. Don't assume that republish on close if nothing changes.

License: MIT
Signed-off-by: Steven Allen <[email protected]>
Before, WaitPub could notice that there is something to publish, sleep, then
wake up and write to pubnowch *after* the value is published. This would cause
it to hang until the next publish.

Instead, *always* write to pubnowch and check to see if we *really* need to
publish inside the `publish` function.

License: MIT
Signed-off-by: Steven Allen <[email protected]>
@Stebalien
Copy link
Member

@schomatis I've fixed it by adding a Set method for setting the both the current value and the last published value. IMO, this makes the most sense.

I've also:

  • Tried to make the republish timer logic faster and easier to follow (hopefully).
  • Made WaitPub not racy. Instead of checking to see if there is anything to publish, it just writes to the pubnowch channel and I deduplicate publish operations inside the publish loop.

@whyrusleeping significant change: We no longer publish one final time on close if nothing has been changed. IMO, this is correct but it's still a change (and forced me to change a test).

@Stebalien
Copy link
Member

This appears to cause a bug in the MFS sharness tests. It looks like there's a case where we should be flushing but aren't.

@schomatis
Copy link
Contributor Author

@Stebalien When opening a file its state is not initialized and hence left at stateFlushed (I think this should be an explicit initialization even if the value is left at iota). The t0251-files-flushing.sh is failing because ipfs files cp flushes a copied (created but not modified) file set to stateFlushed which does nothing. Maybe the default state value of a file should be revised, stateDirty seems like the safest choice but I'm not sure that's correct either.

I don't know what's causing the errors in t0250-files-api.sh.

@Stebalien
Copy link
Member

Hm. I'm not sure I understand. ipfs files cp doesn't appear to call Open on the target file, as far as I can tell.

It looks like ipfs files cp is calling PutNode which adds the file with directory.AddChild. That, in turn, appears to add the child to the directory by calling dirbuilder.AddChild (no open, flush, etc). What am I missing here?

@schomatis
Copy link
Contributor Author

@Stebalien The Open happens during the FlushPath call (there is an undocumented flush option set to true) which ends up in a file Flush.

0  0x000000000120e78b in github.com/ipfs/go-ipfs/mfs.(*File).Open
   at ./mfs/file.go:51
1  0x000000000120f53d in github.com/ipfs/go-ipfs/mfs.(*File).Flush
   at ./mfs/file.go:126
2  0x0000000001211f52 in github.com/ipfs/go-ipfs/mfs.FlushPath
   at ./mfs/ops.go:218
3  0x0000000001542891 in github.com/ipfs/go-ipfs/core/commands.glob..func57
   at ./core/commands/files.go:343
4  0x0000000001372cfd in github.com/ipfs/go-ipfs/commands/legacy.NewCommand.func1
   at ./commands/legacy/command.go:38

@Stebalien
Copy link
Member

@schomatis do you recall what's blocking this? It has a bunch of bug fixes and performance improvements.

@schomatis
Copy link
Contributor Author

The flush error discussed above.

@schomatis
Copy link
Contributor Author

This code has been moved to go-mfs, what can be extracted from this PR will be continued in ipfs/go-mfs#32.

@schomatis schomatis closed this Dec 13, 2018
@ghost ghost removed the status/in-progress In progress label Dec 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/WIP This a Work In Progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Truncating an MFS file 175 times makes it unwritable. MFS: Race when reading node and taking read/write lock
3 participants