Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix breakage: t0080-repo refs local bug #994

Closed
jbenet opened this issue Mar 31, 2015 · 8 comments
Closed

Fix breakage: t0080-repo refs local bug #994

jbenet opened this issue Mar 31, 2015 · 8 comments
Labels
kind/bug A bug in existing code (including security flaws) status/in-progress In progress

Comments

@jbenet
Copy link
Member

jbenet commented Mar 31, 2015

See #992 (comment)

Looks to be a bug in syndtr/leveldb given https://gist.github.com/jbenet/adf77b142a78451d1f91

@whyrusleeping
Copy link
Member

in your instrumentation you should try and break out the ipfs key and print it so we can see the QmHashes

@jbenet
Copy link
Member Author

jbenet commented Mar 31, 2015

@whyrusleeping it's there-- oh no it isn't. okay, maybe it is our bug after all. le sigh

@jbenet jbenet mentioned this issue Mar 31, 2015
52 tasks
@chriscool
Copy link
Contributor

Maybe the breakage in PR #933 is related?

@whyrusleeping whyrusleeping added the kind/bug A bug in existing code (including security flaws) label Apr 4, 2015
@jbenet jbenet mentioned this issue Apr 6, 2015
42 tasks
@whyrusleeping
Copy link
Member

this issue is caused by the path.Clean bug in go-datastore. Any key that ends in a '/' character will be mangled. This is a fairly major bug honestly

@chriscool
Copy link
Contributor

So what can we do about that? Is there already an opened go-datastore issue about it?

@whyrusleeping
Copy link
Member

I thought there was, but I don't see one. I think that we shouldnt be using the go-datastore, and instead should be using something better tailored to a content addressed file store.
Go-datastore expects 'path-like' keys, and thats something that we cant efficiently do (unless we want to marshal every key on stores and lookups and queries.

@Cleric-K
Copy link

@whyrusleeping @jbenet @chriscool
Is this issue being addressed?

To sum up - problems arise when the binary hash contains ascii symbols /, . which are interpreted by path.Clean(). This malforms the key. Here are few examples:

  1. Slash at the end. Test with $ echo 10 | ipfs add
  2. Two or more slashes (get converted to one slash). $ echo 00000259 | ipfs add
  3. /../ - removes the part before that token. $ echo 0243397916 | ipfs add. Creates file:
    d83220a26feced8c55f3ede2819a7f0a84ce.data
    instead of:
    12200bf25b81d5a2ab229b652f2e2e2fd83220a26feced8c55f3ede2819a7f0a84ce.data

Any of these files can never be properly garbage collected because a proper key can not be reconstructed from the file name.
The first case (slash at the end) is pretty common - roughly one in 256 - so that will be considerable number of files which can not be reclaimed.
Also these objects are not reported by ipfs refs local.

Since migrating to a new datastore is quite an ambitious task, here is a temporary solution. It has to do with customizing the ipfs/go-datastore/key.go Clean() function. The idea is simple:

  1. The key looks like /path1/<binary_hash>////path2/path3/<binary_hash2>. The function scans the key and tries to detect multihash in between two slashes or slash in front and end-of-string at the end.
  2. The binary hash is converted to its hex form. In the same time a map is created which maps hex -> binary. A new string is created (using bytes Buffer) which looks exactly as the original but with binary hashes replaced with hex.
  3. path.Clean() is applied.
  4. Then using the map, all hex hashes are replaced with their binary form.
func (k *Key) Clean() {
    // k.string = path.Clean("/" + k.string)
    s := "/" + k.string
    lk := len(s)

    bb := bytes.NewBuffer(make([]byte, 0, lk*2))
    lastb := 0 // last index saved to buffer
    kmap := make(map [string]string)

    for i := 1; i < lk - 1; i++ {
        if s[i-1] != '/' || !mh.ValidCode(int(s[i])) {
            continue
        }

        mhend := i + 2 + int(s[i+1]) // end index of hash 
        if mhend > lk {
            continue
        }

        if mhend == lk || s[mhend] == '/' {
            // end-of-string or has slash at the end
            rawh := s[i:mhend]
            hexh := hex.EncodeToString([]byte(rawh))
            kmap[hexh] = rawh
            bb.WriteString(s[lastb:i]) // copy since last copy
            bb.WriteString(hexh)

            lastb = mhend
            i = mhend
        }
    }

    if lastb > 0 {
        if lastb < lk {
            bb.WriteString(s[lastb:lk])
        }

        // clean
        s = path.Clean(bb.String())

        // replace hex with raw hashes      
        for hexh, rawh := range kmap {
            s = strings.Replace(s, hexh, rawh, -1)
        }

    } else {
        // no hashes in key
        s = path.Clean(s)
    }

    k.string = s
}

I know it is not a pretty solution since go-datastore is more or less abstract and shouldn't even know that part of the keys might be multihashes, but at least the solution works and is just a bunch of lines of code.

Potential problems:

  • I'm not very familiar with the whole ipfs source but it might be possible that keys are being split with / delimiter and expecting to find the hash at specific segment, but if the hash contains / that will break it apart (but this is also problem now, it is not specific to this solution).
  • if this function is used, the datastore won't be backwards compatible - current problematc keys will no longer look for the data at the malformed files).

@whyrusleeping
Copy link
Member

This has been fixed in #2903

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) status/in-progress In progress
Projects
None yet
Development

No branches or pull requests

5 participants