Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blocks are only sanity checked when debug mode is active #1152

Open
BrendanBenshoof opened this issue Apr 28, 2015 · 24 comments
Open

Blocks are only sanity checked when debug mode is active #1152

BrendanBenshoof opened this issue Apr 28, 2015 · 24 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature topic/blockstore Topic blockstore

Comments

@BrendanBenshoof
Copy link

https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/blocks/blocks.go#L27

This should be checked even if debug is not enabled. Otherwise malicious blocks and merkledag structures could be loaded.

@jbenet
Copy link
Member

jbenet commented Apr 28, 2015

I tend to agree.

we moved this to speed things up, making the point that if we cant trust our own local repo we're in trouble already. But i think i'm somewhere in the middle-- definitely know cases where the tradeoff of having a node operate as fast as possible making assumptions is fine...

Then again, I do think that:

ipfs cat <hash>

should never return data that does not hash to <hash>, which an adversary can do by manipulating the local block cache. (hard decision/tradeoff. prob app dependent.)

@BrendanBenshoof
Copy link
Author

Now that I have really dug into what is happening, I agree it is less of an issue than what I have made it out to be.

(please correct me if I have the following wrong)

Bitswap advertises a desire for a hash
other nodes send blocks without identifiers, bitswap saves them without any sanity checking using their hash as their name (computed locally).
the ipfs daemon (not sure which part) notices the block in the filesystem by name and stages and outputs it to the user. The sanity check I references happens when the block is loaded from the filesystem and staged. So unless the content is changed on the hard disk after the block is stored by bitswap, then the block is guaranteed to be correct.

There is no point in trying to detect/defend against changes on hard disk level because this is not part of ipfs's adversary model (which I have not seen spelled out and I might be able to help with that)

If the performance hit is tolerable, make it permanent. (or really, like most performance/security tradeoffs, let the user decide via config file)

@jbenet
Copy link
Member

jbenet commented Apr 28, 2015

bitswap saves them without any sanity checking using their hash as their name

that's incorrect (or should be). all blocks MUST be checked before saving.

So unless the content is changed on the hard disk after the block is stored by bitswap, then the block is guaranteed to be correct.

right, this is the case (though this contradicts the statement above)

There is no point in trying to detect/defend against changes on hard disk level because this is not part of ipfs's adversary model

no-- actually, some nodes will have to do this. but not all. ipfs is not a domain specific protocol, it's a very general thing where different use cases demand sitting at different points of the security x perf tradeoff.

If the performance hit is tolerable, make it permanent. (or really, like most performance/security tradeoffs, let the user decide via config file)

right.

@BrendanBenshoof
Copy link
Author

So here is the code that does the "sanity check" for bitswap. Essentially it takes in the given blocks, hashes them and looks if those hashes are on the wantlist and passes them on. It seems to discard the unwanted blocks (is this desired behavior? do we want people to be able to broadcast arbitrary blocks to proliferate them?)
https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/exchange/bitswap/decision/engine.go#L202

I'm interested in helping with building a real adversary model because it actively overlaps with my research (We can handle 'variable' levels of required security by breaking it down by use cases) It would help in communicating "what benefit to security does IPFS offer to me?" in explicit terms.

Wrapping up: A fix to this issue would include:

@whyrusleeping
Copy link
Member

@BrendanBenshoof throwing away blocks that we dont want is desired behaviour. You should only have blocks on your system if they are explicitly requested.

Also, i don't think that we should validate blocks upon reading from the disk. If an adversary has access to your disk, you have much bigger problems than whether or not that particular block is correct.

@jbenet
Copy link
Member

jbenet commented Apr 29, 2015

I'm interested in helping with building a real adversary model because it actively overlaps with my research (We can handle 'variable' levels of required security by breaking it down by use cases) It would help in communicating "what benefit to security does IPFS offer to me?" in explicit terms.

yeah though the model should be made in relation to where we're headed, not where we are today. (of course need to fix impl - model divergence)

  • adding a configuration variable to track if blocks pulled from disk should be checked.

yep.

@whyrusleeping

Also, i don't think that we should validate blocks upon reading from the disk. If an adversary has access to your disk, you have much bigger problems than whether or not that particular block is correct.

not in all threat models. we want some nodes to be able to validate blocks they pull in from their repos. not all repos are pulled from local disk, and not all local disks are the same. process + disk are very different security arenas, and one may be at much higher risk than the other, particularly with networked filesystems.

@daviddias daviddias added kind/bug A bug in existing code (including security flaws) topic/bitswap Topic bitswap labels Jan 2, 2016
@whyrusleeping
Copy link
Member

Blocks received from the network are always checked. Blocks read from disk are not checked by default, but if you set ipfs config Datastore.HashOnRead true they will be.

@jbenet
Copy link
Member

jbenet commented Aug 23, 2016

@whyrusleeping is that captured in documentation anywhere? it should also be captured in some security notes. Poisoning or corrupting repos could be an attack vector.

@jbenet
Copy link
Member

jbenet commented Aug 23, 2016

  • I'd also err on the side of turning it on by default and letting people turn it off for speed. there could be a list of "knobs to turn for perf" document.

cc @RichardLitt as he handled various documentation things for go-ipfs

@RichardLitt
Copy link
Member

Let's reopen this and assign it to me as a thing to add to the security docs.

@RichardLitt RichardLitt self-assigned this Aug 26, 2016
@RichardLitt RichardLitt reopened this Aug 26, 2016
@Kubuxu
Copy link
Member

Kubuxu commented Aug 26, 2016

I disagree with rehashing on read being by default on, there is a big con for it that in my opinion outweigh possible benefits:
increased read cost: in CPU time, latency and power usage

Possible pros are: disk error detection and attack prevention, but in both of those cases IPFS isn't primary risk factor.

When disk is dying so badly that it reads bad blocks user would notice from other instabilities, he for sure won't read IPFS's console logs because those users notice performance loss and change the disk long time ago. In case of using IPFS as an attack vector: user is screwed up in so many other ways when malware or 3rd person has access to user's file system.

For contrast: git by default doesn't even do object hashing while fetching from the network.

@kevina
Copy link
Contributor

kevina commented Aug 26, 2016

I agree with @Kubuxu on this one.

@BrendanBenshoof
Copy link
Author

Reasoning A:
Like a lot of security measures, this is not actually about protecting the
user, it is about informing them. Jakub is right on all counts, but if IPFS
can detect an issue still should inform the user it thinks something is
wrong.

Reasoning B:
Part of the core idea of IPFS is abstraction of data sources. Why should
what appears to be the File System be more trusted than the network? It
could be a network shared drive, or some other abstraction (like fuse) that
is just as vulnerable as the network but masquerades as the file system.

On Fri, Aug 26, 2016, 12:13 Kevin Atkinson [email protected] wrote:

I agree with @Kubuxu https://github.com/Kubuxu on this one.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1152 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ACrcFdKbjq8Id_DLoP8TAgIsAfr-NGn-ks5qjzrRgaJpZM4EKO5q
.

@jbenet
Copy link
Member

jbenet commented Aug 29, 2016

Well put, @BrendanBenshoof.

I'm on Reasoning B all the way through. If the user wants to decrease security to increase performance, that's their prerogative and we can give them a knob to turn. But we should not put the user in a weaker spot from the beginning.

The file system is definitely an attack vector. And it cannot be trusted as much as main memory, or the cpu registers. As @BrendanBenshoof, the filesystem may be coming from the network as well, or may not be under our full control. This will only get to be the case more as we add support for independent auth agents (so your keys can be stored more securely), or "storing" blocks in other parts of the filesystem (the source files themselves, as @kevina is writing).

We have guides and readmes, and config docs. between:

  • (a) a line saying "turn this on for more security, but less perf"
  • (b) a line saying "turn this off for more perf, but less security"

I much prefer (b).

@RichardLitt
Copy link
Member

I am unlikely to get to this soon. I would suggest unassigning this and labeling at as 'help wanted'. Note: posting this may automatically remove my assignment.

@whyrusleeping
Copy link
Member

Haha, thanks @RichardLitt

@kevina kevina self-assigned this Jul 10, 2018
@kevina
Copy link
Contributor

kevina commented Jul 10, 2018

I need to investigate this. Thus bug is two years old and I think some things might have changed.

@Kubuxu
Copy link
Member

Kubuxu commented Jul 10, 2018

It is still the same, HashOnRead was introduced a while ago and it was mentioned in this discussion.

olizilla added a commit that referenced this issue Jun 4, 2019
This is currently the case. It should be filpped to true, but in the meantime let's document the current situation

See: #1152
@olizilla
Copy link
Member

olizilla commented Jun 4, 2019

HashOnRead is available, but defaults to false. Added it to the docs #6401

We should evaluate the preformance hit, and if it's not crushing, enable HashOnRead by default.

@Stebalien
Copy link
Member

Stebalien commented Jun 4, 2019

Hashing tends to be pretty expensive but we'd have to do some profiling to be sure. Ideally, we'd be at a level of performance where re-hashing would kill performance but I'm pretty sure we aren't there yet.

Note: corrupting a local datastore isn't really a reasonable attack vector. The local disk should be more trusted than the network because we already trust that the go-ipfs binary isn't corrupted, the user's bashrc doesn't start any malware, etc.

On the other hand, corrupting S3, etc., could be an issue. I wonder if we should flag different datastores as trusted/untrusted? I'm worried this adds too much complexity.

@Kubuxu
Copy link
Member

Kubuxu commented Jun 4, 2019

I've evaluated the performance hit a long time ago. It was high enough that enabling it by default wasn't really an option but it is a good idea to reevaluate.

@Stebalien
Copy link
Member

IMO, we should re-evaluate and, if it doesn't have a performance impact, figure out why and fix the performance issue.

@Kubuxu
Copy link
Member

Kubuxu commented Jun 4, 2019

The performance impact was just the hashing itself. Native sha256 on my laptop on a single core maxes out 300MiB/s, so reads will have proportional overhead to that.

If someone reads 100MiB/s then it will be 33% CPU spent on hashing.

@Stebalien Stebalien added topic/blockstore Topic blockstore kind/enhancement A net-new feature or improvement to an existing feature and removed topic/bitswap Topic bitswap kind/bug A bug in existing code (including security flaws) labels Feb 12, 2020
@Jorropo
Copy link
Contributor

Jorropo commented Jul 27, 2023

This is ancient, modern releases of go can hash at ~2GiB/s per core on all modern amd64 cores (thx to new shani accelerated crypto), we also used a third party sha256impl for a while which did this already.
With filesystems that are able to sanity checks drives such as ZFS and BTRFS I don't belive there is lots of value with something like hashOnRead being on by default, having it at the FS level allows for better caching and avoiding to rehash things in the block cache of the OS.
I'll close this in the future unless someone steps up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement A net-new feature or improvement to an existing feature topic/blockstore Topic blockstore
Projects
None yet
Development

No branches or pull requests

10 participants