Blocks are only sanity checked when debug mode is active #1152

BrendanBenshoof · 2015-04-28T01:56:45Z

https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/blocks/blocks.go#L27

This should be checked even if debug is not enabled. Otherwise malicious blocks and merkledag structures could be loaded.

jbenet · 2015-04-28T06:09:41Z

I tend to agree.

we moved this to speed things up, making the point that if we cant trust our own local repo we're in trouble already. But i think i'm somewhere in the middle-- definitely know cases where the tradeoff of having a node operate as fast as possible making assumptions is fine...

Then again, I do think that:

ipfs cat <hash>

should never return data that does not hash to <hash>, which an adversary can do by manipulating the local block cache. (hard decision/tradeoff. prob app dependent.)

BrendanBenshoof · 2015-04-28T16:57:54Z

Now that I have really dug into what is happening, I agree it is less of an issue than what I have made it out to be.

(please correct me if I have the following wrong)

Bitswap advertises a desire for a hash
other nodes send blocks without identifiers, bitswap saves them without any sanity checking using their hash as their name (computed locally).
the ipfs daemon (not sure which part) notices the block in the filesystem by name and stages and outputs it to the user. The sanity check I references happens when the block is loaded from the filesystem and staged. So unless the content is changed on the hard disk after the block is stored by bitswap, then the block is guaranteed to be correct.

There is no point in trying to detect/defend against changes on hard disk level because this is not part of ipfs's adversary model (which I have not seen spelled out and I might be able to help with that)

If the performance hit is tolerable, make it permanent. (or really, like most performance/security tradeoffs, let the user decide via config file)

jbenet · 2015-04-28T23:48:14Z

bitswap saves them without any sanity checking using their hash as their name

that's incorrect (or should be). all blocks MUST be checked before saving.

So unless the content is changed on the hard disk after the block is stored by bitswap, then the block is guaranteed to be correct.

right, this is the case (though this contradicts the statement above)

There is no point in trying to detect/defend against changes on hard disk level because this is not part of ipfs's adversary model

no-- actually, some nodes will have to do this. but not all. ipfs is not a domain specific protocol, it's a very general thing where different use cases demand sitting at different points of the security x perf tradeoff.

If the performance hit is tolerable, make it permanent. (or really, like most performance/security tradeoffs, let the user decide via config file)

right.

BrendanBenshoof · 2015-04-29T04:45:21Z

So here is the code that does the "sanity check" for bitswap. Essentially it takes in the given blocks, hashes them and looks if those hashes are on the wantlist and passes them on. It seems to discard the unwanted blocks (is this desired behavior? do we want people to be able to broadcast arbitrary blocks to proliferate them?)
https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/exchange/bitswap/decision/engine.go#L202

I'm interested in helping with building a real adversary model because it actively overlaps with my research (We can handle 'variable' levels of required security by breaking it down by use cases) It would help in communicating "what benefit to security does IPFS offer to me?" in explicit terms.

Wrapping up: A fix to this issue would include:

adding a configuration variable to track if blocks pulled from disk should be checked.
Assigning a value to this variable using the IPFS configuration file.
Check this variable in addition to the debug variable in the if statement at line: https://github.com/ipfs/go-ipfs/blob/79360bbd32d8a0b9c5ab633b5a0461d9acd0f477/blocks/blocks.go#L27

whyrusleeping · 2015-04-29T05:06:57Z

@BrendanBenshoof throwing away blocks that we dont want is desired behaviour. You should only have blocks on your system if they are explicitly requested.

Also, i don't think that we should validate blocks upon reading from the disk. If an adversary has access to your disk, you have much bigger problems than whether or not that particular block is correct.

jbenet · 2015-04-29T07:38:19Z

I'm interested in helping with building a real adversary model because it actively overlaps with my research (We can handle 'variable' levels of required security by breaking it down by use cases) It would help in communicating "what benefit to security does IPFS offer to me?" in explicit terms.

yeah though the model should be made in relation to where we're headed, not where we are today. (of course need to fix impl - model divergence)

adding a configuration variable to track if blocks pulled from disk should be checked.

yep.

@whyrusleeping

Also, i don't think that we should validate blocks upon reading from the disk. If an adversary has access to your disk, you have much bigger problems than whether or not that particular block is correct.

not in all threat models. we want some nodes to be able to validate blocks they pull in from their repos. not all repos are pulled from local disk, and not all local disks are the same. process + disk are very different security arenas, and one may be at much higher risk than the other, particularly with networked filesystems.

whyrusleeping · 2016-08-23T21:45:23Z

Blocks received from the network are always checked. Blocks read from disk are not checked by default, but if you set ipfs config Datastore.HashOnRead true they will be.

jbenet · 2016-08-23T22:23:08Z

@whyrusleeping is that captured in documentation anywhere? it should also be captured in some security notes. Poisoning or corrupting repos could be an attack vector.

jbenet · 2016-08-23T22:24:01Z

I'd also err on the side of turning it on by default and letting people turn it off for speed. there could be a list of "knobs to turn for perf" document.

cc @RichardLitt as he handled various documentation things for go-ipfs

RichardLitt · 2016-08-26T14:54:10Z

Let's reopen this and assign it to me as a thing to add to the security docs.

Kubuxu · 2016-08-26T16:51:04Z

I disagree with rehashing on read being by default on, there is a big con for it that in my opinion outweigh possible benefits:
increased read cost: in CPU time, latency and power usage

Possible pros are: disk error detection and attack prevention, but in both of those cases IPFS isn't primary risk factor.

When disk is dying so badly that it reads bad blocks user would notice from other instabilities, he for sure won't read IPFS's console logs because those users notice performance loss and change the disk long time ago. In case of using IPFS as an attack vector: user is screwed up in so many other ways when malware or 3rd person has access to user's file system.

For contrast: git by default doesn't even do object hashing while fetching from the network.

kevina · 2016-08-26T19:13:15Z

I agree with @Kubuxu on this one.

BrendanBenshoof · 2016-08-26T19:58:37Z

Reasoning A:
Like a lot of security measures, this is not actually about protecting the
user, it is about informing them. Jakub is right on all counts, but if IPFS
can detect an issue still should inform the user it thinks something is
wrong.

Reasoning B:
Part of the core idea of IPFS is abstraction of data sources. Why should
what appears to be the File System be more trusted than the network? It
could be a network shared drive, or some other abstraction (like fuse) that
is just as vulnerable as the network but masquerades as the file system.

On Fri, Aug 26, 2016, 12:13 Kevin Atkinson [email protected] wrote:

I agree with @Kubuxu https://github.com/Kubuxu on this one.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1152 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ACrcFdKbjq8Id_DLoP8TAgIsAfr-NGn-ks5qjzrRgaJpZM4EKO5q
.

jbenet · 2016-08-29T07:10:59Z

Well put, @BrendanBenshoof.

I'm on Reasoning B all the way through. If the user wants to decrease security to increase performance, that's their prerogative and we can give them a knob to turn. But we should not put the user in a weaker spot from the beginning.

The file system is definitely an attack vector. And it cannot be trusted as much as main memory, or the cpu registers. As @BrendanBenshoof, the filesystem may be coming from the network as well, or may not be under our full control. This will only get to be the case more as we add support for independent auth agents (so your keys can be stored more securely), or "storing" blocks in other parts of the filesystem (the source files themselves, as @kevina is writing).

We have guides and readmes, and config docs. between:

(a) a line saying "turn this on for more security, but less perf"
(b) a line saying "turn this off for more perf, but less security"

I much prefer (b).

RichardLitt · 2018-07-09T20:45:11Z

I am unlikely to get to this soon. I would suggest unassigning this and labeling at as 'help wanted'. Note: posting this may automatically remove my assignment.

whyrusleeping · 2018-07-10T08:24:54Z

Haha, thanks @RichardLitt

kevina · 2018-07-10T08:42:52Z

I need to investigate this. Thus bug is two years old and I think some things might have changed.

Kubuxu · 2018-07-10T08:46:05Z

It is still the same, HashOnRead was introduced a while ago and it was mentioned in this discussion.

This is currently the case. It should be filpped to true, but in the meantime let's document the current situation See: #1152

olizilla · 2019-06-04T15:19:43Z

HashOnRead is available, but defaults to false. Added it to the docs #6401

We should evaluate the preformance hit, and if it's not crushing, enable HashOnRead by default.

Stebalien · 2019-06-04T16:22:56Z

Hashing tends to be pretty expensive but we'd have to do some profiling to be sure. Ideally, we'd be at a level of performance where re-hashing would kill performance but I'm pretty sure we aren't there yet.

Note: corrupting a local datastore isn't really a reasonable attack vector. The local disk should be more trusted than the network because we already trust that the go-ipfs binary isn't corrupted, the user's bashrc doesn't start any malware, etc.

On the other hand, corrupting S3, etc., could be an issue. I wonder if we should flag different datastores as trusted/untrusted? I'm worried this adds too much complexity.

Kubuxu · 2019-06-04T16:24:29Z

I've evaluated the performance hit a long time ago. It was high enough that enabling it by default wasn't really an option but it is a good idea to reevaluate.

Stebalien · 2019-06-04T16:27:53Z

IMO, we should re-evaluate and, if it doesn't have a performance impact, figure out why and fix the performance issue.

Kubuxu · 2019-06-04T16:32:31Z

The performance impact was just the hashing itself. Native sha256 on my laptop on a single core maxes out 300MiB/s, so reads will have proportional overhead to that.

If someone reads 100MiB/s then it will be 33% CPU spent on hashing.

Jorropo · 2023-07-27T19:27:26Z

This is ancient, modern releases of go can hash at ~2GiB/s per core on all modern amd64 cores (thx to new shani accelerated crypto), we also used a third party sha256impl for a while which did this already.
With filesystems that are able to sanity checks drives such as ZFS and BTRFS I don't belive there is lots of value with something like hashOnRead being on by default, having it at the FS level allows for better caching and avoiding to rehash things in the block cache of the OS.
I'll close this in the future unless someone steps up.

daviddias added kind/bug A bug in existing code (including security flaws) topic/bitswap Topic bitswap labels Jan 2, 2016

whyrusleeping closed this as completed Aug 23, 2016

RichardLitt self-assigned this Aug 26, 2016

RichardLitt reopened this Aug 26, 2016

kevina mentioned this issue Oct 1, 2016

New setting: API/Repo Trust level #3278

Open

RichardLitt unassigned RichardLitt Jul 9, 2018

kevina self-assigned this Jul 10, 2018

olizilla added a commit that referenced this issue Jun 4, 2019

docs: default value for HashOnRead is false

c30aaa9

This is currently the case. It should be filpped to true, but in the meantime let's document the current situation See: #1152

olizilla mentioned this issue Jun 4, 2019

docs: default value for HashOnRead is false #6401

Merged

Stebalien unassigned kevina Feb 12, 2020

Stebalien added topic/blockstore Topic blockstore kind/enhancement A net-new feature or improvement to an existing feature and removed topic/bitswap Topic bitswap kind/bug A bug in existing code (including security flaws) labels Feb 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blocks are only sanity checked when debug mode is active #1152

Blocks are only sanity checked when debug mode is active #1152

BrendanBenshoof commented Apr 28, 2015

jbenet commented Apr 28, 2015

BrendanBenshoof commented Apr 28, 2015

jbenet commented Apr 28, 2015

BrendanBenshoof commented Apr 29, 2015

whyrusleeping commented Apr 29, 2015

jbenet commented Apr 29, 2015

whyrusleeping commented Aug 23, 2016

jbenet commented Aug 23, 2016

jbenet commented Aug 23, 2016

RichardLitt commented Aug 26, 2016

Kubuxu commented Aug 26, 2016 •

edited

Loading

kevina commented Aug 26, 2016

BrendanBenshoof commented Aug 26, 2016

jbenet commented Aug 29, 2016 •

edited

Loading

RichardLitt commented Jul 9, 2018

whyrusleeping commented Jul 10, 2018

kevina commented Jul 10, 2018

Kubuxu commented Jul 10, 2018

olizilla commented Jun 4, 2019

Stebalien commented Jun 4, 2019 •

edited

Loading

Kubuxu commented Jun 4, 2019

Stebalien commented Jun 4, 2019

Kubuxu commented Jun 4, 2019

Jorropo commented Jul 27, 2023

Blocks are only sanity checked when debug mode is active #1152

Blocks are only sanity checked when debug mode is active #1152

Comments

BrendanBenshoof commented Apr 28, 2015

jbenet commented Apr 28, 2015

BrendanBenshoof commented Apr 28, 2015

jbenet commented Apr 28, 2015

BrendanBenshoof commented Apr 29, 2015

whyrusleeping commented Apr 29, 2015

jbenet commented Apr 29, 2015

whyrusleeping commented Aug 23, 2016

jbenet commented Aug 23, 2016

jbenet commented Aug 23, 2016

RichardLitt commented Aug 26, 2016

Kubuxu commented Aug 26, 2016 • edited Loading

kevina commented Aug 26, 2016

BrendanBenshoof commented Aug 26, 2016

jbenet commented Aug 29, 2016 • edited Loading

RichardLitt commented Jul 9, 2018

whyrusleeping commented Jul 10, 2018

kevina commented Jul 10, 2018

Kubuxu commented Jul 10, 2018

olizilla commented Jun 4, 2019

Stebalien commented Jun 4, 2019 • edited Loading

Kubuxu commented Jun 4, 2019

Stebalien commented Jun 4, 2019

Kubuxu commented Jun 4, 2019

Jorropo commented Jul 27, 2023

Kubuxu commented Aug 26, 2016 •

edited

Loading

jbenet commented Aug 29, 2016 •

edited

Loading

Stebalien commented Jun 4, 2019 •

edited

Loading