-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to output storage layout info for a contract #4017
Conversation
StorageInfo generates a JSON array with the information of all state variables in the contract, given its compiler instance. The format of each variable object in the output is: { "name": "myvar", "contract": "Test", "offset": "0", "slot": "0", "type": "uint256", "size": "1", "bytes": "32" } Requires the contract to have been compiled in order to collect this information.
- Add storage-layout to combined-json - Add storage option to json interface
I'm getting an error in CircleCI seemingly unrelated to the changes I introduced. Any ideas on how the changes made could relate to this failure?
|
Thanks for the pull request! I know this has been an often requested feature, but I have to ask the question I ask everyone talking about this: How does this extend to more complex data types? Also concerning the use-case in zeppelin os: Wouldn't it be safer to enforce all contracts to inherit from a certain contract that only consists of the required state variables? |
Thanks for the reply @chriseth!
Structs are listed by their full name Mappings are listed as using a single slot, but using their canonical name: I'll be happy to add tests for more complex data types, just let me know what you have in mind!
Yep, that's the approach we suggest, but if the user wants to start from scratch or modify an existing contract, we want to be able to validate that the result is compatible. Another potential issue: what happens if, as part of the upgrade, the user adds another contract to the inheritance chain (for instance, adds a Pausable behaviour), that would inadvertently add another variable in the middle of the storage layout. Last but not least, we don't want to rule out a future compiler optimization that rearranges the storage to be able to better package smaller-than-256 variables (such as the scenario described here). Should that happen, the same Solidity file could yield different storage layouts depending on the optimizations used during compilation. In other words, we want to have every check in place to catch any potential issues that may arise from storage layout incompatibilities. |
Actually, I've just noticed that structs actually hide how their inner members are laid out in the storage. I'll update this PR so that info is included (most likely as a nested |
The problem with more complex data structures is that you could have mappings taking strings to structs and those structs contains arrays with mappings and so on. |
If this is accepted, I think it should include the storage layout in the metadata too. |
@@ -520,6 +520,8 @@ Json::Value StandardCompiler::compileInternal(Json::Value const& _input) | |||
contractData["userdoc"] = m_compilerStack.natspecUser(contractName); | |||
if (isArtifactRequested(outputSelection, file, name, "devdoc")) | |||
contractData["devdoc"] = m_compilerStack.natspecDev(contractName); | |||
if (isArtifactRequested(outputSelection, file, name, "storage")) | |||
contractData["storage"] = m_compilerStack.storageInfo(contractName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this and the key should be named storageLayoutMap
os something similar.
@@ -932,6 +934,8 @@ void CommandLineInterface::handleCombinedJSON() | |||
contractData[g_strNatspecDev] = dev::jsonCompactPrint(m_compiler->natspecDev(contractName)); | |||
if (requests.count(g_strNatspecUser)) | |||
contractData[g_strNatspecUser] = dev::jsonCompactPrint(m_compiler->natspecUser(contractName)); | |||
if (requests.count(g_strStorageInfo)) | |||
contractData[g_strStorageInfo] = dev::jsonCompactPrint(m_compiler->storageInfo(contractName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd hope not to support it on the commandline, rather expect people to use it through standard json IO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I remove it from the command line? It only affects the combined-json.
variable["offset"] = to_string(location.second); | ||
variable["type"] = decl->type()->canonicalName(); | ||
variable["size"] = decl->type()->storageSize().str(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should contain the storage key as well for ease of use.
Btw, cannot the same be achieved through processing the AST JSON? |
I kind of see the point here that you are afraid of unexpected changes in the storage layout, but I'm not sure if this is the right means to protect against them. |
Only if we assume that the compiler will preserve in storage the order in which the variables were declared. We'd also need to replicate the inheritance linearization algorithm, since this would determine the order in which each contract is visited for setting up the storage. Overall, I'd say that storage layout depends on codegen, so the AST would not be the best place to retrieve this info. |
I think that guarding against such changes should be responsibility of an external tool (which would handle upgradeability overall), but such tool would need input from the compiler. That said, I'm open to other ideas on how to retrieve this information; nevertheless, this approach is quite non-invasive, as it does not require changes on the compiler itself, but merely exposes information already available. |
In general, I think this is a good change. The main negative point I have about this PR is that it adds "interface burden", we need to document it and be careful about breaking changes in this new interface. The other downside is that it does not go all the way, it does not specify how mappings are encoded, arrays etc., and this is why I think that it should not be part of a static output of the compiler. |
Thanks for the reply, @chriseth
Agree, but this is also the main reason behind the PR: catching a breaking change that could otherwise go unnoticed and have dire consequences.
If you are referring to how storage positions are determined (ie hash of slot and key), then I agree that the info is missing, and I'm not sure how it would be feasible to expose it. On the other hand, if you are referring to outputting information on the storage layout of the values handled by arrays/mappings (such as having a |
Hm, actually preventing breaking changes in this area should rather be done by adding a test against that. |
Created #4049 |
I've updated the implementation to correctly handle complex types (structs, mappings, and arrays) by recursively visiting their members. As an example, for a mapping: contract test {
mapping(uint => uint128) m;
} [
{
"name": "m",
"contract": "test",
"offset": "0",
"slot": "0",
"type": {
"name": "mapping(uint256 => uint128)",
"size": "1",
"bytes": "32",
"key": {
"name": "uint256",
"size": "1",
"bytes": "32"
},
"value": {
"name": "uint128",
"size": "1",
"bytes": "16"
}
}
}
] Note that all type-related info is now moved into a nested contract test {
uint a;
} [
{
"name": "a",
"contract": "test",
"offset": "0",
"slot": "0",
"type": {
"name": "uint256",
"size": "1",
"bytes": "32"
}
}
] |
So to recap the situation here. This feature request has apparently come up a few times already (#1236, #3736), because when one tries to work with Solidity at a lower level (e.g. building static analysis tools) there is often the need to know the position in storage of state variables. It's possible to do this using the AST by C3-linearizing the inheritance graph, but some of us think this is not robust because there is a significant possibility of error and incompatibility with future changes in the compiler. The compiler already has this knowledge, so we are proposing to expose that. I see two arguments against merging this.
@chriseth @axic What do you think about this? We would love to help get this done and maintain the feature going forward. 🙂 |
I'm fine with going forward with this, but we need a more detailed discussion about the format. Furthermore, tests are required that detect if there is a mismatch between the reported layout and the actual layout of the contract. Not sure how to do that, and perhaps we can also drop it if it is not possible, as the information is extracted directly from the relevant structures. What do you think, @axic ? |
@chriseth happy to discuss the JSON format anytime! I'm in the solidity gitter, feel free to ping me there and we can go through it. As for the tests, I think we could build something where we compare Solidity-based setters with the output of |
@@ -141,6 +141,8 @@ class CompilerContext | |||
unsigned currentToBaseStackOffset(unsigned _offset) const; | |||
/// @returns pair of slot and byte offset of the value inside this slot. | |||
std::pair<u256, unsigned> storageLocationOfVariable(Declaration const& _declaration) const; | |||
/// @returns all state variables along with their storage slot and byte offset of the value inside the slot. | |||
std::map<Declaration const*, std::pair<u256, unsigned>> const& stateVariables() const { return m_stateVariables; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to go through ContractType instead of the compiler context.
|
||
// Common type info | ||
data["name"] = type->canonicalName(); | ||
data["size"] = type->storageSize().str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps better to call this storageSlots
or numberOfSlots
?
data["base"] = processType(array->baseType()); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs assertion that there is no other category.
} | ||
else if (type->category() == Type::Category::Mapping) { | ||
auto map = static_pointer_cast<const MappingType>(type); | ||
data["key"] = processType(map->keyType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key type should be processed differently
As discussed on the call, it would make sense to:
Please also create a comprehensive documentation chapter - see the Standard JSON I/O chapter. |
data["value"] = processType(map->valueType()); | ||
} | ||
else if (type->category() == Type::Category::Array) { | ||
auto array = static_pointer_cast<const ArrayType>(type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big difference between statically-sized and dynamically-sized arrays
Also discussed:
Regarding the file format, it was discussed to generate a flat structure, where {
"hash_version": "1",
"storage": [{
"label": "m",
"ast_id": 42,
"contract": "test",
"offset": "0",
"slot": "0",
"type": "mapping(uint256 => uint128)_identifier"
}],
"types": {
"mapping(uint256 => uint128)_identifier": {
"label": "mapping(uint256 => uint128)",
"kind": "mapping",
"size_in_slots": "1",
"bytes": "32",
"value_type": "uint128_identifier",
},
"uint128_identifier": {
"label": "uint128",
"kind": "scalar",
"size_in_slots": "1",
"bytes": "16"
}
}
} Thanks for all the input, folks!! |
@spalladino could you please inform us about the state of this PR? |
@chriseth currently on hold unfortunately, due to lack of availability on our end. We have implemented a workaround by inferring the storage layout from the AST in the meantime (which is not as reliable, but catches most cases). I plan on getting back at it a few weeks after devcon. That said, I'm ok if you want to close the PR and I open a new one once I have the new implementation we had agreed upon. |
We can keep it open for now. |
I'd like to add that this feature is very important tooling for formal techniques that want to reason about EVM bytecode generated from Solidity. Most people want to read/write specifications that talk about solidity storage variables, instead of the storage locations themselves, meaning that there needs to be a robust process of translating from one to the other. Right now we have no choice but to manually map out the storage layout of the contract by reading the bytecode, which is tedious, error-prone, and can be broken by updates to |
This functionality would be super useful in writing efficient Ethereum data cache solutions! A clear map of storage locations would make it way simpler for us to update cache on new blocks, and possibly make it automatic given a contract ABI 👌 |
Closing this for now. Please reopen (or create a new pull request) once you start again working on this. |
@spalladino any chance this might be continued? 🙏 |
@MrChico would love to, but haven't had the time to work on it for a long time already. I know @nanexcool was also interested, maybe someone from his team can tackle it...? |
This PR #7589 adds a basic version of this feature. It outputs the first layer of the storage layout, that is, no nested information. |
See also the discussion on #597. |
This PR adds new options to the solc command line interface and standard json interface to output information on the storage layout of a compiled contract in JSON format.
I'd really appreciate any feedback!
Fixes #3736.
Use case
In zeppelin_os we are working on upgradeability patterns using proxies. To make a long story short, all contract instances are actually proxies that hold the state and
DELEGATECALL
into a single contract acting as a backing implementation. To upgrade the proxy, we simply change the address of the backing implementation to a different one.However, for two implementations to be "compatible", they must share the same storage layout (ie the state variables must have been assigned the same slots in both implementations). Otherwise, when the new implementation attempts to read a storage variable set from the previous one, it may end up reading an incorrect value.
Even though it is possible to infer the position of each variable by manually linearizing the inheritance chain, and assuming that the compiler will honor the ordering in which the state vars were declared, this approach is brittle and could break, should the compiler start optimizing storage layouts (for instance, in order to pack small-sized variables in a single slot). Having this information provided directly by the compiler is a much safer approach.
Implementation
The information is already collected in
CompilerContext#m_stateVariables
. This PR just adds a new classStorageInfo
that reads that information and yields a JSON array with all the relevant information:Note that this involves exposing the internal
m_stateVariables
field ofCompilerContext
, of typestd::map<Declaration const*, std::pair<u256, unsigned>>
, which could be considered as breaking encapsulation. If so, this could be abstracted by exposing an iterator that yields a struct containing theVariableDeclaration
,position
, andoffset
, that provides the same information as the map, without revealing the underlying implementation. Not sure if it would be overdesigning, so I left it as simple as possible for now.Interface
Adds a new
storage-layout
argument to the combined-json option of the command-line interface:The storage-layout info is packed as a string in the combined json, same as the ABI, natspec, or other similar attributes.
As for the JSON interface, adds a
storage
outputSelection setting: