Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to output storage layout info for a contract #4017

Closed
wants to merge 9 commits into from

Conversation

spalladino
Copy link

@spalladino spalladino commented Apr 29, 2018

This PR adds new options to the solc command line interface and standard json interface to output information on the storage layout of a compiled contract in JSON format.

I'd really appreciate any feedback!

Fixes #3736.

Use case

In zeppelin_os we are working on upgradeability patterns using proxies. To make a long story short, all contract instances are actually proxies that hold the state and DELEGATECALL into a single contract acting as a backing implementation. To upgrade the proxy, we simply change the address of the backing implementation to a different one.

However, for two implementations to be "compatible", they must share the same storage layout (ie the state variables must have been assigned the same slots in both implementations). Otherwise, when the new implementation attempts to read a storage variable set from the previous one, it may end up reading an incorrect value.

Even though it is possible to infer the position of each variable by manually linearizing the inheritance chain, and assuming that the compiler will honor the ordering in which the state vars were declared, this approach is brittle and could break, should the compiler start optimizing storage layouts (for instance, in order to pack small-sized variables in a single slot). Having this information provided directly by the compiler is a much safer approach.

Implementation

The information is already collected in CompilerContext#m_stateVariables. This PR just adds a new class StorageInfo that reads that information and yields a JSON array with all the relevant information:

[{
  "name": "myvar",
  "contract": "Test",
  "offset": "0",
  "slot": "0",
  "type": "uint256",
  "size": "1",
  "bytes": "32"
}]

Note that this involves exposing the internal m_stateVariables field of CompilerContext, of type std::map<Declaration const*, std::pair<u256, unsigned>>, which could be considered as breaking encapsulation. If so, this could be abstracted by exposing an iterator that yields a struct containing the VariableDeclaration, position, and offset, that provides the same information as the map, without revealing the underlying implementation. Not sure if it would be overdesigning, so I left it as simple as possible for now.

Interface

Adds a new storage-layout argument to the combined-json option of the command-line interface:

$ solc ~/Projects/zeppelin-solidity/contracts/DayLimit.sol --combined-json storage-layout  | jq .
{
  "contracts": {
    "/home/spalladino/Projects/zeppelin-solidity/contracts/DayLimit.sol:DayLimit": {
      "storage-layout": "[{\"bytes\":\"32\",\"contract\":\"DayLimit\",\"name\":\"dailyLimit\",\"offset\":\"0\",\"size\":\"1\",\"slot\":\"0\",\"type\":\"uint256\"},{\"bytes\":\"32\",\"contract\":\"DayLimit\",\"name\":\"spentToday\",\"offset\":\"0\",\"size\":\"1\",\"slot\":\"1\",\"type\":\"uint256\"},{\"bytes\":\"32\",\"contract\":\"DayLimit\",\"name\":\"lastDay\",\"offset\":\"0\",\"size\":\"1\",\"slot\":\"2\",\"type\":\"uint256\"}]"
    }
  },
  "version": "0.4.24-develop.2018.4.29+commit.cf59d217.mod.Linux.g++"
}

The storage-layout info is packed as a string in the combined json, same as the ABI, natspec, or other similar attributes.

As for the JSON interface, adds a storage outputSelection setting:

$ echo '{"language":"Solidity","sources":{"DayLimit.sol":{"urls":["/home/spalladino/Projects/zeppelin-solidity/contracts/DayLimit.sol"]}},"settings":{"outputSelection":{"*":{"*":["storage"]}}}}' | ./build/solc/solc --standard-json --allow-paths="/home/spalladino/Projects/zeppelin-solidity/contracts" | jq .
{
  "contracts": {
    "DayLimit.sol": {
      "DayLimit": {
        "evm": {},
        "storage": [
          {
            "bytes": "32",
            "contract": "DayLimit",
            "name": "dailyLimit",
            "offset": "0",
            "size": "1",
            "slot": "0",
            "type": "uint256"
          },
          {
            "bytes": "32",
            "contract": "DayLimit",
            "name": "spentToday",
            "offset": "0",
            "size": "1",
            "slot": "1",
            "type": "uint256"
          },
          {
            "bytes": "32",
            "contract": "DayLimit",
            "name": "lastDay",
            "offset": "0",
            "size": "1",
            "slot": "2",
            "type": "uint256"
          }
        ]
      }
    }
  },
  "errors": [ ... ],
  "sources": {
    "DayLimit.sol": {
      "id": 0
    }
  }
}

StorageInfo generates a JSON array with the information of all
state variables in the contract, given its compiler instance.

The format of each variable object in the output is:

{
  "name": "myvar",
  "contract": "Test",
  "offset": "0",
  "slot": "0",
  "type": "uint256",
  "size": "1",
  "bytes": "32"
}

Requires the contract to have been compiled in order to collect
this information.
- Add storage-layout to combined-json
- Add storage option to json interface
@spalladino
Copy link
Author

I'm getting an error in CircleCI seemingly unrelated to the changes I introduced. Any ideas on how the changes made could relate to this failure?

return_string - SolidityEndToEndTest
failure
ASSERTION FAILURE:
- file   : RPCSession.cpp
- line   : 162
- message: critical check !result.isNull() has failed

@chriseth
Copy link
Contributor

Thanks for the pull request! I know this has been an often requested feature, but I have to ask the question I ask everyone talking about this: How does this extend to more complex data types?

Also concerning the use-case in zeppelin os: Wouldn't it be safer to enforce all contracts to inherit from a certain contract that only consists of the required state variables?

@spalladino
Copy link
Author

Thanks for the reply @chriseth!

How does this extend to more complex data types?

Structs are listed by their full name contractName.structName:
https://github.com/spalladino/solidity/blob/7dc45d82d4b0c746145168da020df37a935ab066/test/libsolidity/SolidityStorageInfoJson.cpp#L201-L218

Mappings are listed as using a single slot, but using their canonical name:
https://github.com/spalladino/solidity/blob/7dc45d82d4b0c746145168da020df37a935ab066/test/libsolidity/SolidityStorageInfoJson.cpp#L186-L199

I'll be happy to add tests for more complex data types, just let me know what you have in mind!

Wouldn't it be safer to enforce all contracts to inherit from a certain contract that only consists of the required state variables?

Yep, that's the approach we suggest, but if the user wants to start from scratch or modify an existing contract, we want to be able to validate that the result is compatible. Another potential issue: what happens if, as part of the upgrade, the user adds another contract to the inheritance chain (for instance, adds a Pausable behaviour), that would inadvertently add another variable in the middle of the storage layout.

Last but not least, we don't want to rule out a future compiler optimization that rearranges the storage to be able to better package smaller-than-256 variables (such as the scenario described here). Should that happen, the same Solidity file could yield different storage layouts depending on the optimizations used during compilation.

In other words, we want to have every check in place to catch any potential issues that may arise from storage layout incompatibilities.

@spalladino
Copy link
Author

How does this extend to more complex data types?

Actually, I've just noticed that structs actually hide how their inner members are laid out in the storage. I'll update this PR so that info is included (most likely as a nested storage field with that information).

@chriseth
Copy link
Contributor

The problem with more complex data structures is that you could have mappings taking strings to structs and those structs contains arrays with mappings and so on.

@axic
Copy link
Member

axic commented Apr 30, 2018

If this is accepted, I think it should include the storage layout in the metadata too.

@@ -520,6 +520,8 @@ Json::Value StandardCompiler::compileInternal(Json::Value const& _input)
contractData["userdoc"] = m_compilerStack.natspecUser(contractName);
if (isArtifactRequested(outputSelection, file, name, "devdoc"))
contractData["devdoc"] = m_compilerStack.natspecDev(contractName);
if (isArtifactRequested(outputSelection, file, name, "storage"))
contractData["storage"] = m_compilerStack.storageInfo(contractName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and the key should be named storageLayoutMap os something similar.

@@ -932,6 +934,8 @@ void CommandLineInterface::handleCombinedJSON()
contractData[g_strNatspecDev] = dev::jsonCompactPrint(m_compiler->natspecDev(contractName));
if (requests.count(g_strNatspecUser))
contractData[g_strNatspecUser] = dev::jsonCompactPrint(m_compiler->natspecUser(contractName));
if (requests.count(g_strStorageInfo))
contractData[g_strStorageInfo] = dev::jsonCompactPrint(m_compiler->storageInfo(contractName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd hope not to support it on the commandline, rather expect people to use it through standard json IO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove it from the command line? It only affects the combined-json.

variable["offset"] = to_string(location.second);
variable["type"] = decl->type()->canonicalName();
variable["size"] = decl->type()->storageSize().str();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should contain the storage key as well for ease of use.

@axic
Copy link
Member

axic commented Apr 30, 2018

Btw, cannot the same be achieved through processing the AST JSON?

@chriseth
Copy link
Contributor

I kind of see the point here that you are afraid of unexpected changes in the storage layout, but I'm not sure if this is the right means to protect against them.

@spalladino
Copy link
Author

Btw, cannot the same be achieved through processing the AST JSON?

Only if we assume that the compiler will preserve in storage the order in which the variables were declared. We'd also need to replicate the inheritance linearization algorithm, since this would determine the order in which each contract is visited for setting up the storage. Overall, I'd say that storage layout depends on codegen, so the AST would not be the best place to retrieve this info.

@spalladino
Copy link
Author

spalladino commented Apr 30, 2018

I kind of see the point here that you are afraid of unexpected changes in the storage layout, but I'm not sure if this is the right means to protect against them.

I think that guarding against such changes should be responsibility of an external tool (which would handle upgradeability overall), but such tool would need input from the compiler. That said, I'm open to other ideas on how to retrieve this information; nevertheless, this approach is quite non-invasive, as it does not require changes on the compiler itself, but merely exposes information already available.

@chriseth
Copy link
Contributor

chriseth commented May 2, 2018

In general, I think this is a good change. The main negative point I have about this PR is that it adds "interface burden", we need to document it and be careful about breaking changes in this new interface. The other downside is that it does not go all the way, it does not specify how mappings are encoded, arrays etc., and this is why I think that it should not be part of a static output of the compiler.

@spalladino
Copy link
Author

Thanks for the reply, @chriseth

The main negative point I have about this PR is that it adds "interface burden", we need to document it and be careful about breaking changes in this new interface.

Agree, but this is also the main reason behind the PR: catching a breaking change that could otherwise go unnoticed and have dire consequences.

The other downside is that it does not go all the way, it does not specify how mappings are encoded, arrays etc., and this is why I think that it should not be part of a static output of the compiler.

If you are referring to how storage positions are determined (ie hash of slot and key), then I agree that the info is missing, and I'm not sure how it would be feasible to expose it. On the other hand, if you are referring to outputting information on the storage layout of the values handled by arrays/mappings (such as having a mapping(string => MyStruct), I have a few commits still in progress that add that information (I pushed last night the ones that added the info for structs).

@chriseth
Copy link
Contributor

chriseth commented May 2, 2018

Hm, actually preventing breaking changes in this area should rather be done by adding a test against that.

@chriseth
Copy link
Contributor

chriseth commented May 2, 2018

Created #4049

@spalladino
Copy link
Author

I've updated the implementation to correctly handle complex types (structs, mappings, and arrays) by recursively visiting their members. As an example, for a mapping:

contract test {
  mapping(uint => uint128) m;
}
[
  {
    "name": "m",
    "contract": "test",
    "offset": "0",
    "slot": "0",
    "type": {
      "name": "mapping(uint256 => uint128)",
      "size": "1",
      "bytes": "32",
      "key": {
        "name": "uint256",
        "size": "1",
        "bytes": "32"
      },
      "value": {
        "name": "uint128",
        "size": "1",
        "bytes": "16"
      }
    }
  }
]

Note that all type-related info is now moved into a nested type object. Simple fields were also changed, for consistency:

contract test {
  uint a;
}
[
  {
    "name": "a",
    "contract": "test",
    "offset": "0",
    "slot": "0",
    "type": {
      "name": "uint256",
      "size": "1",
      "bytes": "32"
    }
  }
]

@axic
Copy link
Member

axic commented May 25, 2018

Issue #1236 and #3736 proposes/discusses this too.

@frangio
Copy link
Contributor

frangio commented Jun 29, 2018

So to recap the situation here.

This feature request has apparently come up a few times already (#1236, #3736), because when one tries to work with Solidity at a lower level (e.g. building static analysis tools) there is often the need to know the position in storage of state variables. It's possible to do this using the AST by C3-linearizing the inheritance graph, but some of us think this is not robust because there is a significant possibility of error and incompatibility with future changes in the compiler.

The compiler already has this knowledge, so we are proposing to expose that.

I see two arguments against merging this.

  1. "Interface burden". If we can agree that this feature would contribute to a robust tooling ecosystem, I think we can agree that the trade-off is worth it. This PR is missing the documentation for the compiler option, though, but we can work on that. 🙂

  2. That the information exposed in this PR doesn't reflect the storage of mappings and arrays. This is true. To fix it, I would suggest adding a version number to the storage info output by the compiler. The version would specify what algorithm is used to resolve members of mappings and arrays (given the slot of the variable and the key used), and each algorithm can be explained in the Solidity docs.

@chriseth @axic What do you think about this? We would love to help get this done and maintain the feature going forward. 🙂

@chriseth
Copy link
Contributor

I'm fine with going forward with this, but we need a more detailed discussion about the format. Furthermore, tests are required that detect if there is a mismatch between the reported layout and the actual layout of the contract. Not sure how to do that, and perhaps we can also drop it if it is not possible, as the information is extracted directly from the relevant structures.

What do you think, @axic ?

@spalladino
Copy link
Author

@chriseth happy to discuss the JSON format anytime! I'm in the solidity gitter, feel free to ping me there and we can go through it. As for the tests, I think we could build something where we compare Solidity-based setters with the output of getStorageAt directed by the storage layout JSON, if you think it's worthwhile.

@@ -141,6 +141,8 @@ class CompilerContext
unsigned currentToBaseStackOffset(unsigned _offset) const;
/// @returns pair of slot and byte offset of the value inside this slot.
std::pair<u256, unsigned> storageLocationOfVariable(Declaration const& _declaration) const;
/// @returns all state variables along with their storage slot and byte offset of the value inside the slot.
std::map<Declaration const*, std::pair<u256, unsigned>> const& stateVariables() const { return m_stateVariables; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to go through ContractType instead of the compiler context.


// Common type info
data["name"] = type->canonicalName();
data["size"] = type->storageSize().str();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps better to call this storageSlots or numberOfSlots?

data["base"] = processType(array->baseType());
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs assertion that there is no other category.

}
else if (type->category() == Type::Category::Mapping) {
auto map = static_pointer_cast<const MappingType>(type);
data["key"] = processType(map->keyType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key type should be processed differently

@axic
Copy link
Member

axic commented Aug 3, 2018

As discussed on the call, it would make sense to:

  • insert an AST Node Id useful for looking up variables in the source (but shouldn't be used for storage comparison)
  • rename the name field to signal it is just a user friendly name
  • introduce a kind or category field to know the type of the encoding

Please also create a comprehensive documentation chapter - see the Standard JSON I/O chapter.

data["value"] = processType(map->valueType());
}
else if (type->category() == Type::Category::Array) {
auto array = static_pointer_cast<const ArrayType>(type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big difference between statically-sized and dynamically-sized arrays

@spalladino
Copy link
Author

Also discussed:

  • Add a flag indicating if a field is stored in-place or in a hashed position
  • Include a global version identifier for the hashing algorithm used
  • Rebase the work on top of develop

Regarding the file format, it was discussed to generate a flat structure, where type entries are pointers to a set of types defined afterwards. The rich type identifier could be used as identifiers for these types. This would allow representing recursive structures, and also offer a more compact format. In one of the examples above, the JSON would be sth like:

{
  "hash_version": "1",
  "storage": [{
    "label": "m",
    "ast_id": 42,
    "contract": "test",
    "offset": "0",
    "slot": "0",
    "type": "mapping(uint256 => uint128)_identifier"
  }],
  "types": {
    "mapping(uint256 => uint128)_identifier": {
      "label": "mapping(uint256 => uint128)",
      "kind": "mapping",
      "size_in_slots": "1",
      "bytes": "32",
      "value_type": "uint128_identifier",
    },
    "uint128_identifier": {
      "label": "uint128",
      "kind": "scalar",
      "size_in_slots": "1",
      "bytes": "16"
    }
  }
}

Thanks for all the input, folks!!

@chriseth
Copy link
Contributor

@spalladino could you please inform us about the state of this PR?

@spalladino
Copy link
Author

@chriseth currently on hold unfortunately, due to lack of availability on our end. We have implemented a workaround by inferring the storage layout from the AST in the meantime (which is not as reliable, but catches most cases). I plan on getting back at it a few weeks after devcon. That said, I'm ok if you want to close the PR and I open a new one once I have the new implementation we had agreed upon.

@chriseth
Copy link
Contributor

We can keep it open for now.

@livnev
Copy link

livnev commented Nov 19, 2018

I'd like to add that this feature is very important tooling for formal techniques that want to reason about EVM bytecode generated from Solidity. Most people want to read/write specifications that talk about solidity storage variables, instead of the storage locations themselves, meaning that there needs to be a robust process of translating from one to the other. Right now we have no choice but to manually map out the storage layout of the contract by reading the bytecode, which is tedious, error-prone, and can be broken by updates to solc.

@m0ar
Copy link

m0ar commented Nov 21, 2018

This functionality would be super useful in writing efficient Ethereum data cache solutions! A clear map of storage locations would make it way simpler for us to update cache on new blocks, and possibly make it automatic given a contract ABI 👌

@chriseth
Copy link
Contributor

Closing this for now. Please reopen (or create a new pull request) once you start again working on this.

@chriseth chriseth closed this Dec 12, 2018
@axic axic mentioned this pull request Apr 7, 2019
@MrChico
Copy link
Member

MrChico commented Apr 7, 2019

@spalladino any chance this might be continued? 🙏

@spalladino
Copy link
Author

@MrChico would love to, but haven't had the time to work on it for a long time already. I know @nanexcool was also interested, maybe someone from his team can tackle it...?

@leonardoalt
Copy link
Member

This PR #7589 adds a basic version of this feature. It outputs the first layer of the storage layout, that is, no nested information.

@axic
Copy link
Member

axic commented Jul 29, 2020

See also the discussion on #597.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Storage Keys to ABI
8 participants