Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve log scrubbing: #2358

Closed
wants to merge 2 commits into from
Closed

Improve log scrubbing: #2358

wants to merge 2 commits into from

Conversation

nbougalis
Copy link
Contributor

@nbougalis nbougalis commented Jan 26, 2018

Per issue #2354, when the log level of a server was configured at "trace", sensitive keying meterial generated by the wallet_propose command could be written to the server's log file, if one was configured.

This commit improves the log scrubbing code to account for the sensitive information generated by a wallet_propose.

Important security consideration

We still caution everyone against executing this command on a server that they do not control: a malicious server operator could intercept the generated keypair, or operate a modified server that returns keypairs that are not securely generated.

@ripplelabs-jenkins
Copy link
Collaborator

ripplelabs-jenkins commented Jan 26, 2018

Jenkins Build Summary

Built from this commit

Built at 20180130 - 02:13:15

Test Results

Build Type Result Status
coverage 985 cases, 0 failed, t: 623s PASS ✅
clang.debug.unity 985 cases, 0 failed, t: 392s PASS ✅
clang.debug.nounity 983 cases, 0 failed, t: 322s PASS ✅
gcc.debug.unity 985 cases, 0 failed, t: 441s PASS ✅
clang.release.unity 984 cases, 0 failed, t: 480s PASS ✅
gcc.debug.nounity 983 cases, 0 failed, t: 787s PASS ✅
gcc.release.unity 984 cases, 0 failed, t: 504s PASS ✅


s.replace (first, last - first, last - first, '*');
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the find('\"',...) there is no need to limit the starting search position to the size of the string. The find implementation does this check already.

I recommend searching for the second " instead of the , after the first " because the sensitive value may be the last key/value pair in that level of the jason.

Here is a minor tweak which takes these suggestions into account:

    auto scrubber = [&s](char const* token)
    {
        auto first = s.find(token);

        // If we have found the specified token, then attempt to isolate the
        // sensitive data (it's enclosed by double quotes)
        // and mask it:

        if (first != std::string::npos)
        {
            first = s.find ('\"', first + std::strlen(token));
            if (first != std::string::npos)
            {
                auto last = s.find('\"', first+1);

                if (last == std::string::npos)
                    last = s.size();
                else
                    ++last;

                s.replace (first, last - first, last - first, '*');
            }
        }
    };

Copy link
Contributor

@miguelportilla miguelportilla Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlikely to happen but worth mentioning that subsequent occurrences of the token do not get masked.

Copy link
Collaborator

@seelabs seelabs Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is called for every write to the log, and as written it does seven passes of the string, how about using a regex to do one pass. Something like:

std::string
scrub (std::string s)
{
    static std::regex const r{
        R"regex("(seed|seed_hex|secret|master_key|master_seed|master_seed_hex|passphrase)")regex",
        std::regex_constants::optimize};

    std::smatch m;
    std::smatch::difference_type p = 0;
    while (1)
    {
        if (!std::regex_search(s.cbegin() + p, s.cend(), m, r))
            return s;
        p = m.position() + p;
        p = s.find('\"', p + m.length());
        if (p == std::string::npos)
            return s;

        auto last = s.find(',', p);
        if (last == std::string::npos)
            last = s.size();
        s.replace (p, last - p, last - p, '*');
        p = last;
    }

    return s;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data point:

I took my implementation and tested it against the input string in the bug report that is prefixed by "2018-Jan-25 15:02:38 Server:DBG Reply:". Then I took Scott's implementation, changed it to search for the second " instead of ,, and tested it against the same string. Compiled with clang/libc++ at -O3. Mine took about 50us and Scott's about 230us. Other regex implementations might be much faster. I wrote the libc++ regex, and I know it sucks. :-\

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HowardHinnant Does that include the time to compile the regex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HowardHinnant Ah I see. I incorrectly assumed we only wanted to mask within the quotes to keep the json valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think "***" would look better, but I'm not going to loose any sleep either way. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It would be, theoretically, marginally more secure (you don't leak the length). However, that would necessitate a lot more copying. I'm happy to tweak that if others think it's important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think masking only within the quotes is a good idea. The tweak I previously mentioned to @HowardHinnant may be all we need.

Copy link
Contributor

@miguelportilla miguelportilla Jan 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote another version...

    auto scrubber = [&s](char const* prefix,
        std::vector<char const*> const& endings)
    {
        auto first = s.find(prefix);
        if (first == std::string::npos)
            return;
        size_t const offset = first + std::strlen(prefix);
        for (auto const& e : endings)
        {
            auto const len = std::strlen(e);
            if ((s.compare(offset, len, e)) == 0)
            {
                first = s.find('\"', offset + len);
                if (first != std::string::npos)
                {
                    auto last = s.find('\"', first + 1);
                    if (last == std::string::npos)
                        last = s.size();
                    else
                        ++last;
                    s.replace(first, last - first, last - first, '*');
                }
                return;
            }
        }
    };

called as such...

    static std::vector<std::pair<char const*, std::vector<char const*>>> const tokens
    {
        {{"\"seed"}, {"\"", "ed_hex\""}},
        {{"\"secret"}, {"\""}},
        {{"\"master_"}, {"key\"", "seed\"", "seed_hex\""}},
        {{"\"passphrase"}, {"\""}}
    };
    for (auto const& t : tokens)
        scrubber(t.first, t.second);

This version is about 80% faster than @HowardHinnant's version but probably not worth having the extra complexity.

scrubber ("\"master_seed\"");
scrubber ("\"master_seed_hex\"");
scrubber ("\"passphrase\"");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a minor exercise to increase our confidence this is a complete list, how about commenting the list of JSS in src/ripple/protocol/JsonFields.h with an attribute for each field that contains sensitive information. For example:

JSS ( passphrase );                 // in [sensitive]: WalletPropose

As an example, the act of writing this comment caused me to wonder if the next JSS down from passphrase is also "sensitive" (password).

@sublimator
Copy link
Contributor

sublimator commented Jan 26, 2018 via email

@HowardHinnant
Copy link
Contributor

We have this thing about performance...

@seelabs
Copy link
Collaborator

seelabs commented Jan 29, 2018

One issue is we create an unnecessary copy of the message string when there are no secrets to be scrubbed (and the vast majority will have no secrets).

How about an interface that "appendsScrubbed" instead of "return scrubbed". This makes it easier to avoid the copy. Something like (note untested code, just look at the interface and how the copy is avoided). Another alternative would be remove the scrub function and inline the code into format: seelabs@ab1fc22

@nbougalis
Copy link
Contributor Author

nbougalis commented Jan 29, 2018

I've addressed all review comments and restructured the code to reduce the amount of copying by inlining the scrub function.

Copy link
Contributor

@HowardHinnant HowardHinnant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and tests out fine here.

@miguelportilla
Copy link
Contributor

miguelportilla commented Jan 29, 2018

The code is passable but I will mention a few weaknesses.

  • Subsequent tokens do not get masked.
    eg. "secret":"confidential","secret":"exposed"

  • If values are encountered similar to the tokens, the wrong thing gets masked.
    eg. "some_key" : "secret"

  • The current masking includes the double quotation invalidating json objects.

Per issue #2354, when the log level of a server was configured at
"trace", sensitive keying meterial generated by the `wallet_propose`
command could be written to the server's log file, if one was
configured.

This commit improves the log scrubbing code to account for the
sensitive information generated by a `wallet_propose`.

** Important security consideration **

We still caution everyone *against* executing this command on a
server that they do not control: a malicious server operator could
intercept the generated keypair, or operate a modified server that
returns keypairs that are not securely generated.
Copy link
Contributor

@mellery451 mellery451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with the current state of this change. That said, I'm slightly apprehensive about the string matching for every log line, when in fact, we only care about certain messages that actually have json in them. To that end, I threw together a possibly alternative approach by modifying StyledStreamWriter: mellery451@592749a. This is incomplete and untested, but you get the idea. The main downside here is that you have to know to mod teh stream if you are streaming json to the log, but that should be a relatively small number of locations (?). In any event, I'm fine with the current approach and only threw this out to see if it's worth considering.

@nbougalis
Copy link
Contributor Author

@miguelportilla: Thanks. I think that this isn't really meant to be a comprehensive solution. We can't possibly redact everything and if we could today, we couldn't guarantee that wouldn't change tomorrow.

To address your points more specifically:

  • There should never be subsequent tokens in the JSON we are filtering.

  • I guess someone could make it happen by crafting input that results in a value matching an expecetd token, but that only hurts the person doing that request and fixing it is not possible without extra logic to understand the JSON, which is probably not a good idea.

  • The current masking should not invalidate anything: it includes the opening and closing quotes and replaces only the content with *.

@codecov-io
Copy link

Codecov Report

Merging #2358 into develop will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2358      +/-   ##
===========================================
+ Coverage    70.04%   70.05%   +0.01%     
===========================================
  Files          704      703       -1     
  Lines        53342    53036     -306     
===========================================
- Hits         37361    37155     -206     
+ Misses       15981    15881     -100
Impacted Files Coverage Δ
src/ripple/json/json_writer.h 100% <ø> (ø) ⬆️
src/ripple/rpc/impl/Handler.cpp 98.96% <ø> (-0.02%) ⬇️
src/ripple/net/impl/RPCCall.cpp 61.8% <ø> (+0.24%) ⬆️
src/ripple/rpc/impl/RPCHandler.cpp 65.38% <100%> (+1.63%) ⬆️
src/ripple/json/impl/json_writer.cpp 79.71% <100%> (+0.64%) ⬆️
src/ripple/basics/DecayingSample.h 77.77% <0%> (-8.34%) ⬇️
src/ripple/server/impl/BasePeer.h 46.15% <0%> (-3.85%) ⬇️
src/ripple/protocol/impl/STVar.cpp 85.71% <0%> (-2.6%) ⬇️
src/ripple/basics/impl/CheckLibraryVersions.cpp 93.18% <0%> (-2.28%) ⬇️
src/ripple/conditions/impl/error.cpp 43.24% <0%> (-1.5%) ⬇️
... and 107 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eaff9a0...e117578. Read the comment docs.

@HowardHinnant HowardHinnant added the Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required. label Jan 30, 2018
@seelabs
Copy link
Collaborator

seelabs commented Jan 30, 2018

In 0.90.0-b5

@seelabs seelabs closed this Jan 30, 2018
@miguelportilla
Copy link
Contributor

We may want to use the jsonFields string values when defining the tokens.
eg. jss::seed vs "seed"

@nbougalis nbougalis deleted the scrub branch March 22, 2018 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready to merge *PR author* thinks it's ready to merge. Has passed code review. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants