Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve command-line processing of gaia_db_extract #1048

Merged
merged 8 commits into from
Nov 5, 2021

Conversation

waynelwarren
Copy link
Contributor

General improvements to the main() function of gaia_db_extract.

Addresses https://gaiaplatform.atlassian.net/browse/GAIAPLAT-1630.

Improved the robustness and compactness of the command-line processing. Added usage() for invalid command-line options.

@simone-gaia
Copy link
Contributor

simone-gaia commented Oct 29, 2021

There are a bunch of warnings comeing from this code:

/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/inc/table_iterator.hpp:33:27: warning: 2 uninitialized fields at the end of the constructor call [clang-analyzer-optin.cplusplus.UninitializedObject]
        m_current_payload = nullptr;
                          ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/inc/table_iterator.hpp:64:11: note: uninitialized pointer 'this->m_table_name'
    char* m_table_name;
          ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/inc/table_iterator.hpp:67:31: note: uninitialized field 'this->m_container_id'
    gaia::common::gaia_type_t m_container_id;
                              ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:274:9: note: Assuming the condition is false
    if (database.size() == 0 && table.size() == 0)
        ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:274:30: note: Left side of '&&' is false
    if (database.size() == 0 && table.size() == 0)
                             ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:278:14: note: Assuming the condition is true
    else if (database.size() > 0 && table.size() > 0)
             ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:278:14: note: Left side of '&&' is true
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:278:37: note: Assuming the condition is true
    else if (database.size() > 0 && table.size() > 0)
                                    ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:278:10: note: Taking true branch
    else if (database.size() > 0 && table.size() > 0)
         ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:280:16: note: Calling 'dump_rows'
        return dump_rows(database, table, start_after, row_limit);
               ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:195:22: note: Calling default constructor for 'table_iterator_t'
    table_iterator_t table_iterator;
                     ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/inc/table_iterator.hpp:33:27: note: 2 uninitialized fields at the end of the constructor call
        m_current_payload = nullptr;
                          ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:65:15: warning: loop variable is copied but only used as const reference; consider making it a const reference [performance-for-range-copy]
    for (auto field : table.gaia_fields())
              ^
         const  &
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:80:15: warning: loop variable is copied but only used as const reference; consider making it a const reference [performance-for-range-copy]
    for (auto table : db.gaia_tables())
              ^
         const  &
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:89:13: warning: invalid case style for global variable 's_cache_initialized' [readability-identifier-naming]
static bool s_cache_initialized = false;
            ^~~~~~~~~~~~~~~~~~~
            g_s_cache_initialized
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/gaia_db_extract.cpp:127:15: warning: loop variable is copied but only used as const reference; consider making it a const reference [performance-for-range-copy]
    for (auto db : gaia_database_t::list())
              ^
         const  &
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/table_iterator.cpp:98:22: warning: Value stored to 'value' during its initialization is never read [clang-analyzer-deadcode.DeadStores]
                auto value = get_field_array_element(
                     ^
/home/simone/repos/GaiaPlatform/production/tools/gaia_db_extract/src/table_iterator.cpp:98:22: note: Value stored to 'value' during its initialization is never read

// Command-line usage.
static void usage()
{
fprintf(stderr, "Usage: gaia_db_extract [--database=<dbname>] [--table=<tableneme>] [--start-after=ID] [--row-limit=N]\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you have defined constants for the arguments (except for row-limit). Why not use them here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some changes to this. Hope this is what you meant.

static void usage()
{
fprintf(stderr, "Usage: gaia_db_extract [--database=<dbname>] [--table=<tableneme>] [--start-after=ID] [--row-limit=N]\n");
fprintf(stderr, " No parameters: dump catalog.\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not just use std::cerr and the stream operators here? (more modern C++)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did it.

{
row_limit = atoi(argv[++i]);
start_after = atoi(value.c_str());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you can use std::stoi and just use the string value without needing to convert it to a C string via c_str().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this too.

start_after = atoi(value.c_str());
if (start_after < 1)
{
fprintf(stderr, "Illegal value for start_after. It must be 1 or greater\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could use your constant here. If you use the stream operators << then this is something like:

std::cerr << "Illegal value for " << c_start_after << "...";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, it's now done this way everywhere.

@waynelwarren
Copy link
Contributor Author

@simone-gaia , I don't get those warnings when I build. And that file wasn't changed in this PR. What causes this to happen?

@simone-gaia
Copy link
Contributor

@simone-gaia , I don't get those warnings when I build. And that file wasn't changed in this PR. What causes this to happen?

I noticed those warnings by doing a full build of the system. Even though you did not modify gaia_db_extract.cpp you could take the occasion to remove the warnings.

@waynelwarren
Copy link
Contributor Author

@simone-gaia , I'm seeing the warnings too. The best way for me to address them is in a new PR.

@simone-gaia
Copy link
Contributor

@simone-gaia , I'm seeing the warnings too. The best way for me to address them is in a new PR.

As you wish, just pointing out.

Copy link
Contributor

@LaurentiuCristofor LaurentiuCristofor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate the error messages with @vDonGlover ; otherwise these changes look good to me.

@@ -24,10 +24,10 @@ constexpr char c_table_string[] = "table";
static void usage()
{
cerr << "Usage: gaia_db_extract [--" << c_database_string << "=<databasename>] [--" << c_table_string << "=<tableneme>] [--"
<< c_start_string << "=ID] [--" << c_row_limit_string << "=N]" << endl;
<< c_start_after_string << "=ID] [--" << c_row_limit_string << "=N]" << endl;
cerr << " No parameters: dump the catalog only." << endl;
cerr << " Else dump rows specified by " << c_database_string << "/" << c_table_string << " name, limited by "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear to me what is going on here. What is being reported? And is the message saying what it is doing or giving instructions of what to do?

{
if (strncmp(argv[arg], "--", 2))
{
fprintf(stderr, "Incorrect command-line parameter: %s\n", argv[arg]);
return false;
cerr << "Incorrect command-line parameter: " << argv[arg] << endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have 'standard" wording for this? In gaiac we say
cerr << c_error_prompt << "Unrecognized parameter: " << argv[i] << endl;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with that.

fprintf(stderr, " Else dump rows specified by database/table name, limited by start-after and row-limit.\n");
cerr << "Usage: gaia_db_extract [--" << c_database_string << "=<databasename>] [--" << c_table_string << "=<tableneme>] [--"
<< c_start_string << "=ID] [--" << c_row_limit_string << "=N]" << endl;
cerr << " No parameters: dump the catalog only." << endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean "Extracts the entire catalog" (ie all of the contents of the database)? Or does it mean that it extracts the structure of the database? Or ...?

}
else
{
cerr << "Invalid command-line option: '" << key << "'." << endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this differ from an invalid parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not. Have made them the same.

@waynelwarren
Copy link
Contributor Author

@vDonGlover , I have reworked the usage() and error messages significantly. Please review again.

cerr << " No parameters: dump the catalog only." << endl;
cerr << " Else dump rows specified by " << c_database_string << "/" << c_table_string << " name, limited by "
<< c_start_after_string << " and " << c_row_limit_string << "." << endl;
cerr << "OVERVIEW: Produce JSON representation of catalog, or database rows:" << endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this saying something like this:

OVERVIEW: Outputs a JSON representation of either the catalog structure or selected rows for a specified database and table within the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's better, of course. I just like being terse.

cerr << " --" << c_database_string << "=<databasename> Required. Selects the database from which to extract rows." << endl;
cerr << " --" << c_table_string << "=<tablename> Required. Selects the table containing the rows being extracted." << endl;
cerr << " --" << c_row_limit_string << "=N Optional. Limit the number of rows to N. Otherwise, extract all rows." << endl;
cerr << " --" << c_start_after_string << "=ID Optional. Assuming a block of rows has already been extracted, start this" << endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question not specifically related to this message (but could have some bearing), what if they specify an id and there has been no previous block? Or they specify an ID that is somewhere in the middle of the previous block or a few rows after the end of the previous block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could all be explained, but it's a bit overwhelming. The purpose of the start-after is to obtain multiple blocks instead of all rows. It doesn't matter if they break that rule, as it will work anyway. To start at the beginning, leave the parameter off.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, really the definition is:
Specifies the ID of the row at which the extract starts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To extract rows from the database in blocks, use this parameter in conjunction with c_row_limit_string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subtlety here is that start-after does not specify the starting ID, it specifies the ID of the row prior to the first one we want. This is because we don't know the ID of the first one - we only know the ID of the previous one.

The algorithm for dumping all rows one block at a time is very simple, but quite a bit to explain in usage(). A developer could figure it out with just a little experimentation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifies the ID of the row that precedes the row at which to start the extract.
Typically this will be the last row of the block that you just extracted.
To extract rows from the database in blocks, use this parameter in conjunction with c_row_limit_string.

Copy link

@vDonGlover vDonGlover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@waynelwarren waynelwarren merged commit 3af8a0a into master Nov 5, 2021
@waynelwarren waynelwarren deleted the wayne/improve-extract-main branch November 5, 2021 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants