Add VelocyPack to the native JSON benchmark #1
@jsteemann Also, has anyone taken a look at http://rapidjson.org?
After hacking around with velocypack, the nativejson benchmark, and our taocpp/json library for a few hours, here are a few preliminary results. Note that, for now, I always chose the easiest way, in particular the

Besides pure velocypack and pure taocpp/json, there are also results for a combination of the two, the taocpp/json parser with a velocypack

In the conformance section, taocpp/json achieves an overall score of 100%, velocypack 94%, and the combination 97%. The improvement from velocypack to the combination is due to taocpp/json's better parsing of doubles. The missing 3% for the combination are due to some failures in the roundtrip tests, which might or might not be real issues.

In the performance section, taocpp/json, velocypack and the combination are all equally fast in the stringify tests. And now the number you've probably been waiting for: on my laptop, the overall parsing benchmark results are 35 ms for taocpp/json, 52 ms for velocypack, and 26 ms for the combination. taocpp/json uses
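For reference, the quoted parsing times translate into relative speedups as follows. This is only arithmetic on the single-laptop numbers above; the constant and function names are illustrative and not part of either library.

```cpp
#include <cassert>

// Overall parsing times quoted above, in milliseconds (one laptop, one run).
const double kTaocppMs = 35.0;
const double kVelocypackMs = 52.0;
const double kCombinationMs = 26.0;

// How many times faster `candidate_ms` is than `baseline_ms` (ratio of times).
double speedup(const double baseline_ms, const double candidate_ms) {
    assert(candidate_ms > 0.0);
    return baseline_ms / candidate_ms;
}
```

With these numbers, `speedup(kVelocypackMs, kCombinationMs)` is exactly 2.0 and `speedup(kTaocppMs, kCombinationMs)` is about 1.35, i.e. the combination roughly halves velocypack's parse time in this measurement.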
@ColinH: Thanks for your work on this. We have not found time to add velocypack to the nativejson-benchmark ourselves yet.
@jsteemann It's still a work in progress, but if you send an email to [email protected], I can share some more details.
One thing that is ready to be shown is the adapter from the taocpp/json events API to velocypack:

```cpp
#include <cstdint>
#include <string>
#include <utility>

#include "velocypack/vpack.h"

// Consumer for taocpp/json events that builds a velocypack value.
struct to_velocypack_events
{
   arangodb::velocypack::Builder builder;

   // taocpp/json reports an object member as key() followed by the value,
   // while Builder::add() takes key and value together, so the key is
   // buffered here until the value arrives.
   std::string m_key;
   bool m_member = false;

   // Adds a value (or opens a compound value) under the pending key, if any.
   void add( const arangodb::velocypack::Value & v )
   {
      if ( m_member ) {
         builder.add( m_key, v );
         m_member = false;
      }
      else {
         builder.add( v );
      }
   }

   void null()
   {
      add( arangodb::velocypack::Value( arangodb::velocypack::ValueType::Null ) );
   }
   void boolean( const bool v )
   {
      add( arangodb::velocypack::Value( v ) );
   }
   void number( const std::int64_t v )
   {
      add( arangodb::velocypack::Value( v ) );
   }
   void number( const std::uint64_t v )
   {
      add( arangodb::velocypack::Value( v ) );
   }
   void number( const double v )
   {
      add( arangodb::velocypack::Value( v ) );
   }
   void string( const std::string& v )
   {
      add( arangodb::velocypack::Value( v ) );
   }

   // Opening an array/object is an add() of the compound type;
   // closing it is Builder::close().
   void begin_array()
   {
      add( arangodb::velocypack::Value( arangodb::velocypack::ValueType::Array ) );
   }
   void element()
   {
   }
   void end_array()
   {
      builder.close();
   }
   void begin_object()
   {
      add( arangodb::velocypack::Value( arangodb::velocypack::ValueType::Object ) );
   }
   void key( const std::string& v )
   {
      m_key = v;
      m_member = true;
   }
   void key( std::string&& v )
   {
      m_key = std::move( v );
      m_member = true;
   }
   void member()
   {
   }
   void end_object()
   {
      builder.close();
   }
};
```
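To see the event sequence that such an adapter receives without needing velocypack, here is a self-contained consumer with the same event names that writes compact JSON text instead of filling a `Builder`. It is a hypothetical illustration (`to_json_string_events` is not taocpp/json's own writer) and assumes, consistent with the counting `StatHandler` in this thread, that every array value is followed by `element()` and every object member value by `member()`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical events consumer producing compact JSON text.
struct to_json_string_events {
    std::string out;

    void null() { out += "null"; }
    void boolean(const bool v) { out += v ? "true" : "false"; }
    void number(const std::int64_t v) { out += std::to_string(v); }
    void number(const std::uint64_t v) { out += std::to_string(v); }
    void number(const double v) {
        char buf[32];
        std::snprintf(buf, sizeof(buf), "%.17g", v);  // 17 digits round-trip
        out += buf;
    }
    void string(const std::string& v) { write_escaped(v); }

    // element()/member() follow every value, so a comma is written eagerly
    // and the trailing one is replaced by the closing bracket.
    void begin_array() { out += '['; }
    void element() { out += ','; }
    void end_array() { close(']'); }
    void begin_object() { out += '{'; }
    void key(const std::string& k) { write_escaped(k); out += ':'; }
    void member() { out += ','; }
    void end_object() { close('}'); }

private:
    void close(const char c) {
        assert(!out.empty());  // a container must have been opened
        if (out.back() == ',') out.back() = c;  // non-empty container
        else out += c;                          // empty container
    }
    void write_escaped(const std::string& s) {
        out += '"';
        for (const char ch : s) {
            if (ch == '"') out += "\\\"";
            else if (ch == '\\') out += "\\\\";
            else if (static_cast<unsigned char>(ch) < 0x20) {
                char buf[8];
                std::snprintf(buf, sizeof(buf), "\\u%04x",
                              static_cast<unsigned>(static_cast<unsigned char>(ch)));
                out += buf;
            } else {
                out += ch;
            }
        }
        out += '"';
    }
};
```

The pending-key problem does not arise here because text output can write the key immediately, which is exactly why the velocypack adapter needs its `m_key`/`m_member` pair.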
@ColinH: Thanks for the update. If it's still a work in progress, I'd prefer to wait until you declare it finished or stable and then try your "official" fork. If in the meantime you find any conformance errors in velocypack that block you from making progress, just let us know so we can fix them.
Here are the roundtrip tests that both velocypack and the combination of the taocpp/json parser with velocypack fail. In my opinion (see also my comments in the nativejson benchmark issue linked above), the third one should not really be considered a failure, since it is an equivalent representation.
Keep in mind that most libraries do not achieve 100% in the conformance tests; currently only RapidJSON (in full-precision mode) and taocpp/json do.
We could potentially add options to the velocypack Dumper for these cases, so that it can optionally produce the same results.
Correct, these three small things are the only issues in the roundtrip conformance. There are more issues with double conformance; if you drop me a line, I can tell you how we fixed them. As is often the case, as soon as floating point is involved, things get rather complicated (unless you don't care about precision).
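To illustrate why precision makes this hard, here is a minimal standard-library sketch (plain `snprintf`/`strtod`, not the double-conversion library discussed in this thread; the helper names are hypothetical). Printing a double with too few significant digits breaks the round trip, while 17 significant digits always reproduce an IEEE-754 binary64 value.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Formats `v` with `digits` significant digits ("%.*g").
std::string format_double(const double v, const int digits) {
    assert(digits > 0);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*g", digits, v);
    return buf;
}

// True if printing with `digits` significant digits and parsing the text
// back with strtod yields exactly the same double.
bool roundtrips(const double v, const int digits) {
    const std::string text = format_double(v, digits);
    return std::strtod(text.c_str(), nullptr) == v;
}
```

For `x = 0.1 + 0.2`, `format_double(x, 15)` yields `"0.3"`, which parses back to a different double, whereas `format_double(x, 17)` yields `"0.30000000000000004"`, which round-trips. Libraries such as double-conversion go further and search for the shortest string that still round-trips, so that `0.1` is printed as `0.1` rather than `0.10000000000000001`.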
Oh yes, if you have already fixed these in your library and have some hints, that should definitely save us some work! Can you point me to the floating-point parser/builder in your library (I guess there is one)? I think I can use that as a starting point and check what it does differently from ours.
We use a modified version of the V8 double-conversion library that was adapted to interface more directly with the PEGTL, our C++11 parser library in taocpp that we use for taocpp/json. I just forked velocypack and replaced the included JSON parser with taocpp/json; the result can be seen here. This fixes the floating-point issues and, as mentioned above, doubles the performance of parsing JSON to a velocypack For completeness, it would be necessary to use taocpp/json for the serialisation to JSON, too, and to fix the one "TODO" in
Btw, the (Our in-memory representation is based on the standard containers, which makes inspection and manipulation very easy; as such, it is complementary to velocypack.)
Sounds good!
FYI, the glue code to produce taocpp/json Events from a In addition, it makes velocypack compatible with all other taocpp/json Events producers and consumers: you can convert velocypack to and from several other binary formats and several JSON in-memory representations, apply some simple transformations, and of course easily add more.
```cpp
#include "../test.h"
#include "velocypack/vpack.h"

#include <cstdint>
#include <cstring>
#include <memory>
#include <string>

// Nativejson-benchmark integration of ColinH/velocypack,
// an experimental arangodb/velocypack plus taocpp/json.

// Consumes taocpp/json events and fills in the benchmark's Stat counters.
class StatHandler
{
public:
    explicit StatHandler(Stat& stat) : stat_(stat) {}
    void null() { stat_.nullCount++; }
    void boolean(const bool v) { if (v) stat_.trueCount++; else stat_.falseCount++; }
    void number(const std::int64_t) { stat_.numberCount++; }
    void number(const std::uint64_t) { stat_.numberCount++; }
    void number(const double) { stat_.numberCount++; }
    void string(const tao::string_view& v) { stat_.stringCount++; stat_.stringLength += v.size(); }
    void binary(const tao::byte_view&) {}
    void begin_array(const std::size_t = 0) {}
    void element() { stat_.elementCount++; }
    void end_array(const std::size_t = 0) { stat_.arrayCount++; }
    void begin_object(const std::size_t = 0) {}
    void key(const tao::string_view& v) { stat_.stringCount++; stat_.stringLength += v.size(); }
    void member() { stat_.memberCount++; }
    void end_object(const std::size_t = 0) { stat_.objectCount++; }
private:
    StatHandler& operator=(const StatHandler&) = delete;
    Stat& stat_;
};

// Walks the velocypack value held by `builder` and reports it as events.
static void GenStat(Stat& stat, const arangodb::velocypack::Builder& builder) {
    StatHandler statHandler(stat);
    arangodb::velocypack::builderToEvents(statHandler, builder);
}

// Parser options: reject invalid UTF-8 and duplicate object keys.
struct velocypack_options
{
    velocypack_options()
    {
        options.validateUtf8Strings = true;
        options.checkAttributeUniqueness = true;
    }
    arangodb::velocypack::Options options;
};

// A Parser that actually uses the options above. Members are initialised in
// declaration order, so `options` is constructed before `parser`; without
// the argument, the Parser would fall back to Options::Defaults.
struct velocypack_parser
{
    velocypack_parser()
        : options(),
          parser(&options.options)
    {}
    velocypack_options options;
    arangodb::velocypack::Parser parser;
};

class VELOCYPACKParseResult : public ParseResultBase {
public:
    std::shared_ptr< arangodb::velocypack::Builder > root;
};

class VELOCYPACKStringResult : public StringResultBase {
public:
    virtual const char* c_str() const { return s.c_str(); }
    std::string s;
};

class VELOCYPACKTest : public TestBase {
public:
#if TEST_INFO
    virtual const char* GetName() const { return "velocypack (C++11)"; }
    virtual const char* GetFilename() const { return __FILE__; }
#endif

#if TEST_PARSE
    virtual ParseResultBase* Parse(const char* json, size_t length) const {
        VELOCYPACKParseResult* pr = new VELOCYPACKParseResult;
        try {
            velocypack_parser parser;
            parser.parser.parse(reinterpret_cast<const uint8_t *>(json), length);
            pr->root = parser.parser.steal();
        }
        catch (...) {
            delete pr;
            return nullptr;
        }
        return pr;
    }
#endif

#if TEST_STRINGIFY
    virtual StringResultBase* Stringify(const ParseResultBase* parseResult) const {
        const VELOCYPACKParseResult* pr = static_cast<const VELOCYPACKParseResult*>(parseResult);
        VELOCYPACKStringResult* sr = new VELOCYPACKStringResult;
        sr->s = arangodb::velocypack::builderToJsonString( *pr->root );
        return sr;
    }
#endif

#if TEST_PRETTIFY
    virtual StringResultBase* Prettify(const ParseResultBase* parseResult) const {
        const VELOCYPACKParseResult* pr = static_cast<const VELOCYPACKParseResult*>(parseResult);
        VELOCYPACKStringResult* sr = new VELOCYPACKStringResult;
        sr->s = arangodb::velocypack::builderToPrettyJsonString( *pr->root );
        return sr;
    }
#endif

#if TEST_STATISTICS
    virtual bool Statistics(const ParseResultBase* parseResult, Stat* stat) const {
        const VELOCYPACKParseResult* pr = static_cast<const VELOCYPACKParseResult*>(parseResult);
        std::memset(stat, 0, sizeof(Stat));
        GenStat(*stat, *pr->root);
        return true;
    }
#endif

    // TEST_SAXROUNDTRIP does not involve velocypack (only taocpp/json).
    // TEST_SAXSTATISTICS does not involve velocypack (only taocpp/json).

#if TEST_CONFORMANCE
    // Conformance inputs wrap the value in an array, e.g. "[1.5]".
    virtual bool ParseDouble(const char* json, double* d) const {
        try {
            velocypack_parser parser;
            parser.parser.parse( std::string( json ) );
            const auto builder = parser.parser.steal();
            arangodb::velocypack::Slice slice( builder->start() );
            if ( slice.type() == arangodb::velocypack::ValueType::Array ) {
                slice = slice.at( 0 );
                if ( slice.type() == arangodb::velocypack::ValueType::Double ) {
                    *d = slice.getDouble();
                    return true;
                }
            }
        }
        catch (...) {
        }
        return false;
    }

    virtual bool ParseString(const char* json, std::string& s) const {
        try {
            velocypack_parser parser;
            parser.parser.parse( std::string( json ) );
            const auto builder = parser.parser.steal();
            arangodb::velocypack::Slice slice( builder->start() );
            if ( slice.type() == arangodb::velocypack::ValueType::Array ) {
                slice = slice.at( 0 );
                if ( slice.type() == arangodb::velocypack::ValueType::String ) {
                    arangodb::velocypack::ValueLength length;
                    const char * string = slice.getString( length );
                    s = std::string( string, length );
                    return true;
                }
            }
        }
        catch (...) {
        }
        return false;
    }
#endif
};

REGISTER_TEST(VELOCYPACKTest);
```
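The glue code mentioned above works in the opposite direction to the adapter earlier in the thread: it walks velocypack data and emits taocpp/json events (in the benchmark code, via `builderToEvents()`), so any events consumer, such as `StatHandler`, can process velocypack. The glue code itself is not shown here, so the following is only a self-contained sketch of the same producer pattern, with a hypothetical `Node` tree standing in for velocypack data and a hypothetical `CountingConsumer` standing in for `StatHandler`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory value standing in for velocypack data.
struct Node {
    enum class Kind { Null, Bool, Int, String, Array, Object };
    Kind kind = Kind::Null;
    bool b = false;
    std::int64_t i = 0;
    std::string s;
    std::vector<Node> children;     // array elements, or object member values
    std::vector<std::string> keys;  // for objects: keys[k] names children[k]
};

// Hypothetical consumer that counts values, in the spirit of StatHandler.
struct CountingConsumer {
    std::size_t nulls = 0, bools = 0, numbers = 0, strings = 0, string_length = 0;
    std::size_t elements = 0, members = 0, arrays = 0, objects = 0;

    void null() { ++nulls; }
    void boolean(bool) { ++bools; }
    void number(std::int64_t) { ++numbers; }
    void string(const std::string& v) { ++strings; string_length += v.size(); }
    void begin_array() {}
    void element() { ++elements; }
    void end_array() { ++arrays; }
    void begin_object() {}
    void key(const std::string& k) { ++strings; string_length += k.size(); }
    void member() { ++members; }
    void end_object() { ++objects; }
};

// Producer: walks the tree depth-first and emits events, each array value
// followed by element() and each object member value followed by member().
template <typename Consumer>
void to_events(Consumer& c, const Node& n) {
    switch (n.kind) {
    case Node::Kind::Null: c.null(); break;
    case Node::Kind::Bool: c.boolean(n.b); break;
    case Node::Kind::Int: c.number(n.i); break;
    case Node::Kind::String: c.string(n.s); break;
    case Node::Kind::Array:
        c.begin_array();
        for (const Node& child : n.children) {
            to_events(c, child);
            c.element();
        }
        c.end_array();
        break;
    case Node::Kind::Object:
        assert(n.keys.size() == n.children.size());
        c.begin_object();
        for (std::size_t k = 0; k < n.children.size(); ++k) {
            c.key(n.keys[k]);
            to_events(c, n.children[k]);
            c.member();
        }
        c.end_object();
        break;
    }
}
```

Because `to_events()` is a template over the consumer, a single traversal works with any type that provides these member functions, which is what makes the combination compatible with other event producers and consumers.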
@jsteemann The Performance section in the docs is still to be written. Do we have any numbers, maybe on some third-party site?
@kvahed @jsteemann FYI, I just moved everything into the
The repository https://github.com/miloyip/nativejson-benchmark contains a benchmark suite for various C/C++-based JSON parsers and generators. It would be nice to get VelocyPack into that list so its performance can be compared to other parsers/generators easily.