
Performance of select query using index is extremely low #3041

Closed
mrft opened this issue Nov 6, 2014 · 37 comments
@mrft

mrft commented Nov 6, 2014

create property V.name string
create index vname on V( name ) notunique
select from V where name = 'HEY BO DIDDLEY'

=> Query executed in 0.069 sec. Returned 1 record(s)

explain select from V where name = 'HEY BO DIDDLEY'

returns

             "limit": -1,
            "compositeIndexUsed": 1,
            "involvedIndexes": [
                "vname"
            ],
            "fullySortedByIndex": false,
            "indexIsUsedInOrderBy": false,
            "current": "#9:1",
            "evaluated": 1,
            "fetchingFromTargetElapsed": 0,
            "documentReads": 1,
            "user": "#5:0",
            "documentAnalyzedCompatibleClass": 1,
            "recordReads": 1,
            "elapsed": 1.978317,
            "resultType": "collection",
            "resultSize": 1,
            "@fieldTypes": "compositeIndexUsed=l,involvedIndexes=e,evaluated=l,fetchingFromTargetElapsed=l,documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,elapsed=f"
        }
    ],
    "notification": "Query executed in 0.068 sec. Returned 1 record(s)"

which indicates the index is being used.

69 milliseconds for fetching one single record; you could beat that performance with a serial scan. This is a severe contrast with OrientDB boasting about its speed on its website...

Oh, and I am running OrientDB locally (so no network latency), and I ran this query through Studio (but that doesn't really matter). Whether it is version 1.7.9 or 2.0-M2 doesn't matter either.

@lvca
Member

lvca commented Nov 6, 2014

Is this a benchmark? Have you tried with 1 or 100 records?

@andrii0lomakin
Member

How many records were in the index?
Can you provide the benchmark data?

@andrii0lomakin
Member

Did you warm up the cache before the benchmark?

@mrft
Author

mrft commented Nov 6, 2014

It is just the GratefulDeadConcerts database.

Doing the same query a few times from the console seems faster:
Query executed in 0.005 sec(s).

But still, 5 milliseconds for one single record is extremely slow in my opinion, don't you think?

Here are some tests using a Node script that calls the RESTful API and uses Oriento:

1000 X query [select from V where name = 'HEY BO DIDDLEY'] through REST took 8354 milliseconds. That's 119.70313622216902 per second, and 8.354 ms per event.
Start 1000 queries with BINARY protocol and query = select from V where name = 'HEY BO DIDDLEY'
1000 X query [select from V where name = 'HEY BO DIDDLEY'] through BINARY took 3222 milliseconds. That's 310.36623215394167 per second, and 3.222 ms per event.

@lvca
Member

lvca commented Nov 6, 2014

Can you try to get 1 and 50 records? I bet the times are pretty close. If you get 1 record, you have a fixed cost in time.
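[Editor's note] The fixed-cost argument above can be made concrete with a tiny model: if every query pays a fixed overhead F plus a marginal cost C per returned record, the per-record time is F/n + C and collapses as the result set grows. The numbers below are hypothetical illustrations, not measurements from this thread.

```java
// Sketch of lvca's "fixed cost" point: per-record time = fixedMs/n + costPerRecordMs,
// so a 1-record query looks disproportionately slow even when throughput is fine.
// The constants in main are hypothetical, chosen only to illustrate the shape.
public class FixedCostModel {
    static double perRecordMs(double fixedMs, double costPerRecordMs, int records) {
        return fixedMs / records + costPerRecordMs;
    }

    public static void main(String[] args) {
        double fixedMs = 4.9;  // hypothetical fixed per-query overhead
        double costMs = 0.1;   // hypothetical marginal cost per record
        System.out.printf("1 record:   %.3f ms/record%n", perRecordMs(fixedMs, costMs, 1));
        System.out.printf("50 records: %.3f ms/record%n", perRecordMs(fixedMs, costMs, 50));
    }
}
```

With these made-up constants, 1 record costs 5.0 ms/record while 50 records cost 0.198 ms/record, which is the shape of result lvca predicts.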

@lvca lvca closed this as completed Nov 6, 2014
@lvca lvca self-assigned this Nov 6, 2014
@mrft
Author

mrft commented Nov 7, 2014

WONTFIX + INVALID ?

Fixed cost in time?
Then can you please explain these results to me?
Or tell me what I am doing wrong?

Some more results from SNAPSHOT-2.0 from last tuesday I think...

1000 X query [select from V where name <> 'HEY BO DIDDLEY' limit 50] through REST took 205901 milliseconds. That's 4.856702978615937 per second, and 205.901 ms per event.
1000 X query [select from V where name <> 'HEY BO DIDDLEY' limit 50] through BINARY took 216148 milliseconds. That's 4.62645964801895 per second, and 216.148 ms per event.
1000 X query [select from V where name in  [ 'HEY BO DIDDLEY','IM A MAN','NOT FADE AWAY','BERTHA','GOING DOWN THE ROAD FEELING BAD','MONA','Bo_Diddley','Garcia','Spencer_Davis' ]] through REST took 51524 milliseconds. That's 19.408431022436147 per second, and 51.524 ms per event.

And can it get any more basic than this?

 
1000 X query [select from V limit 50] through REST took 191517 milliseconds. That's 5.221468590255696 per second, and 191.517 ms per event.

@lvca
Member

lvca commented Nov 7, 2014

@mrft I closed the issue because you didn't respond to my first question about executing the query with 1 and 50 returned items.

@lvca
Member

lvca commented Nov 7, 2014

@mrft In your last comment you don't write how you tested the REST and binary protocols. Can you execute just 2 queries through the console or cURL that retrieve 1 and then 50 items?

@mrft
Author

mrft commented Nov 7, 2014

Using console

connect remote:localhost/GratefulDeadConcerts admin admin
select from V limit 50

(I did it > 5 times to let the memory caching kick in):

50 item(s) found. Query executed in 0.124 sec(s).

@lvca
Member

lvca commented Nov 7, 2014

Ok, so it's 0.00248 sec per record.

About the index, have you used a UNIQUE_HASH_INDEX?

@mrft
Author

mrft commented Nov 7, 2014

NOTUNIQUE, as you can see in my first post (since not all V in GratefulDeadConcerts have a name, I think, so you have multiple NULL values...)

2.48 ms per record sounds OK if you put it like that, but it should be 2.48 ms for all 50 small records (or maybe 5 ms or even 10 ms, but not 124 ms). Try a similar query on MySQL or Postgres, and you'll know what I mean.

Do you know what is causing this? toJSON couldn't take that long...

@andrii0lomakin andrii0lomakin assigned andrii0lomakin and unassigned lvca Nov 9, 2014
@andrii0lomakin
Member

Guys, it is not even 2 ms per record; we can get about 5,000 records per second, which is 0.2 ms per record. I will try to provide a Java-based benchmark first (I suppose no one uses the console as the interface to interact with an application), and then we will discuss its results.

@andrii0lomakin andrii0lomakin reopened this Nov 9, 2014
@andrii0lomakin
Member

Ok, so for one test I got 100 records/ms, i.e. 0.01 ms per record. I will check it on this database.

@mrft
Author

mrft commented Nov 13, 2014

Some more tests. This time I wrote a very simple Java application and started fetching all the records:

    import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
    import com.orientechnologies.orient.core.record.impl.ODocument;

    public static void main( String[] args ) {
        for ( int i = 0; i < 50; i++ ) {
            testDocument_GratefulDeadConcerts();
        }
    }
    public static void testDocument_GratefulDeadConcerts() {
        ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/GratefulDeadConcerts").open( "admin", "admin" );
        String className = "V";
        int clid = db.getClusterIdByName( "V" );
        System.out.println( "clusterid = " + clid );
        
        int nrOfRecords = 0;
        long t = System.nanoTime();
        long totalFetchTime = 0;
        long totalToJSONTime = 0;
        try {
            for ( ODocument record : db.browseCluster( className ) ) {
                nrOfRecords++;
                
                long currentNanoTime = System.nanoTime();
                totalFetchTime += currentNanoTime - t;
                
                t = System.nanoTime();
                String json = record.toJSON();
                currentNanoTime = System.nanoTime();
                totalToJSONTime += currentNanoTime - t;
                
                t = System.nanoTime();
            }
            double totalFetchTimeInMilliseconds = ((double)totalFetchTime / 1000000);
            double totalToJSONTimeInMilliseconds = ((double)totalToJSONTime / 1000000);
            System.out.println( "Fetching and toJSON() on " + nrOfRecords + " records took " + totalFetchTimeInMilliseconds + " milliseconds and " + totalToJSONTimeInMilliseconds + " milliseconds respectively." );
            System.out.println( "That is " + (totalFetchTimeInMilliseconds / nrOfRecords) + " milliseconds and " + ( totalToJSONTimeInMilliseconds / nrOfRecords ) + " milliseconds per record" +
                                "\n" +
                                " and " + ( 1000.0 / (totalFetchTimeInMilliseconds / nrOfRecords) ) + " records per second" +
                                " and " + (1000.0 / (totalToJSONTimeInMilliseconds / nrOfRecords)) + " records per second respectively." );
            db.rollback();
        }
        finally {
            System.out.println( "Number of records = " + nrOfRecords );
            db.close();
        }
    }
    public static long printDurationToSystemOut(long startSystemNanoTime, long currentSystemNanoTime, String description) {
        long durationInNanoSeconds = currentSystemNanoTime - startSystemNanoTime;
        double durationInMilliseconds = (double)durationInNanoSeconds / 1000000;
        System.out.println(description + " took " + durationInMilliseconds + " ms.");
        return durationInNanoSeconds;
    }

gives me the following disappointing final results on the first run:

Fetching and toJSON() on 809 records took 544.10086 milliseconds and 1318.839576 milliseconds respectively.
That is 0.6725597775030903 milliseconds and 1.630209611866502 milliseconds per record 
    and 1486.8566831524581 records per second and 613.4180492624222 records per second respectively.

which is slow, and I don't understand it, because the server keeps running, so I would presume the cache should still be hot (or is it completely thrown away for each session?). Could that be the reason why everything on the RESTful API stays so slow, because each request is like a whole new 'cold' session?

On the last run, I get closer to the 5,000 records per second that @Laa is talking about:

Fetching and toJSON() on 809 records took 143.26815 milliseconds and 175.4015 milliseconds respectively.
That is 0.17709289245982693 milliseconds and 0.21681273176761434 milliseconds per record
 and 5646.754006385928 records per second and 4612.275265604912 records per second respectively.

but mind you that I am not even executing a query, just selecting ALL records in the database.
Also, toJSON() seems very slow if it can only stringify fewer than 5,000 records per second, considering that stringifying a larger object in Node.js can reach 175,438 JavaScript objects per second. I can imagine it being slower, but 35 times slower seems like way too much, so I guess some performance improvements could and should be made there.

And the following are just some simple JSON parsing and stringifying tests on nodejs, just to get an idea about what these figures mean...

Parsing 10000 JSON objects took  244 milliseconds. That's 40983.60655737705 per second, and 0.0244 ms per event.
Stringifying 10000 JSON objects took  57 milliseconds. That's 175438.59649122806 per second, and 0.0057 ms per event.

@phpnode
Contributor

phpnode commented Nov 13, 2014

@mrft I'm curious about this because Oriento has some benchmarks as part of its test suite and I've never seen such low numbers. I wonder if this is because we're using RECORD_LOAD and thus avoid all the SQL overhead. Would you mind running the Oriento tests and pasting the results of this one in particular: https://github.com/codemix/oriento/blob/master/test/bugs/27-slow.js ?

@andrii0lomakin
Member

@mrft You load all records over the network to the client side to do the query, and claim that this is slow? Simple does not mean efficient.

All queries are executed on the server side, not on the client side.
Also, System.nanoTime has:

  1. granularity, which you did not take into account when you wrote this benchmark.
  2. overhead: System.nanoTime is slow itself; it takes about 100 CPU cycles to execute.

https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
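[Editor's note] The two caveats above (clock granularity and call overhead) can be probed directly with a small, self-contained sketch; it needs no OrientDB server and only standard Java, so either side of the debate could run it before trusting microsecond-scale per-record timings.

```java
// Minimal probe of System.nanoTime itself: the average cost of one call, and
// the smallest non-zero difference between consecutive calls (the effective
// resolution of the clock on this machine). Results vary per JVM and OS.
public class NanoTimeProbe {
    // Average cost of one System.nanoTime() call, in nanoseconds.
    static double avgCallNs(int calls) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < calls; i++) {
            sink += System.nanoTime();       // keep the call from being optimized away
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) System.out.print(""); // consume sink so it is not dead code
        return (double) elapsed / calls;
    }

    // Smallest non-zero delta between two back-to-back calls.
    static long minNonZeroDeltaNs(int samples) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < samples; i++) {
            long a = System.nanoTime();
            long b = System.nanoTime();
            if (b > a && b - a < min) min = b - a;
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println("avg ns per nanoTime() call:   " + avgCallNs(1_000_000));
        System.out.println("smallest non-zero delta (ns): " + minNonZeroDeltaNs(1_000_000));
    }
}
```

Note that mrft's benchmark accumulates deltas spanning whole record fetches (microseconds to milliseconds), so a few tens of nanoseconds of clock overhead per sample would distort the totals only slightly; this probe quantifies by exactly how much on a given machine.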

What I propose: I will write a benchmark and you will review it. Then you will run the benchmark, and then we will decide whether this issue should be fixed or not.

BTW, which hardware did you use to run this test?

@mrft
Author

mrft commented Nov 13, 2014

@phpnode:
I guess these are the results you are looking for:

  Bug #27: Slow compared to Restful API
Binary Protocol Took  556ms, 8993 documents per second
    ✓ should load a lot of records quickly, using the binary raw command interface (557ms)
Rest Protocol Took  593ms, 8432 documents per second
    ✓ should load a lot of records quickly, using the rest raw command interface (593ms)
Binary DB Api Took  292ms, 17123 documents per second
    ✓ should load a lot of records quickly (292ms)
Binary Record Load Took  985ms, 5076 documents per second
    ✓ should load a lot of records, one at a time, using binary (986ms)
Rest Record Load Took  7575ms, 660 documents per second
    ✓ should load a lot of records, one at a time, using rest (7575ms)

@Laa:

  • Yes, all records, which means it should be able to start sending results right away, since there cannot be any "I have to do difficult logic to figure out which records you mean".

  • I am talking to my local machine (a Core i5 running Ubuntu 14.04 with 8 GB of RAM and an SSD), but even assuming a 1000 Mb/second link we should be able to send more than 100 MB/second, so that shouldn't be an issue either.

  • I am measuring the time it takes to fetch one single record (and then summing all those times to get an average, so I am not measuring the time of the nanoTime call or any other code) after the query has been executed, and the time it takes to JSON-stringify it. I did not measure how long it took to run the query.
    If I compare this to doing a select * from table limit 1000 on a local postgres database I get the following results (using Squirrel)

    Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.088, SQL query: 0.005, Reading results: 0.083
    Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.067, SQL query: 0.009, Reading results: 0.058
    Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.041, SQL query: 0.004, Reading results: 0.037
    Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.032, SQL query: 0.005, Reading results: 0.027
    Query 1 of 1, Rows read: 1,000, Elapsed time (seconds) - Total: 0.028, SQL query: 0.003, Reading results: 0.025
    

    and if I compare this to 0.544 or later 0.143 seconds for fetching about 800 records from OrientDB, I can only conclude that it is way slower. About 5 times worse, which is very significant. All of this was done on the same laptop, with no physical network between client and server.

  • When I disconnect from Postgres and reconnect, the first query I run immediately gives me results like the one below, and I don't have to wait again for the cache to warm up:

    Rows read: 1,000, Elapsed time (seconds) - Total: 0.03, SQL query: 0.004, Reading results: 0.026
    
  • Since I was trying to measure how long it took to execute something, maybe you should take a look at what I quoted below from https://docs.oracle.com/javase/6/docs/api/java/lang/System.html#nanoTime%28%29

    For example, to measure how long some code takes to execute:
     long startTime = System.nanoTime();
    // ... the code being measured ...
    long estimatedTime = System.nanoTime() - startTime;
    
  • Oh, and a 2 GHz CPU goes through 2,000,000,000 cycles every second, and you are worried about 100 CPU cycles disrupting my measurements?
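[Editor's note] The arithmetic behind that last bullet is worth making explicit, since it settles how large the nanoTime overhead is relative to the timings under dispute:

```java
// 100 CPU cycles on a 2 GHz core is 50 ns; the per-record times being argued
// about in this thread are 0.2 ms and up, i.e. at least 4,000x larger.
public class CycleCost {
    // Convert a cycle count to nanoseconds at a given clock rate in GHz
    // (a core at f GHz executes f cycles per nanosecond).
    static double cyclesToNanos(long cycles, double ghz) {
        return cycles / ghz;
    }

    public static void main(String[] args) {
        double ns = cyclesToNanos(100, 2.0);
        System.out.println("100 cycles at 2 GHz = " + ns + " ns");
        System.out.println("fraction of a 0.2 ms record fetch: " + ns / 200_000.0);
    }
}
```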

What more can I say?
It feels like you don't want to believe these numbers, because you don't like what I am saying.

You can try to write a benchmark that proves my numbers wrong, or you could use your time wisely and make the RESTful API, and the console, and the Java API all work faster, and check whether you are on par with other modern database systems.

@andrii0lomakin
Member

@mrft I have doubts about the benchmarks, because you are not the only user who compares DB performance. Could you wait a bit? I will finish an important issue today and then we can try to discover what is going on.

@phpnode
Contributor

phpnode commented Nov 14, 2014

@mrft Thanks, so the Oriento results don't reflect what you're seeing in Java. What kind of perf do you see if you do the following in Java (pseudo-code)?

// prep
var num = 10000;
for (i = 0; i < num; i++) {
   db.someCluster.insert({foo: i});
}

// bench
var startTime = System.nanoTime();
for (i = 0; i < num; i++) {
  doc = db.record.load('#10:' + i);
}
var stopTime = System.nanoTime();

console.log(stopTime - startTime);
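[Editor's note] A runnable Java rendering of the pseudo-code above, with the loader abstracted behind an interface so the same timing loop can wrap a real OrientDB record load (e.g. something like `rid -> db.load(new ORecordId(10, rid))`, the exact signature depending on the OrientDB version); the dummy loader in `main` is a stand-in so the sketch runs without a server:

```java
import java.util.function.IntFunction;

// Time num sequential record loads and report the elapsed wall-clock time.
// The IntFunction maps a cluster position (the i in #10:i) to a loaded record.
public class RecordLoadBench {
    static double benchMs(int num, IntFunction<Object> loader) {
        long start = System.nanoTime();
        for (int i = 0; i < num; i++) {
            loader.apply(i);              // load record #10:i (or a stand-in)
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        int num = 10_000;
        // Dummy loader: replace with a real OrientDB record load to benchmark it.
        double ms = benchMs(num, i -> "doc-" + i);
        System.out.printf("%d loads in %.2f ms (%.1f docs/sec)%n",
                num, ms, num / (ms / 1000.0));
    }
}
```

Before any real comparison, the loop should be run a few times first to warm up the JIT, as mrft's own repeated-run numbers in this thread demonstrate.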

@mrft
Author

mrft commented Mar 26, 2015

Some more test results on OrientDB version 2.0.5.

For the last query, select name, $sungbyrecord as x_sungby from V let $sungby = outE()[ @Class = 'sung_by' ], $sungbyrecord = list( (select in.name from $sungby ) ) where name='HEY BO DIDDLEY', the difference between with and without indexes is incredible, which I find peculiar: traversing edges is advertised as having a very low cost, so you would expect this time to always (with or without indexes) be very close to a simple select name from V where name='HEY BO DIDDLEY'. Instead it takes 10 times as long without indexes, and about the same time with indexes.

Without indexes

| query | rows/query | queries/sec | rows/sec | ms/query | ms/row | queries | total time (ms) |
|---|---|---|---|---|---|---|---|
| select * from V where song_type = 'original' | 184 | 71.441 | 13145.172 | 13.998 | 0.076 | 1000 | 13997.535 |
| select * from V where song_type = 'original' limit 100 | 100 | 165.990 | 16598.958 | 6.024 | 0.060 | 1000 | 6024.475 |
| select * from V where type = 'song' limit 100 | 100 | 257.518 | 25751.799 | 3.883 | 0.039 | 1000 | 3883.224 |
| select * from V where type = 'song' limit 500 | 500 | 77.580 | 38790.167 | 12.890 | 0.026 | 1000 | 12889.865 |
| select name from V where type = 'song' limit 100 | 100 | 215.258 | 21525.838 | 4.646 | 0.046 | 1000 | 4645.580 |
| select name from V where type = 'song' limit 500 | 500 | 57.107 | 28553.438 | 17.511 | 0.035 | 1000 | 17511.026 |
| select * from V where type = 'song' | 584 | 68.734 | 40140.737 | 14.549 | 0.025 | 1000 | 14548.811 |
| select * from V where type = 'artist' limit 100 | 100 | 109.713 | 10971.268 | 9.115 | 0.091 | 1000 | 9114.717 |
| select * from V where type = 'artist' limit 200 | 200 | 75.247 | 15049.494 | 13.289 | 0.066 | 1000 | 13289.484 |
| select * from V where type = 'artist' | 224 | 70.668 | 15829.675 | 14.151 | 0.063 | 1000 | 14150.638 |
| select distinct( type ) from V | 2 | 69.532 | 139.063 | 14.382 | 7.191 | 1000 | 14381.942 |
| select name, $sungbyrecord as x_sungby from V let $sungby = outE()[ @Class = 'sung_by' ], $sungbyrecord = list( (select in.name from $sungby ) ) where name='HEY BO DIDDLEY' | 1 | 7.289 | 7.289 | 137.198 | 137.198 | 1000 | 137198.168 |

With indexes (3 x notunique on name, song_type and type)

| query | rows/query | queries/sec | rows/sec | ms/query | ms/row | queries | total time (ms) |
|---|---|---|---|---|---|---|---|
| select from V where name = 'STANDER ON THE MOUNTAIN' | 1 | 998.787 | 998.787 | 1.001 | 1.001 | 1000 | 1001.215 |
| select from V where name = 'HEY BO DIDDLEY' | 1 | 1533.218 | 1533.218 | 0.652 | 0.652 | 1000 | 652.223 |
| select * from V where song_type = 'original' | 184 | 192.444 | 35409.669 | 5.196 | 0.028 | 1000 | 5196.321 |
| select * from V where song_type = 'original' limit 100 | 100 | 267.173 | 26717.287 | 3.743 | 0.037 | 1000 | 3742.895 |
| select * from V where type = 'song' limit 100 | 100 | 267.646 | 26764.556 | 3.736 | 0.037 | 1000 | 3736.285 |
| select * from V where type = 'song' limit 500 | 500 | 99.432 | 49716.195 | 10.057 | 0.020 | 1000 | 10057.085 |
| select name from V where type = 'song' limit 100 | 100 | 228.666 | 22866.639 | 4.373 | 0.044 | 1000 | 4373.183 |
| select name from V where type = 'song' limit 500 | 500 | 66.468 | 33234.237 | 15.045 | 0.030 | 1000 | 15044.726 |
| select * from V where type = 'song' | 584 | 85.948 | 50193.442 | 11.635 | 0.020 | 1000 | 11634.986 |
| select * from V where type = 'artist' limit 100 | 100 | 353.443 | 35344.267 | 2.829 | 0.028 | 1000 | 2829.313 |
| select * from V where type = 'artist' limit 200 | 200 | 226.790 | 45358.027 | 4.409 | 0.022 | 1000 | 4409.363 |
| select * from V where type = 'artist' | 224 | 200.577 | 44929.252 | 4.986 | 0.022 | 1000 | 4985.616 |
| select distinct( type ) from V | 2 | 64.084 | 128.168 | 15.605 | 7.802 | 1000 | 15604.579 |
| select name, $sungbyrecord as x_sungby from V let $sungby = outE()[ @Class = 'sung_by' ], $sungbyrecord = list( (select in.name from $sungby ) ) where name='HEY BO DIDDLEY' | 1 | 1006.478 | 1006.478 | 0.994 | 0.994 | 1000 | 993.564 |

@mattaylor
Contributor

try without the subselect. Ie..

select name, $sungbyrecord, $sungby.in.name from V let $sungby = outE()[ @Class = 'sung_by' ] where name='HEY BO DIDDLEY'


@mrft
Author

mrft commented Mar 26, 2015

That's a different query.

What you propose would be similar to

select name, $followedby.in.name, $followedby.in.performances
from V 
let $followedby = outE()[ @class = 'followed_by' ]
where name='HEY BO DIDDLEY'

which returns two lists, one for names and one for performances,

which is a completely different thing from

select name, $followedbyrecord as followedbylist
from V 
let $followedby = outE()[ @class = 'followed_by' ],
    $followedbyrecord = list( (select in.name as name, in.performances as performances from $followedby) )
where name='HEY BO DIDDLEY'

which returns 1 list with objects containing 2 properties.

I am not looking for alternative queries, I am trying to let the developers know where improvements could be made.

@lvca lvca modified the milestones: 2.1-rc1, 2.1-rc2 Apr 1, 2015
@mrft
Author

mrft commented Apr 20, 2015

Another performance related question from someone else:
https://groups.google.com/forum/?pli=1#!topic/orient-database/LpG4kQSyqc8

@lvca lvca modified the milestones: 2.1-rc2, 2.1 GA May 5, 2015
@andrii0lomakin andrii0lomakin modified the milestones: 2.1 GA, 2.2 Jul 8, 2015
@andrii0lomakin
Member

This issue is not specific; it is more like a list of queries that are slow. There is no concrete query, it is an aggregation of slow queries, so I removed the milestone from this issue.

@andrii0lomakin
Member

Also, the problem lies not in the embedded version but in the remote version, so I assign @tglman to this issue.

@andrii0lomakin
Member

Hi, if this issue is still actual, please let us know; otherwise we will close it.
