-
Notifications
You must be signed in to change notification settings - Fork 873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance of select query using index is extremely low #3041
Comments
Is this a benchmark? Have you tried with 1 or 100 records? |
How many records were into the index ? |
Did you make cache hot before benchmark ? |
It is just the GratefulDeadConcerts database. Doing the same query a few times from the console seems faster: But still 5 milliseconds for 1 single record is extremely slow in my opinion, don't you think? Here is some tests using a node script that calls the restful API and uses Oriento: 1000 X query [select from V where name = 'HEY BO DIDDLEY'] through REST took 8354 milliseconds. That's 119.70313622216902 per second, and 8.354 ms per event. Start 1000 queries with BINARY protocol and query = select from V where name = 'HEY BO DIDDLEY' 1000 X query [select from V where name = 'HEY BO DIDDLEY'] through BINARY took 3222 milliseconds. That's 310.36623215394167 per second, and 3.222 ms per event. |
Can you try to get 1 and 50 records? I bet the time is pretty close. If you get 1 record you've a fixed cost in time. |
WONTFIX + INVALID ? Fixed cost in time? Some more results from SNAPSHOT-2.0 from last tuesday I think... 1000 X query [select from V where name <> 'HEY BO DIDDLEY' limit 50] through REST took 205901 milliseconds. That's 4.856702978615937 per second, and 205.901 ms per event. 1000 X query [select from V where name <> 'HEY BO DIDDLEY' limit 50] through BINARY took 216148 milliseconds. That's 4.62645964801895 per second, and 216.148 ms per event. 1000 X query [select from V where name in [ 'HEY BO DIDDLEY','IM A MAN','NOT FADE AWAY','BERTHA','GOING DOWN THE ROAD FEELING BAD','MONA','Bo_Diddley','Garcia','Spencer_Davis' ]] through REST took 51524 milliseconds. That's 19.408431022436147 per second, and 51.524 ms per event. And can it get any more basic than this? 1000 X query [select from V limit 50] through REST took 191517 milliseconds. That's 5.221468590255696 per second, and 191.517 ms per event. |
@mrft I closed the issue because you didn't respond at my first question about executing the query with 1 and 50 returning items. |
@mrft In your last comment you don't write how you tested REST and Binary protocols. Can you execute just 2 queries through console or CURL that retrieve 1 and then 50 items? |
Using console connect remote:localhost/GratefulDeadConcerts admin admin select from V limit 50 (I did it > 5 times to let the memory caching kick in): 50 item(s) found. Query executed in 0.124 sec(s). |
Ok, so it's 0,00248 per record. About the index, have you used a UNIQUE_HASH_INDEX? |
NOTUNIQUE as you can see in my first post (since not all V in GratefulDeadConcerts have a name I think so you have multiple NULL values...) 2.48 ms per record, sounds ok if you put it like that, but it should be 2.48 ms for the 50 small records (or maybe 5ms or even 10ms but not 124ms). Try a similar query on MySQL on Postgres, and you'll know what I mean. Do you knwo what is causing this? ToJSON couldn't take that long... |
Guys it is not even 2 ms per record we can get about 5 000 records per second which is 0.2 ms per |
Ok so for one test I got 100 rec/ms . 0.01 ms for record. I will check it on this database. |
Some more tests, this time I wrote a very simple java application, and started fetching all the records public static void main( String[ ] args) { for ( int i = 0; i < 50; i++ ) { testDocument_GratefulDeadConcerts(); } } public static void testDocument_GratefulDeadConcerts() { ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/GratefulDeadConcerts").open( "admin", "admin" ); String className = "V"; int clid = db.getClusterIdByName( "V" ); System.out.println( "clusterid = " + clid ); int nrOfRecords = 0; long t = System.nanoTime(); long totalFetchTime = 0; long totalToJSONTime = 0; try { for ( ODocument record : db.browseCluster( className ) ) { nrOfRecords++; long currentNanoTime = System.nanoTime(); totalFetchTime += currentNanoTime - t; t = System.nanoTime(); String json = record.toJSON(); currentNanoTime = System.nanoTime(); totalToJSONTime += currentNanoTime - t; t = System.nanoTime(); } double totalFetchTimeInMilliseconds = ((double)totalFetchTime / 1000000); double totalToJSONTimeInMilliseconds = ((double)totalToJSONTime / 1000000); System.out.println( "Fetching and toJSON() on " + nrOfRecords + " records took " + totalFetchTimeInMilliseconds + " milliseconds and " + totalToJSONTimeInMilliseconds + " milliseconds respectively." ); System.out.println( "That is " + (totalFetchTimeInMilliseconds / nrOfRecords) + " milliseconds and " + ( totalToJSONTimeInMilliseconds / nrOfRecords ) + " milliseconds per record" + "\n" + " and " + ( 1000.0 / (totalFetchTimeInMilliseconds / nrOfRecords) ) + " records per second" + " and " + (1000.0 / (totalToJSONTimeInMilliseconds / nrOfRecords)) + " records per second respectively." ); db.rollback(); } finally { System.out.println( "Number of records = " + nrOfRecords ); db.close(); } } public static long printDurationToSystemOut(long startSystemNanoTime, long currentSystemNanoTime, String description) { long durationInNanoSeconds = currentSystemNanoTime - startSystemNanoTime; double durationInMilliseconds = (double)durationInNanoSeconds / 1000000; System.out.println(description + " took " + durationInMilliseconds + " ms."); return durationInNanoSeconds; } gives me the following disappointing final results on the first run Fetching and toJSON() on 809 records took 544.10086 milliseconds and 1318.839576 milliseconds respectively. That is 0.6725597775030903 milliseconds and 1.630209611866502 milliseconds per record and 1486.8566831524581 records per second and 613.4180492624222 records per second respectively. which is slow, an I don't understand because the server keeps running so I would presume the 'cache' should still be hot (or is it completely thrown away on each 'session'?) and could that be the reason why everything on the Restful API stays so slow, because it is like a whole new 'cold' session each time? On the last run, I get closer to the 5 000 reords per second that laa is talking about: Fetching and toJSON() on 809 records took 143.26815 milliseconds and 175.4015 milliseconds respectively. That is 0.17709289245982693 milliseconds and 0.21681273176761434 milliseconds per record and 5646.754006385928 records per second and 4612.275265604912 records per second respectively. but mind you that I am not even doing a query, just selecting ALL records in the database. And the following are just some simple JSON parsing and stringifying tests on nodejs, just to get an idea about what these figures mean... Parsing 10000 JSON objects took 244 milliseconds. That's 40983.60655737705 per second, and 0.0244 ms per event. Stringifying 10000 JSON objects took 57 milliseconds. That's 175438.59649122806 per second, and 0.0057 ms per event. |
@mrft curious about this because Oriento has some benchmarks as part of its test suite and I've never seen such low numbers, I wonder if this is because we're using RECORD_LOAD and thus avoid all the SQL overhead. Would you mind running the Oriento tests and pasting the results of this one in particular: https://github.com/codemix/oriento/blob/master/test/bugs/27-slow.js ? |
@mrft you load all records by network on client side to do query and claim that this is slow ? Simple does not mean efficient. All queries are done on server side not on client side.
https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks What I propose. I will write benchmark and you will review it. Then you will run benchmark. BTW Which H/W did you use to run this your test ? |
@phpnode: Bug #27: Slow compared to Restful API Binary Protocol Took 556ms, 8993 documents per second ✓ should load a lot of records quickly, using the binary raw command interface (557ms) Rest Protocol Took 593ms, 8432 documents per second ✓ should load a lot of records quickly, using the rest raw command interface (593ms) Binary DB Api Took 292ms, 17123 documents per second ✓ should load a lot of records quickly (292ms) Binary Record Load Took 985ms, 5076 documents per second ✓ should load a lot of records, one at a time, using binary (986ms) Rest Record Load Took 7575ms, 660 documents per second ✓ should load a lot of records, one at a time, using rest (7575ms) @Laa:
What more can I say? You can try to write a benchmark that will prove that my numbers are wrong, or you could use your time wisely and make the restful API, ànd the console, ànd the Java API all work faster, and check if you are on par with other modern day database systems. |
@mrft I have doubts about benchmarks because you are not only one user who compares DBs performance. Could you wait a bit I finish important issue today and we try to discover what is going on. |
@mrft thanks, so the Oriento results don't reflect what you're seeing in java. What kind of perf do you see if you do the following in java (pseudo-code)
|
Some more test results on OrientDB version 2.0.5. For the last query select name, $sungbyrecord as x_sungby from V let $sungby = outE()[ @Class = 'sung_by' ], $sungbyrecord = list( (select in.name from $sungby ) ) where name='HEY BO DIDDLEY', the difference between with and without indexes is incredible, which I find peculiar, since traversing edges is being advertised as have a very low cost, so you would expect this time to always (indexes or not) be very close to a simple select name from V where name='HEY BO DIDDLEY' but it takes 10 times as long without indexes, and about the same time with indexes. Without indexes
With indexes (3 x notunique on name, song_type and type)
|
try without the subselect. Ie.. select name, $sungbyrecord, $sungby.in.name from V let $sungby = outE()[ @Class = 'sung_by' ] where name='HEY BO DIDDLEY' Sent from my Verizon Wireless 4G LTE smartphone |
That's a different query. What you propose would be similar to select name, $followedby.in.name, $followedby.in.performances from V let $followedby = outE()[ @class = 'followed_by' ] where name='HEY BO DIDDLEY' which returns 2 lists, 1 for name, and one for performances which is a completely different thing than select name, $followedbyrecord as followedbylist from V let $followedby = outE()[ @class = 'followed_by' ], $followedbyrecord = list( (select in.name as name, in.performances as performances from $followedby) ) where name='HEY BO DIDDLEY' which returns 1 list with objects containing 2 properties. I am not looking for alternative queries, I am trying to let the developers know where improvements could be made. |
Another performance related question from someone else: |
This issue is not specific , it is more like there are list of queries which are slow, there is no concrete query it is like aggregation of slow queries so I remove milestone from this issue. |
Also problem lies not in embedded version but n remote version so I assign @tglman to this issue. |
Hi, if this issue still actual please let us know otherwise we will close it. |
=> Query executed in 0.069 sec. Returned 1 record(s)
returns
which indicates the index is being used.
69 milliseconds, for fetching 1 single record, you could beat that performnce by a serial search. This is a severe contrast with OrientDB boasting about its speed on the website...
Oh, and I am running OrientDB locally (so no network latency), and ran this query through the studio (but that doesn't really matter). Version 1.7.9 or 2.0-M2 doesn't matter also
The text was updated successfully, but these errors were encountered: