
Add parquet-index binary #3405

Merged (2 commits) on Jan 2, 2023


tustvold
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

I recently found myself wanting to debug filter pushdown for a specific parquet file. This file had a page index but did not store statistics inline within the pages (which is perfectly valid). Unfortunately, this meant that `parquet-tools dump` could not print the statistics, which made debugging the issue complicated.
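For context, filter pushdown with a page index works from the column index's per-page min/max values rather than from statistics stored in the page headers, which is why a file that only has a page index can still be pruned. A minimal sketch of equality pruning, assuming a hypothetical `PageStats` type (not the parquet crate's API) that holds one page's min and max:

```rust
/// One page's statistics, as they might be read from a column index.
/// Hypothetical type for illustration; not part of the parquet crate.
#[derive(Debug, Clone, Copy)]
pub struct PageStats {
    pub min: i32,
    pub max: i32,
}

/// Returns the indices of pages that *may* contain rows where `column == value`.
/// A page can only be skipped when `value` lies outside its `[min, max]` range.
pub fn pages_maybe_equal(pages: &[PageStats], value: i32) -> Vec<usize> {
    pages
        .iter()
        .enumerate()
        .filter(|(_, s)| s.min <= value && value <= s.max)
        .map(|(i, _)| i)
        .collect()
}
```

Min/max pruning is conservative: with pages `[1, 1]`, `[1, 10]`, `[10, 10]`, a lookup for 5 still has to read the `[1, 10]` page, even if no row in it actually equals 5.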

What changes are included in this PR?

Adds a parquet-index binary which can be used to dump the page index for a given column of a given file. Longer term, I wonder if we should merge some of these tools into a single binary; I may write up a ticket.
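Each line of such a dump combines the offset index (where a page starts and how long it is) with the column index (its min and max). Parquet's `PageLocation` entries record `offset`, `compressed_page_size` and `first_row_index` but no row count, so a per-page row count has to be derived from consecutive `first_row_index` values. A hypothetical sketch of that formatting step; the struct is defined locally to mirror those three spec fields, and this is not the binary's actual source:

```rust
/// Mirrors the three fields of a Parquet `PageLocation` offset index entry.
/// Defined locally for illustration; not the parquet crate's type.
pub struct PageLocation {
    pub offset: i64,
    pub compressed_page_size: i32,
    pub first_row_index: i64,
}

/// Formats one line per page. `min_max` holds each page's column index
/// min/max; `row_group_rows` is the row group's total row count, used to
/// size the last page.
pub fn format_pages(
    locations: &[PageLocation],
    min_max: &[(i32, i32)],
    row_group_rows: i64,
) -> Vec<String> {
    locations
        .iter()
        .zip(min_max)
        .enumerate()
        .map(|(i, (loc, (min, max)))| {
            // A page's rows run up to the next page's first row,
            // or to the end of the row group for the last page.
            let next_first = locations
                .get(i + 1)
                .map_or(row_group_rows, |next| next.first_row_index);
            let row_count = next_first - loc.first_row_index;
            format!(
                "Page {} at offset {:#010x} with length {} and row count {}, min {}, max {}",
                i, loc.offset, loc.compressed_page_size, row_count, min, max
            )
        })
        .collect()
}
```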

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 28, 2022
@tustvold
Contributor Author

Example output:

```
Running `target/debug/parquet-index /home/raphael/repos/external/arrow-datafusion/parquet-testing/data/alltypes_tiny_pages.parquet month`
Row Group: 0
Page 0 at offset 0x0004ceb6 with length 25 and row count 21, min 1, max 1
Page 1 at offset 0x0004cecf with length 25 and row count 21, min 1, max 1
Page 2 at offset 0x0004cee8 with length 25 and row count 21, min 1, max 1
Page 3 at offset 0x0004cf01 with length 25 and row count 21, min 1, max 1
Page 4 at offset 0x0004cf1a with length 25 and row count 27, min 1, max 1
Page 5 at offset 0x0004cf33 with length 25 and row count 21, min 1, max 1
Page 6 at offset 0x0004cf4c with length 25 and row count 21, min 1, max 1
Page 7 at offset 0x0004cf65 with length 25 and row count 21, min 1, max 1
Page 8 at offset 0x0004cf7e with length 25 and row count 27, min 1, max 1
Page 9 at offset 0x0004cf97 with length 25 and row count 21, min 1, max 1
Page 10 at offset 0x0004cfb0 with length 25 and row count 21, min 1, max 1
Page 11 at offset 0x0004cfc9 with length 25 and row count 21, min 1, max 1
Page 12 at offset 0x0004cfe2 with length 25 and row count 27, min 1, max 1
Page 13 at offset 0x0004cffb with length 28 and row count 21, min 1, max 10
Page 14 at offset 0x0004d017 with length 26 and row count 21, min 10, max 10
Page 15 at offset 0x0004d031 with length 26 and row count 21, min 10, max 10
Page 16 at offset 0x0004d04b with length 26 and row count 27, min 10, max 10
Page 17 at offset 0x0004d065 with length 26 and row count 21, min 10, max 10
Page 18 at offset 0x0004d07f with length 26 and row count 21, min 10, max 10
Page 19 at offset 0x0004d099 with length 26 and row count 21, min 10, max 10
Page 20 at offset 0x0004d0b3 with length 26 and row count 27, min 10, max 10
Page 21 at offset 0x0004d0cd with length 26 and row count 21, min 10, max 10
Page 22 at offset 0x0004d0e7 with length 26 and row count 21, min 10, max 10
Page 23 at offset 0x0004d101 with length 26 and row count 21, min 10, max 10
Page 24 at offset 0x0004d11b with length 26 and row count 27, min 10, max 10
Page 25 at offset 0x0004d135 with length 26 and row count 21, min 10, max 10
Page 26 at offset 0x0004d14f with length 26 and row count 21, min 10, max 10
Page 27 at offset 0x0004d169 with length 29 and row count 21, min 10, max 11
Page 28 at offset 0x0004d186 with length 26 and row count 27, min 11, max 11
Page 29 at offset 0x0004d1a0 with length 26 and row count 21, min 11, max 11
Page 30 at offset 0x0004d1ba with length 26 and row count 21, min 11, max 11
Page 31 at offset 0x0004d1d4 with length 26 and row count 21, min 11, max 11
Page 32 at offset 0x0004d1ee with length 26 and row count 27, min 11, max 11
Page 33 at offset 0x0004d208 with length 26 and row count 21, min 11, max 11
Page 34 at offset 0x0004d222 with length 26 and row count 21, min 11, max 11
Page 35 at offset 0x0004d23c with length 26 and row count 21, min 11, max 11
Page 36 at offset 0x0004d256 with length 26 and row count 27, min 11, max 11
Page 37 at offset 0x0004d270 with length 26 and row count 21, min 11, max 11
Page 38 at offset 0x0004d28a with length 26 and row count 21, min 11, max 11
Page 39 at offset 0x0004d2a4 with length 26 and row count 21, min 11, max 11
Page 40 at offset 0x0004d2be with length 29 and row count 27, min 11, max 12
Page 41 at offset 0x0004d2db with length 26 and row count 21, min 12, max 12
Page 42 at offset 0x0004d2f5 with length 26 and row count 21, min 12, max 12
Page 43 at offset 0x0004d30f with length 26 and row count 21, min 12, max 12
Page 44 at offset 0x0004d329 with length 26 and row count 27, min 12, max 12
Page 45 at offset 0x0004d343 with length 26 and row count 21, min 12, max 12
Page 46 at offset 0x0004d35d with length 26 and row count 21, min 12, max 12
Page 47 at offset 0x0004d377 with length 26 and row count 21, min 12, max 12
Page 48 at offset 0x0004d391 with length 26 and row count 27, min 12, max 12
Page 49 at offset 0x0004d3ab with length 26 and row count 21, min 12, max 12
Page 50 at offset 0x0004d3c5 with length 26 and row count 21, min 12, max 12
Page 51 at offset 0x0004d3df with length 26 and row count 21, min 12, max 12
Page 52 at offset 0x0004d3f9 with length 26 and row count 27, min 12, max 12
Page 53 at offset 0x0004d413 with length 26 and row count 21, min 12, max 12
Page 54 at offset 0x0004d42d with length 30 and row count 21, min 2, max 12
Page 55 at offset 0x0004d44b with length 26 and row count 21, min 2, max 2
Page 56 at offset 0x0004d465 with length 26 and row count 27, min 2, max 2
Page 57 at offset 0x0004d47f with length 26 and row count 21, min 2, max 2
Page 58 at offset 0x0004d499 with length 26 and row count 21, min 2, max 2
Page 59 at offset 0x0004d4b3 with length 26 and row count 21, min 2, max 2
Page 60 at offset 0x0004d4cd with length 26 and row count 27, min 2, max 2
Page 61 at offset 0x0004d4e7 with length 26 and row count 21, min 2, max 2
Page 62 at offset 0x0004d501 with length 26 and row count 21, min 2, max 2
Page 63 at offset 0x0004d51b with length 26 and row count 21, min 2, max 2
Page 64 at offset 0x0004d535 with length 26 and row count 27, min 2, max 2
Page 65 at offset 0x0004d54f with length 26 and row count 21, min 2, max 2
Page 66 at offset 0x0004d569 with length 26 and row count 21, min 2, max 2
Page 67 at offset 0x0004d583 with length 30 and row count 21, min 2, max 3
Page 68 at offset 0x0004d5a1 with length 26 and row count 27, min 3, max 3
Page 69 at offset 0x0004d5bb with length 26 and row count 21, min 3, max 3
Page 70 at offset 0x0004d5d5 with length 26 and row count 21, min 3, max 3
Page 71 at offset 0x0004d5ef with length 26 and row count 21, min 3, max 3
Page 72 at offset 0x0004d609 with length 26 and row count 27, min 3, max 3
Page 73 at offset 0x0004d623 with length 26 and row count 21, min 3, max 3
Page 74 at offset 0x0004d63d with length 26 and row count 21, min 3, max 3
Page 75 at offset 0x0004d657 with length 26 and row count 21, min 3, max 3
Page 76 at offset 0x0004d671 with length 26 and row count 27, min 3, max 3
Page 77 at offset 0x0004d68b with length 26 and row count 21, min 3, max 3
Page 78 at offset 0x0004d6a5 with length 26 and row count 21, min 3, max 3
Page 79 at offset 0x0004d6bf with length 26 and row count 21, min 3, max 3
Page 80 at offset 0x0004d6d9 with length 30 and row count 27, min 3, max 4
Page 81 at offset 0x0004d6f7 with length 26 and row count 21, min 4, max 4
Page 82 at offset 0x0004d711 with length 26 and row count 21, min 4, max 4
Page 83 at offset 0x0004d72b with length 26 and row count 21, min 4, max 4
Page 84 at offset 0x0004d745 with length 26 and row count 27, min 4, max 4
Page 85 at offset 0x0004d75f with length 26 and row count 21, min 4, max 4
Page 86 at offset 0x0004d779 with length 26 and row count 21, min 4, max 4
Page 87 at offset 0x0004d793 with length 26 and row count 21, min 4, max 4
Page 88 at offset 0x0004d7ad with length 26 and row count 27, min 4, max 4
Page 89 at offset 0x0004d7c7 with length 26 and row count 21, min 4, max 4
Page 90 at offset 0x0004d7e1 with length 26 and row count 21, min 4, max 4
Page 91 at offset 0x0004d7fb with length 26 and row count 21, min 4, max 4
Page 92 at offset 0x0004d815 with length 26 and row count 27, min 4, max 4
Page 93 at offset 0x0004d82f with length 26 and row count 21, min 4, max 4
Page 94 at offset 0x0004d849 with length 28 and row count 21, min 4, max 5
Page 95 at offset 0x0004d865 with length 26 and row count 21, min 5, max 5
Page 96 at offset 0x0004d87f with length 26 and row count 27, min 5, max 5
Page 97 at offset 0x0004d899 with length 26 and row count 21, min 5, max 5
Page 98 at offset 0x0004d8b3 with length 26 and row count 21, min 5, max 5
Page 99 at offset 0x0004d8cd with length 26 and row count 21, min 5, max 5
Page 100 at offset 0x0004d8e7 with length 26 and row count 27, min 5, max 5
Page 101 at offset 0x0004d901 with length 26 and row count 21, min 5, max 5
Page 102 at offset 0x0004d91b with length 26 and row count 21, min 5, max 5
Page 103 at offset 0x0004d935 with length 26 and row count 21, min 5, max 5
Page 104 at offset 0x0004d94f with length 26 and row count 27, min 5, max 5
Page 105 at offset 0x0004d969 with length 26 and row count 21, min 5, max 5
Page 106 at offset 0x0004d983 with length 26 and row count 21, min 5, max 5
Page 107 at offset 0x0004d99d with length 26 and row count 21, min 5, max 5
Page 108 at offset 0x0004d9b7 with length 31 and row count 27, min 5, max 6
Page 109 at offset 0x0004d9d6 with length 26 and row count 21, min 6, max 6
Page 110 at offset 0x0004d9f0 with length 26 and row count 21, min 6, max 6
Page 111 at offset 0x0004da0a with length 26 and row count 21, min 6, max 6
Page 112 at offset 0x0004da24 with length 26 and row count 27, min 6, max 6
Page 113 at offset 0x0004da3e with length 26 and row count 21, min 6, max 6
Page 114 at offset 0x0004da58 with length 26 and row count 21, min 6, max 6
Page 115 at offset 0x0004da72 with length 26 and row count 21, min 6, max 6
Page 116 at offset 0x0004da8c with length 26 and row count 27, min 6, max 6
Page 117 at offset 0x0004daa6 with length 26 and row count 21, min 6, max 6
Page 118 at offset 0x0004dac0 with length 26 and row count 21, min 6, max 6
Page 119 at offset 0x0004dada with length 26 and row count 21, min 6, max 6
Page 120 at offset 0x0004daf4 with length 26 and row count 27, min 6, max 6
Page 121 at offset 0x0004db0e with length 28 and row count 21, min 6, max 7
Page 122 at offset 0x0004db2a with length 26 and row count 21, min 7, max 7
Page 123 at offset 0x0004db44 with length 26 and row count 21, min 7, max 7
Page 124 at offset 0x0004db5e with length 26 and row count 27, min 7, max 7
Page 125 at offset 0x0004db78 with length 26 and row count 21, min 7, max 7
Page 126 at offset 0x0004db92 with length 26 and row count 21, min 7, max 7
Page 127 at offset 0x0004dbac with length 26 and row count 21, min 7, max 7
Page 128 at offset 0x0004dbc6 with length 26 and row count 27, min 7, max 7
Page 129 at offset 0x0004dbe0 with length 26 and row count 21, min 7, max 7
Page 130 at offset 0x0004dbfa with length 26 and row count 21, min 7, max 7
Page 131 at offset 0x0004dc14 with length 26 and row count 21, min 7, max 7
Page 132 at offset 0x0004dc2e with length 26 and row count 27, min 7, max 7
Page 133 at offset 0x0004dc48 with length 26 and row count 21, min 7, max 7
Page 134 at offset 0x0004dc62 with length 26 and row count 21, min 7, max 7
Page 135 at offset 0x0004dc7c with length 31 and row count 21, min 7, max 8
Page 136 at offset 0x0004dc9b with length 26 and row count 27, min 8, max 8
Page 137 at offset 0x0004dcb5 with length 26 and row count 21, min 8, max 8
Page 138 at offset 0x0004dccf with length 26 and row count 21, min 8, max 8
Page 139 at offset 0x0004dce9 with length 26 and row count 21, min 8, max 8
Page 140 at offset 0x0004dd03 with length 26 and row count 27, min 8, max 8
Page 141 at offset 0x0004dd1d with length 26 and row count 21, min 8, max 8
Page 142 at offset 0x0004dd37 with length 26 and row count 21, min 8, max 8
Page 143 at offset 0x0004dd51 with length 26 and row count 21, min 8, max 8
Page 144 at offset 0x0004dd6b with length 26 and row count 27, min 8, max 8
Page 145 at offset 0x0004dd85 with length 26 and row count 21, min 8, max 8
Page 146 at offset 0x0004dd9f with length 26 and row count 21, min 8, max 8
Page 147 at offset 0x0004ddb9 with length 26 and row count 21, min 8, max 8
Page 148 at offset 0x0004ddd3 with length 31 and row count 27, min 8, max 9
Page 149 at offset 0x0004ddf2 with length 26 and row count 21, min 9, max 9
Page 150 at offset 0x0004de0c with length 26 and row count 21, min 9, max 9
Page 151 at offset 0x0004de26 with length 26 and row count 21, min 9, max 9
Page 152 at offset 0x0004de40 with length 26 and row count 27, min 9, max 9
Page 153 at offset 0x0004de5a with length 26 and row count 21, min 9, max 9
Page 154 at offset 0x0004de74 with length 26 and row count 21, min 9, max 9
Page 155 at offset 0x0004de8e with length 26 and row count 21, min 9, max 9
Page 156 at offset 0x0004dea8 with length 26 and row count 27, min 9, max 9
Page 157 at offset 0x0004dec2 with length 26 and row count 21, min 9, max 9
Page 158 at offset 0x0004dedc with length 26 and row count 21, min 9, max 9
Page 159 at offset 0x0004def6 with length 26 and row count 21, min 9, max 9
Page 160 at offset 0x0004df10 with length 26 and row count 27, min 9, max 9
Page 161 at offset 0x0004df2a with length 26 and row count 21, min 9, max 9
Page 162 at offset 0x0004df44 with length 28 and row count 21, min 1, max 9
Page 163 at offset 0x0004df60 with length 26 and row count 21, min 1, max 1
Page 164 at offset 0x0004df7a with length 26 and row count 27, min 1, max 1
Page 165 at offset 0x0004df94 with length 26 and row count 21, min 1, max 1
Page 166 at offset 0x0004dfae with length 26 and row count 21, min 1, max 1
Page 167 at offset 0x0004dfc8 with length 26 and row count 21, min 1, max 1
Page 168 at offset 0x0004dfe2 with length 26 and row count 27, min 1, max 1
Page 169 at offset 0x0004dffc with length 26 and row count 21, min 1, max 1
Page 170 at offset 0x0004e016 with length 26 and row count 21, min 1, max 1
Page 171 at offset 0x0004e030 with length 26 and row count 21, min 1, max 1
Page 172 at offset 0x0004e04a with length 26 and row count 27, min 1, max 1
Page 173 at offset 0x0004e064 with length 26 and row count 21, min 1, max 1
Page 174 at offset 0x0004e07e with length 26 and row count 21, min 1, max 1
Page 175 at offset 0x0004e098 with length 26 and row count 21, min 1, max 1
Page 176 at offset 0x0004e0b2 with length 31 and row count 27, min 1, max 10
Page 177 at offset 0x0004e0d1 with length 26 and row count 21, min 10, max 10
Page 178 at offset 0x0004e0eb with length 26 and row count 21, min 10, max 10
Page 179 at offset 0x0004e105 with length 26 and row count 21, min 10, max 10
Page 180 at offset 0x0004e11f with length 26 and row count 27, min 10, max 10
Page 181 at offset 0x0004e139 with length 26 and row count 21, min 10, max 10
Page 182 at offset 0x0004e153 with length 26 and row count 21, min 10, max 10
Page 183 at offset 0x0004e16d with length 26 and row count 21, min 10, max 10
Page 184 at offset 0x0004e187 with length 26 and row count 27, min 10, max 10
Page 185 at offset 0x0004e1a1 with length 26 and row count 21, min 10, max 10
Page 186 at offset 0x0004e1bb with length 26 and row count 21, min 10, max 10
Page 187 at offset 0x0004e1d5 with length 26 and row count 21, min 10, max 10
Page 188 at offset 0x0004e1ef with length 26 and row count 27, min 10, max 10
Page 189 at offset 0x0004e209 with length 31 and row count 21, min 10, max 11
Page 190 at offset 0x0004e228 with length 26 and row count 21, min 11, max 11
Page 191 at offset 0x0004e242 with length 26 and row count 21, min 11, max 11
Page 192 at offset 0x0004e25c with length 26 and row count 27, min 11, max 11
Page 193 at offset 0x0004e276 with length 26 and row count 21, min 11, max 11
Page 194 at offset 0x0004e290 with length 26 and row count 21, min 11, max 11
Page 195 at offset 0x0004e2aa with length 26 and row count 21, min 11, max 11
Page 196 at offset 0x0004e2c4 with length 26 and row count 27, min 11, max 11
Page 197 at offset 0x0004e2de with length 26 and row count 21, min 11, max 11
Page 198 at offset 0x0004e2f8 with length 26 and row count 21, min 11, max 11
Page 199 at offset 0x0004e312 with length 26 and row count 21, min 11, max 11
Page 200 at offset 0x0004e32c with length 26 and row count 27, min 11, max 11
Page 201 at offset 0x0004e346 with length 26 and row count 21, min 11, max 11
Page 202 at offset 0x0004e360 with length 26 and row count 21, min 11, max 11
Page 203 at offset 0x0004e37a with length 31 and row count 21, min 11, max 12
Page 204 at offset 0x0004e399 with length 26 and row count 27, min 12, max 12
Page 205 at offset 0x0004e3b3 with length 26 and row count 21, min 12, max 12
Page 206 at offset 0x0004e3cd with length 26 and row count 21, min 12, max 12
Page 207 at offset 0x0004e3e7 with length 26 and row count 21, min 12, max 12
Page 208 at offset 0x0004e401 with length 26 and row count 27, min 12, max 12
Page 209 at offset 0x0004e41b with length 26 and row count 21, min 12, max 12
Page 210 at offset 0x0004e435 with length 26 and row count 21, min 12, max 12
Page 211 at offset 0x0004e44f with length 26 and row count 21, min 12, max 12
Page 212 at offset 0x0004e469 with length 26 and row count 27, min 12, max 12
Page 213 at offset 0x0004e483 with length 26 and row count 21, min 12, max 12
Page 214 at offset 0x0004e49d with length 26 and row count 21, min 12, max 12
Page 215 at offset 0x0004e4b7 with length 26 and row count 21, min 12, max 12
Page 216 at offset 0x0004e4d1 with length 31 and row count 27, min 2, max 12
Page 217 at offset 0x0004e4f0 with length 26 and row count 21, min 2, max 2
Page 218 at offset 0x0004e50a with length 26 and row count 21, min 2, max 2
Page 219 at offset 0x0004e524 with length 26 and row count 21, min 2, max 2
Page 220 at offset 0x0004e53e with length 26 and row count 27, min 2, max 2
Page 221 at offset 0x0004e558 with length 26 and row count 21, min 2, max 2
Page 222 at offset 0x0004e572 with length 26 and row count 21, min 2, max 2
Page 223 at offset 0x0004e58c with length 26 and row count 21, min 2, max 2
Page 224 at offset 0x0004e5a6 with length 26 and row count 27, min 2, max 2
Page 225 at offset 0x0004e5c0 with length 26 and row count 21, min 2, max 2
Page 226 at offset 0x0004e5da with length 26 and row count 21, min 2, max 2
Page 227 at offset 0x0004e5f4 with length 26 and row count 21, min 2, max 2
Page 228 at offset 0x0004e60e with length 26 and row count 27, min 2, max 2
Page 229 at offset 0x0004e628 with length 28 and row count 21, min 2, max 3
Page 230 at offset 0x0004e644 with length 26 and row count 21, min 3, max 3
Page 231 at offset 0x0004e65e with length 26 and row count 21, min 3, max 3
Page 232 at offset 0x0004e678 with length 26 and row count 27, min 3, max 3
Page 233 at offset 0x0004e692 with length 26 and row count 21, min 3, max 3
Page 234 at offset 0x0004e6ac with length 26 and row count 21, min 3, max 3
Page 235 at offset 0x0004e6c6 with length 26 and row count 21, min 3, max 3
Page 236 at offset 0x0004e6e0 with length 26 and row count 27, min 3, max 3
Page 237 at offset 0x0004e6fa with length 26 and row count 21, min 3, max 3
Page 238 at offset 0x0004e714 with length 26 and row count 21, min 3, max 3
Page 239 at offset 0x0004e72e with length 26 and row count 21, min 3, max 3
Page 240 at offset 0x0004e748 with length 26 and row count 27, min 3, max 3
Page 241 at offset 0x0004e762 with length 26 and row count 21, min 3, max 3
Page 242 at offset 0x0004e77c with length 26 and row count 21, min 3, max 3
Page 243 at offset 0x0004e796 with length 31 and row count 21, min 3, max 4
Page 244 at offset 0x0004e7b5 with length 26 and row count 27, min 4, max 4
Page 245 at offset 0x0004e7cf with length 26 and row count 21, min 4, max 4
Page 246 at offset 0x0004e7e9 with length 26 and row count 21, min 4, max 4
Page 247 at offset 0x0004e803 with length 26 and row count 21, min 4, max 4
Page 248 at offset 0x0004e81d with length 26 and row count 27, min 4, max 4
Page 249 at offset 0x0004e837 with length 26 and row count 21, min 4, max 4
Page 250 at offset 0x0004e851 with length 26 and row count 21, min 4, max 4
Page 251 at offset 0x0004e86b with length 26 and row count 21, min 4, max 4
Page 252 at offset 0x0004e885 with length 26 and row count 27, min 4, max 4
Page 253 at offset 0x0004e89f with length 26 and row count 21, min 4, max 4
Page 254 at offset 0x0004e8b9 with length 26 and row count 21, min 4, max 4
Page 255 at offset 0x0004e8d3 with length 26 and row count 21, min 4, max 4
Page 256 at offset 0x0004e8ed with length 28 and row count 27, min 4, max 5
Page 257 at offset 0x0004e909 with length 26 and row count 21, min 5, max 5
Page 258 at offset 0x0004e923 with length 26 and row count 21, min 5, max 5
Page 259 at offset 0x0004e93d with length 26 and row count 21, min 5, max 5
Page 260 at offset 0x0004e957 with length 26 and row count 27, min 5, max 5
Page 261 at offset 0x0004e971 with length 26 and row count 21, min 5, max 5
Page 262 at offset 0x0004e98b with length 26 and row count 21, min 5, max 5
Page 263 at offset 0x0004e9a5 with length 26 and row count 21, min 5, max 5
Page 264 at offset 0x0004e9bf with length 26 and row count 27, min 5, max 5
Page 265 at offset 0x0004e9d9 with length 26 and row count 21, min 5, max 5
Page 266 at offset 0x0004e9f3 with length 26 and row count 21, min 5, max 5
Page 267 at offset 0x0004ea0d with length 26 and row count 21, min 5, max 5
Page 268 at offset 0x0004ea27 with length 26 and row count 27, min 5, max 5
Page 269 at offset 0x0004ea41 with length 26 and row count 21, min 5, max 5
Page 270 at offset 0x0004ea5b with length 28 and row count 21, min 5, max 6
Page 271 at offset 0x0004ea77 with length 26 and row count 21, min 6, max 6
Page 272 at offset 0x0004ea91 with length 26 and row count 27, min 6, max 6
Page 273 at offset 0x0004eaab with length 26 and row count 21, min 6, max 6
Page 274 at offset 0x0004eac5 with length 26 and row count 21, min 6, max 6
Page 275 at offset 0x0004eadf with length 26 and row count 21, min 6, max 6
Page 276 at offset 0x0004eaf9 with length 26 and row count 27, min 6, max 6
Page 277 at offset 0x0004eb13 with length 26 and row count 21, min 6, max 6
Page 278 at offset 0x0004eb2d with length 26 and row count 21, min 6, max 6
Page 279 at offset 0x0004eb47 with length 26 and row count 21, min 6, max 6
Page 280 at offset 0x0004eb61 with length 26 and row count 27, min 6, max 6
Page 281 at offset 0x0004eb7b with length 26 and row count 21, min 6, max 6
Page 282 at offset 0x0004eb95 with length 26 and row count 21, min 6, max 6
Page 283 at offset 0x0004ebaf with length 31 and row count 21, min 6, max 7
Page 284 at offset 0x0004ebce with length 26 and row count 27, min 7, max 7
Page 285 at offset 0x0004ebe8 with length 26 and row count 21, min 7, max 7
Page 286 at offset 0x0004ec02 with length 26 and row count 21, min 7, max 7
Page 287 at offset 0x0004ec1c with length 26 and row count 21, min 7, max 7
Page 288 at offset 0x0004ec36 with length 26 and row count 27, min 7, max 7
Page 289 at offset 0x0004ec50 with length 26 and row count 21, min 7, max 7
Page 290 at offset 0x0004ec6a with length 26 and row count 21, min 7, max 7
Page 291 at offset 0x0004ec84 with length 26 and row count 21, min 7, max 7
Page 292 at offset 0x0004ec9e with length 26 and row count 27, min 7, max 7
Page 293 at offset 0x0004ecb8 with length 26 and row count 21, min 7, max 7
Page 294 at offset 0x0004ecd2 with length 26 and row count 21, min 7, max 7
Page 295 at offset 0x0004ecec with length 26 and row count 21, min 7, max 7
Page 296 at offset 0x0004ed06 with length 26 and row count 27, min 7, max 7
Page 297 at offset 0x0004ed20 with length 28 and row count 21, min 7, max 8
Page 298 at offset 0x0004ed3c with length 26 and row count 21, min 8, max 8
Page 299 at offset 0x0004ed56 with length 26 and row count 21, min 8, max 8
Page 300 at offset 0x0004ed70 with length 26 and row count 27, min 8, max 8
Page 301 at offset 0x0004ed8a with length 26 and row count 21, min 8, max 8
Page 302 at offset 0x0004eda4 with length 26 and row count 21, min 8, max 8
Page 303 at offset 0x0004edbe with length 26 and row count 21, min 8, max 8
Page 304 at offset 0x0004edd8 with length 26 and row count 27, min 8, max 8
Page 305 at offset 0x0004edf2 with length 26 and row count 21, min 8, max 8
Page 306 at offset 0x0004ee0c with length 26 and row count 21, min 8, max 8
Page 307 at offset 0x0004ee26 with length 26 and row count 21, min 8, max 8
Page 308 at offset 0x0004ee40 with length 26 and row count 27, min 8, max 8
Page 309 at offset 0x0004ee5a with length 26 and row count 21, min 8, max 8
Page 310 at offset 0x0004ee74 with length 26 and row count 21, min 8, max 8
Page 311 at offset 0x0004ee8e with length 31 and row count 21, min 8, max 9
Page 312 at offset 0x0004eead with length 26 and row count 27, min 9, max 9
Page 313 at offset 0x0004eec7 with length 26 and row count 21, min 9, max 9
Page 314 at offset 0x0004eee1 with length 26 and row count 21, min 9, max 9
Page 315 at offset 0x0004eefb with length 26 and row count 21, min 9, max 9
Page 316 at offset 0x0004ef15 with length 26 and row count 27, min 9, max 9
Page 317 at offset 0x0004ef2f with length 26 and row count 21, min 9, max 9
Page 318 at offset 0x0004ef49 with length 26 and row count 21, min 9, max 9
Page 319 at offset 0x0004ef63 with length 26 and row count 21, min 9, max 9
Page 320 at offset 0x0004ef7d with length 26 and row count 27, min 9, max 9
Page 321 at offset 0x0004ef97 with length 26 and row count 21, min 9, max 9
Page 322 at offset 0x0004efb1 with length 26 and row count 21, min 9, max 9
Page 323 at offset 0x0004efcb with length 26 and row count 21, min 9, max 9
Page 324 at offset 0x0004efe5 with length 26 and row count 7300, min 9, max 9

Process finished with exit code 0
```
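In a dump like this, each page starts exactly where the previous one ends; for example, 0x0004ceb6 + 25 = 0x0004cecf, the offset of page 1. A hypothetical std-only helper (not part of `parquet-index`) that checks this contiguity over `(offset, length)` pairs copied from such a dump:

```rust
/// Checks that each page starts exactly where the previous page ends,
/// i.e. `offset[i] + length[i] == offset[i + 1]`. Returns the index of the
/// first page that does not follow on from its predecessor, if any.
pub fn first_gap(pages: &[(u64, u64)]) -> Option<usize> {
    pages
        .windows(2)
        .position(|w| w[0].0 + w[0].1 != w[1].0)
        // `position` indexes the window; the offending page is the second one in it.
        .map(|i| i + 1)
}
```

A `Some(i)` result points at a page whose offset leaves bytes between it and the previous page that no offset index entry accounts for.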

@alamb
Contributor

alamb commented Dec 29, 2022

> Adds a parquet-index binary which can be used to dump the page index for a given column of a given file. Longer term, I wonder if we should merge some of these tools into a single binary; I may write up a ticket.

I suggest looking at https://github.com/manojkarthick/pqrs from @manojkarthick; it would be great to contribute to that upstream.

@tustvold
Contributor Author

tustvold commented Dec 29, 2022

Yeah, I've seen that. There will be some overlap, but I think there is value in having an officially supported set of tools; if nothing else, it ensures we can keep them up to date as new functionality is added 😄

@alamb
Contributor

alamb commented Dec 31, 2022

> officially supported set of tools; if nothing else, it ensures we can keep them up to date as new functionality is added

Maybe we could combine some of the tools together (taking a friendly look at pqrs) 🤔

Contributor

@alamb alamb left a comment


I tried running this like:

```
cargo run --bin parquet-index --features=cli -- /Users/alamb/.influxdb_iox/1/4/1/8/00df2e44-8713-4d51-8276-aee5698a034a.parquet
```

And it errored:

```
    Finished dev [unoptimized + debuginfo] target(s) in 0.19s
     Running `/Users/alamb/Software/target-df/debug/parquet-index /Users/alamb/.influxdb_iox/1/4/1/8/00df2e44-8713-4d51-8276-aee5698a034a.parquet time`
Row Group: 0
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 9', parquet/src/bin/parquet-index.rs:90:33
stack backtrace:
   0: rust_begin_unwind
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
   2: core::panicking::panic_bounds_check
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:151:5
   3: <usize as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/slice/index.rs:259:10
   4: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/slice/index.rs:18:9
   5: <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/vec/mod.rs:2736:9
   6: parquet_index::Args::run
             at ./parquet/src/bin/parquet-index.rs:90:33
   7: parquet_index::main
             at ./parquet/src/bin/parquet-index.rs:167:5
   8: core::ops::function::FnOnce::call_once
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

The file seems to work fine in DataFusion:

```
alamb@MacBook-Pro-8 arrow-rs % datafusion-cli 
datafusion-cli 
DataFusion CLI v15.0.0
❯ create external table foo stored as parquet  LOCATION '/Users/alamb/.influxdb_iox/1/4/1/8/00df2e44-8713-4d51-8276-aee5698a034a.parquet';
create external table foo stored as parquet  LOCATION '/Users/alamb/.influxdb_iox/1/4/1/8/00df2e44-8713-4d51-8276-aee5698a034a.parquet';
0 rows in set. Query took 0.024 seconds.
❯ select * from foo limit 10;
select * from foo limit 10;
+---------------------+----------+------------------+--------------+---------------+-------+--------------+-----------+----------+---------------------+------------------+---------------+------------+----------+
| host                | io_time  | iops_in_progress | merged_reads | merged_writes | name  | read_bytes   | read_time | reads    | time                | weighted_io_time | write_bytes   | write_time | writes   |
+---------------------+----------+------------------+--------------+---------------+-------+--------------+-----------+----------+---------------------+------------------+---------------+------------+----------+
| MacBook-Pro-8.local | 24060965 | 0                | 0            | 0             | disk0 | 680351985664 | 14227208  | 18340412 | 2022-07-18T21:05:10 | 0                | 1340193685504 | 9833757    | 31951355 |
| MacBook-Pro-8.local | 24061010 | 0                | 0            | 0             | disk0 | 680353878016 | 14227224  | 18340462 | 2022-07-18T21:05:20 | 0                | 1340195880960 | 9833785    | 31951760 |
+---------------------+----------+------------------+--------------+---------------+-------+--------------+-----------+----------+---------------------+------------------+---------------+------------+----------+
2 rows in set. Query took 0.054 seconds.
❯ 
```

Here is the file
00df2e44-8713-4d51-8276-aee5698a034a.parquet.zip

@tustvold
Contributor Author

tustvold commented Dec 31, 2022

Aah, that's because the file doesn't have any page index information for that column (or for any column, for that matter). I've improved the error message, although the semantics of how page index information is read could definitely be improved; it's a bit of a mess.
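The panic came from indexing an empty vector with column index 9; looking the entry up with `get` turns that into a recoverable error. A minimal sketch of the pattern, assuming a hypothetical nested `Vec` in place of the parquet crate's real page index types; only the message wording matches the improved error alamb reports next:

```rust
/// Hypothetical stand-in for one row group's offset index: one entry per
/// column chunk, each holding that chunk's page offsets.
pub type RowGroupOffsetIndex = Vec<Vec<u64>>;

/// Returns the page offsets for `column` in `row_group`, or a descriptive
/// error (instead of an out-of-bounds panic) when the file has no offset
/// index for that column chunk.
pub fn column_offset_index(
    index: &[RowGroupOffsetIndex],
    row_group: usize,
    column: usize,
) -> Result<&[u64], String> {
    index
        .get(row_group)
        .and_then(|columns| columns.get(column))
        .map(|pages| pages.as_slice())
        .ok_or_else(|| format!("No offset index for row group {row_group} column chunk {column}"))
}
```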

@tustvold tustvold requested a review from alamb January 2, 2023 11:30
@alamb
Contributor

alamb commented Jan 2, 2023

```
    Finished dev [unoptimized + debuginfo] target(s) in 9.28s
     Running `/Users/alamb/Software/target-df/debug/parquet-index /Users/alamb/.influxdb_iox/1/4/1/8/00df2e44-8713-4d51-8276-aee5698a034a.parquet time`
Row Group: 0
Error: General("No offset index for row group 0 column chunk 9")
```

Well, that is certainly better than a panic. It might be nicer to say "row group has no page index" and keep going.
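The suggestion amounts to treating a missing page index as a per-row-group condition rather than a fatal error. A minimal sketch of that control flow, using a hypothetical `Option` per row group and a simplified output format; only the "has no page index" wording comes from the comment:

```rust
/// Hypothetical per-row-group page index for one column: `None` when the
/// file has no page index for that row group, otherwise per-page (min, max).
pub type MaybeIndex = Option<Vec<(i32, i32)>>;

/// Builds the report for every row group, noting row groups without a page
/// index and moving on instead of aborting at the first one.
pub fn report(row_groups: &[MaybeIndex]) -> Vec<String> {
    let mut out = Vec::new();
    for (rg, index) in row_groups.iter().enumerate() {
        out.push(format!("Row Group: {rg}"));
        match index {
            Some(pages) => {
                for (i, (min, max)) in pages.iter().enumerate() {
                    out.push(format!("Page {i}, min {min}, max {max}"));
                }
            }
            // Keep going: later row groups may still carry a page index.
            None => out.push("Row group has no page index".to_string()),
        }
    }
    out
}
```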

Contributor

@alamb alamb left a comment


Seems like a good thing to me

@tustvold tustvold merged commit 6139d89 into apache:master Jan 2, 2023
@ursabot

ursabot commented Jan 2, 2023

Benchmark runs are scheduled for baseline = 1889e33 and contender = 6139d89. 6139d89 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels
parquet Changes to the parquet crate
3 participants