Lucene Change Log
For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions
======================= Lucene 11.0.0 =======================
API Changes
---------------------
* GITHUB#11023: Removing deprecated parameters from CheckIndex. (Jakub Slowinski)
New Features
---------------------
* GITHUB#14097: Binary partitioning merge policy over float-valued vector field. (Mike Sokolov)
Improvements
---------------------
* GITHUB#266: TieredMergePolicy's maxMergeAtOnce default value was changed from 10 to 30. (Adrien Grand)
Optimizations
---------------------
* GITHUB#14011: Reduce allocation rate in HNSW concurrent merge. (Viliam Durina)
* GITHUB#14022: Optimize DFS marking of connected components in HNSW by reducing stack depth, improving performance and reducing allocations. (Viswanath Kuchibhotla)
Bug Fixes
---------------------
* GITHUB#14049: Randomize KNN codec params in RandomCodec. Fixes scalar quantization div-by-zero
when all values are identical. (Mike Sokolov)
* GITHUB#14075: Remove duplicate and add missing entry on brazilian portuguese stopwords list. (Arthur Caccavo)
Other
---------------------
(No changes)
======================= Lucene 10.2.0 =======================
API Changes
---------------------
* GITHUB#14069: Added DocIdSetIterator#intoBitSet API to let implementations
optimize loading doc IDs into a bit set. (Adrien Grand)
* GITHUB#14134: Added Bits#applyMask API to help apply live docs as a mask on a
bit set of matches. (Adrien Grand)
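
As a rough illustration of what GITHUB#14134 means by applying live docs as a mask, the sketch below uses java.util.BitSet as a stand-in for Lucene's own bit set classes. The class and method names are hypothetical, and the real Bits#applyMask signature may differ.

```java
import java.util.BitSet;

/** Hypothetical illustration only; uses java.util.BitSet, not Lucene's Bits or FixedBitSet. */
final class LiveDocsMaskSketch {

  /** Clears every match whose document is not live. Mutates and returns {@code matches}. */
  static BitSet applyLiveDocs(BitSet matches, BitSet liveDocs) {
    // A bitwise AND keeps only the doc IDs that are both matching and live.
    matches.and(liveDocs);
    return matches;
  }

  public static void main(String[] args) {
    BitSet matches = new BitSet();
    matches.set(1);
    matches.set(3);
    matches.set(5);

    BitSet liveDocs = new BitSet();
    liveDocs.set(0, 4); // docs 0..3 are live, doc 5 is deleted

    System.out.println(applyLiveDocs(matches, liveDocs));
  }
}
```

Masking a whole bit set at once avoids checking live docs one document at a time.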
New Features
---------------------
* GITHUB#14084, GITHUB#13635, GITHUB#13634: Adds new `SeededKnnByteVectorQuery` and `SeededKnnFloatVectorQuery`
queries. These queries allow for the vector search entry points to be initialized via a `seed` query. This follows
the research provided via https://arxiv.org/abs/2307.16779. (Sean MacAvaney, Ben Trent).
Improvements
---------------------
* GITHUB#14079: Hunspell Dictionary now supports an option to tolerate REP rule count mismatches.
(Robert Muir)
* GITHUB#13984: Add HNSW graph checks and stats to CheckIndex
* GITHUB#14113: Remove unnecessary ByteArrayDataInput allocations from `Lucene90DocValuesProducer$TermsDict.decompressBlock`. (Ankit Jain)
* GITHUB#14138: Implement IntersectVisitor#visit(IntsRef) in many of the current implementations and add
BulkAdder#add(IntsRef) method. They should provide better performance due to fewer virtual method calls and
more efficient bulk processing. (Ignacio Vera)
Optimizations
---------------------
* GITHUB#14052: Speed up DisjunctionDISIApproximation#advance. (Adrien Grand)
* GITHUB#14080: Use the `DocIdSetIterator#intoBitSet` API to speed up dense
conjunctions. (Adrien Grand)
* GITHUB#14133: Dense blocks of postings are now encoded as bit sets.
(Adrien Grand)
Bug Fixes
---------------------
* GITHUB#14109: prefetch may select the wrong memory segment for
multi-segment slices. (Chris Hegarty)
* GITHUB#14123: SortingCodecReader NPE when segment has no (points, vectors, etc...) (Mike Sokolov)
* GITHUB#14126: Avoid overflow in index input slices invariant checks
(Chris Hegarty)
Other
---------------------
* GITHUB#14081: Fix urls describing why NIOFS is not recommended for Windows (Marcel Yeonghyeon Ko)
* GITHUB#14116: Use CDL to block threads to avoid flaky tests. (Ao Li)
* GITHUB#14091: Cover all DataType. (Lu Xugang)
======================= Lucene 10.1.0 =======================
API Changes
---------------------
* GITHUB#13859: Allow open-ended ranges in Intervals range queries. (Mayya Sharipova)
* GITHUB#13950: Make BooleanQuery#getClauses public and add #add(Collection<BooleanClause>) to BQ builder. (Shubham Chaudhary)
* GITHUB#13957: Removed LeafSimScorer class, to save its overhead. Scorers now
compute scores directly from a SimScorer, postings and norms. (Adrien Grand)
* GITHUB#13998: Add IndexInput::isLoaded to determine if the contents of an
input is resident in physical memory. (Chris Hegarty)
New Features
---------------------
* GITHUB#14034: Add support for storing term vectors in FeatureField. (Jim Ferenczi)
Improvements
---------------------
* GITHUB#13986: Allow easier configuration of the Panama Vectorization provider with
newer Java versions. Set the `org.apache.lucene.vectorization.upperJavaFeatureVersion`
system property to increase the set of Java versions that Panama Vectorization will
provide optimized implementations for. (Chris Hegarty)
* GITHUB#266: TieredMergePolicy now allows merging up to maxMergeAtOnce
segments for merges below the floor segment size, even if maxMergeAtOnce is
bigger than segsPerTier. (Adrien Grand)
* GITHUB#14033: Combine all postings enum impls of the default codec into a
single class. (Adrien Grand)
Optimizations
---------------------
* GITHUB#13828: Reduce long[] array allocation for bitset in readBitSetIterator. (Zhang Chao)
* GITHUB#13800: MaxScoreBulkScorer now recomputes scorer partitions when the
minimum competitive score allows for a more favorable partitioning. (Adrien Grand)
* GITHUB#13930: Use growNoCopy when copying bytes in BytesRefBuilder. (Ignacio Vera)
* GITHUB#13931: Refactored `BooleanScorer` to evaluate matches of sub clauses
using the `Scorer` abstraction rather than the `BulkScorer` abstraction. This
speeds up exhaustive evaluation of disjunctions of term queries.
(Adrien Grand)
* GITHUB#13941: Optimized computation of top-hits on disjunctive queries with
many clauses. (Adrien Grand)
* GITHUB#13954: Disabled exchanging scores across slices for exhaustive
top-hits evaluation. (Adrien Grand)
* GITHUB#13899: Check ahead if we can get the count. (Lu Xugang)
* GITHUB#13943: Removed shared `HitsThresholdChecker`, which reduces overhead
but may delay a bit when dynamic pruning kicks in. (Adrien Grand)
* GITHUB#13961: Replace Map<String,Object> with IntObjectHashMap for DV producer. (Pan Guixin)
* GITHUB#13963: Speed up nextDoc() implementations in Lucene912PostingsReader.
(Adrien Grand)
* GITHUB#13958: Speed up advancing within a block. (Adrien Grand)
* GITHUB#13763: Replace Map<String,Object> with IntObjectHashMap for KnnVectorsReader (Pan Guixin)
* GITHUB#13968: Switch postings from storing doc IDs in a long[] to an int[].
Lucene 8.4 had moved to a long[] to help speed up block decoding by using
longs that would pack two integers. We are now moving back to integers to be
able to take advantage of 2x more lanes with the vector API. (Adrien Grand)
* GITHUB#13994: Speed up top-k retrieval of filtered conjunctions.
(Adrien Grand)
* GITHUB#13985: Introduces IndexInput#updateReadAdvice to change the ReadAdvice
while merging vectors. (Tejas Shah)
* GITHUB#14000: Speed up top-k retrieval of filtered disjunctions.
(Adrien Grand)
* GITHUB#13999: CombinedFieldQuery now returns non-infinite maximum scores,
making it eligible for dynamic pruning. (Adrien Grand)
* GITHUB#13989: Faster checksum computation. (Jean-François Boeuf)
* GITHUB#14021: WANDScorer now computes scores on the fly, which helps prevent
advancing "tail" clauses in many cases. (Adrien Grand)
* GITHUB#14014: Filtered disjunctions now get executed via `MaxScoreBulkScorer`.
(Adrien Grand)
* GITHUB#14023: Make JVM inlining decisions more predictable in our main
queries. (Adrien Grand)
* GITHUB#14032: Speed up PostingsEnum when positions are requested.
(Adrien Grand)
* GITHUB#14040: Specialized top-level DisjunctionMaxQuery evaluation when the
tie break multiplier is 0. (Adrien Grand)
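
GITHUB#13968 above mentions longs that pack two integers. The following is a minimal sketch of that packing idea in plain Java; it is not Lucene's postings decoding code, and the class and method names are hypothetical.

```java
/** Hypothetical illustration only; not Lucene's postings decoding code. */
final class PackedIntsSketch {

  /** Packs two 32-bit values into one long: {@code hi} in the upper half, {@code lo} in the lower half. */
  static long pack(int hi, int lo) {
    // Mask lo so that sign extension does not overwrite the upper 32 bits.
    return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
  }

  /** Returns the value stored in the upper 32 bits. */
  static int high(long packed) {
    return (int) (packed >>> 32);
  }

  /** Returns the value stored in the lower 32 bits. */
  static int low(long packed) {
    return (int) packed;
  }

  public static void main(String[] args) {
    long packed = pack(7, 42);
    System.out.println(high(packed) + " " + low(packed));
  }
}
```

With this layout one 64-bit operation touches two values, whereas a vector register of the same width holds twice as many int lanes as long lanes, which is the trade-off the entry describes.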
Bug Fixes
---------------------
* GITHUB#13832: Fixed an issue where the DefaultPassageFormatter.format method did not format passages as intended
when they were not sorted by startOffset. (Seunghan Jung)
* GITHUB#13884: Remove broken .toArray from Long/CharObjectHashMap entirely. (Pan Guixin)
* GITHUB#12686: Added support for highlighting IndexOrDocValuesQuery. (Prudhvi Godithi)
* GITHUB#13927: Fix StoredFieldsConsumer finish. (linfn)
* GITHUB#13944: Ensure deterministic order of clauses for `DisjunctionMaxQuery#toString`. (Laurent Jakubina)
* GITHUB#13841: Improve Tessellator logic when two holes share the same vertex with the polygon, which was failing
on valid polygons. (Ignacio Vera)
* GITHUB#13990: Added filter to the toString() method of Knn[Float|Byte]VectorQuery
and DiversifyingChildren[Float|Byte]KnnVectorQuery. (Viswanath Kuchibhotla)
* GITHUB#13819: Prevent flattening of ordered and unordered interval sources (Jim Ferenczi)
* GITHUB#14008: Counts provided by taxonomy facets in addition to another aggregation are now returned together with
their corresponding ordinals. (Paul King)
* GITHUB#14027: Make SegmentInfos#readCommit(Directory, String, int) public (Luca Cavanna)
Build
---------------------
* Upgrade forbiddenapis to version 3.8. (Uwe Schindler)
Other
---------------------
* GITHUB#13982: Remove duplicate test code. (Lu Xugang)
======================= Lucene 10.0.0 =======================
API Changes
---------------------
* LUCENE-12092: Remove deprecated UTF8TaxonomyWriterCache. Please use LruTaxonomyWriterCache
instead. (Vigya Sharma)
* LUCENE-10010: AutomatonQuery, CompiledAutomaton, RunAutomaton, RegExp
classes no longer determinize NFAs. Instead it is the responsibility
of the caller to determinize. (Robert Muir)
* LUCENE-10368: IntTaxonomyFacets has been made pkg-private and serves only as an internal
implementation detail of taxonomy-faceting. (Greg Miller)
* LUCENE-10400: Remove deprecated dictionary constructors in Kuromoji and Nori (Tomoko Uchida)
* LUCENE-10440: TaxonomyFacets and FloatTaxonomyFacets have been made pkg-private and only serve
as internal implementation details of taxonomy-faceting. (Greg Miller)
* LUCENE-10431: MultiTermQuery.setRewriteMethod() has been removed. (Alan Woodward)
* LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and
KnnVectorFieldExistsQuery. (Zach Chen, Adrien Grand)
* LUCENE-10561: Reduce class/member visibility of all normalizer and stemmer classes. (Rushabh Shah)
* LUCENE-10266: Move nearest-neighbor search on points to core. (Rushabh Shah)
* LUCENE-10603: Remove SortedSetDocValues#NO_MORE_ORDS definition. (Greg Miller)
* GITHUB#11813: Remove Operations.isFinite: the recursive implementation could be problematic
for large automatons (WildcardQuery, PrefixQuery, RegExpQuery, etc). (taroplus, Robert Muir)
* GITHUB#11840: Query rewrite now takes an IndexSearcher instead of IndexReader to enable concurrent
rewriting. (Patrick Zhai)
* GITHUB#11933: Remove IOContext from Directory#openChecksumInput. (Zach Chen)
* GITHUB#11814: Support deletions in IndexRearranger. (Stefan Vodita)
* GITHUB#12107: Remove deprecated KnnVectorField, KnnVectorQuery, VectorValues and
LeafReader#getVectorValues. (Luca Cavanna)
* GITHUB#12296: Make IndexReader and IndexReaderContext classes explicitly sealed.
They have already been runtime-checked to only be implemented by the specific classes
so this is effectively a non-breaking change. (Petr Portnov)
* GITHUB#12276: Rename DaciukMihovAutomatonBuilder to StringsToAutomaton. (Michael McCandless)
* GITHUB#12321: Reduced visibility of StringsToAutomaton. Please use Automata#makeStringUnion instead. (Greg Miller)
* GITHUB#12407: Removed Scorable#docID. (Adrien Grand)
* GITHUB#12580: Remove deprecated IndexSearcher#getExecutor in favour of executing concurrent tasks using
the TaskExecutor that the searcher holds, retrieved via IndexSearcher#getTaskExecutor (Luca Cavanna)
* GITHUB#12599: Add RandomAccessInput#readBytes method to the RandomAccessInput interface. (Ignacio Vera)
* GITHUB#11023: Adding -level param to CheckIndex, making the old -fast param the default behaviour. (Jakub Slowinski)
* GITHUB#12873: Expressions module now uses MethodHandles to define custom functions. Support for
custom classloaders was removed. (Uwe Schindler)
* GITHUB#12243: Remove TermInSetQuery ctors taking varargs param. SortedSetDocValuesField#newSlowSetQuery,
SortedDocValuesField#newSlowSetQuery, KeywordField#newSetQuery, KeywordField#newSetQuery now take a collection. (Jakub Slowinski)
* GITHUB#12881: Performance improvements to MatchHighlighter and MatchRegionRetriever. MatchRegionRetriever can be
configured to not load matches (or content) of certain fields and to force-load other fields so that stored fields
of a document are accessed once. A configurable limit of field matches placed in the priority queue was added
(allows handling long fields with lots of hits more gracefully). MatchRegionRetriever utilizes IndexSearcher's
executor to extract hit offsets concurrently. (Dawid Weiss)
* GITHUB#12855: Remove deprecated DrillSideways#createDrillDownFacetsCollector extension method. (Greg Miller)
* GITHUB#12875: Ensure token position is always increased in PathHierarchyTokenizer and ReversePathHierarchyTokenizer
and resulting tokens do not overlap. (Michael Froh, Lukáš Vlček)
* GITHUB#13146, GITHUB#13148: Remove ByteBufferIndexInput and only use MemorySegment APIs
for MMapDirectory. (Uwe Schindler)
* GITHUB#13205: Convert IOContext, MergeInfo, and FlushInfo to record classes. (Uwe Schindler)
* GITHUB#13219: The `readOnce`, `load` and `random` flags on `IOContext` have
been replaced with a new `ReadAdvice` enum. (Adrien Grand)
* GITHUB#13242: Replace `IOContext.READ` with `IOContext.DEFAULT`. (Adrien Grand)
* GITHUB#13261: Convert `BooleanClause` class to record class. (Pulkit Gupta)
* GITHUB#13241: Remove Accountable interface on KnnVectorsReader. (Pulkit Gupta)
* GITHUB#13262: Removed deprecated constructors from DoubleField, FloatField, IntField, LongField, and LongPoint.
Additionally, deprecated methods have been removed from ByteBuffersIndexInput, BooleanQuery and others. Please refer
to MIGRATE.md for further details. (Sanjay Dutt)
* GITHUB#13337: Introduce new `IndexInput#prefetch(long)` API to give a hint to
the directory about bytes that are about to be read. (Adrien Grand, Uwe
Schindler)
* GITHUB#13408: Moved Weight#bulkScorer() to ScorerSupplier#bulkScorer() to better help parallelize
I/O for top-level disjunctions. Weight#bulkScorer() still exists for compatibility, but delegates
to ScorerSupplier#bulkScorer(). (Adrien Grand)
* GITHUB#13410: Removed Scorer#getWeight (Sanjay Dutt, Adrien Grand)
* GITHUB#13499: Remove deprecated TopScoreDocCollector + TopFieldCollector methods (#create, #createSharedManager) (Jakub Slowinski)
* GITHUB#13632: CandidateMatcher public matching functions (Bryan Jacobowitz)
* GITHUB#13708: Move Operations.sameLanguage/subsetOf to test-framework. (Robert Muir)
* GITHUB#13733: Move FacetsCollector#search utility methods to `FacetsCollectorManager`, replace the `Collector`
argument with a `FacetsCollectorManager` and update the return type to include both `TopDocs` results as well as
facets results. (Luca Cavanna)
* GITHUB#13328: Convert many basic Lucene classes to record classes, including CollectionStatistics, TermStatistics and LeafMetadata. (Shubham Chaudhary)
* GITHUB#13780: Remove IndexSearcher#search(List<LeafReaderContext>, Weight, Collector) in favour of the newly
introduced IndexSearcher#search(LeafReaderContextPartition[], Weight, Collector). (Luca Cavanna)
* GITHUB#13779: First-class random access API for KnnVectorValues
unifies Byte/FloatVectorValues incorporating RandomAccess* API and introduces
DocIndexIterator for iterative access in place of direct inheritance from DISI. (Michael Sokolov)
* GITHUB#13845: Add missing with-discountOverlaps Similarity constructor variants. (Pierre Salagnac, Christine Poerschke, Robert Muir)
* GITHUB#13820, GITHUB#13825, GITHUB#13830: Corrects DataInput.readGroupVInts to be public and not-final, removes the protected
DataInput.readGroupVInt method. (Zhang Chao, Robert Muir, Uwe Schindler, Dawid Weiss)
New Features
---------------------
* LUCENE-10010: Introduce NFARunAutomaton to run NFA directly. (Patrick Zhai)
* GITHUB#12767: Add a flag to enable executing using NFA in RegexpQuery. (Patrick Zhai)
* LUCENE-10626: Hunspell: add tools to aid dictionary editing:
analysis introspection, stem expansion and stem/flag suggestion (Peter Gromov)
* GITHUB#12829: For indices newly created as of 10.0.0 onwards, IndexWriter preserves document blocks indexed via
IndexWriter#addDocuments or IndexWriter#updateDocuments also when index sorting is configured. Document blocks are
maintained alongside their parent documents during sort and merge. IndexWriterConfig now requires a parent field to be
specified if index sorting is used together with document blocks. (Simon Willnauer)
* GITHUB#13233: Add RomanianNormalizationFilter (Trey Jones, Robert Muir)
* GITHUB#13449: Sparse index: optional skip list on top of doc values which is exposed via the
DocValuesSkipper abstraction. A new flag is added to FieldType.java that configures whether
to create a "skip index" for doc values. (Ignacio Vera)
* GITHUB#13563: Add levels to doc values skip index. (Ignacio Vera)
* GITHUB#13597: Align doc value skipper interval boundaries when an interval contains a constant
value. (Ignacio Vera)
* GITHUB#13604: Add Kmeans clustering on vectors (Mayya Sharipova, Jim Ferenczi, Tom Veasey)
* GITHUB#13592: Take advantage of the doc value skipper when it is primary sort in SortedNumericDocValuesRangeQuery
and SortedSetDocValuesRangeQuery. (Ignacio Vera)
* GITHUB#13542: Add initial support for intra-segment concurrency. IndexSearcher now supports searching across leaf
reader partitions concurrently. This is useful to max out available resource usage especially with force merged
indices or big segments. There is still a performance penalty for queries that require segment-level computation
ahead of time, such as points/range queries. This is an implementation limitation that we expect to improve in
future releases, and that's why intra-segment slicing is not enabled by default, but leveraged in tests when the
searcher is created via LuceneTestCase#newSearcher. Users may override IndexSearcher#slices(List) to optionally
create slices that target segment partitions. (Luca Cavanna)
* GITHUB#13741: Implement Accountable for NFARunAutomaton, fix hashCode implementation of CompiledAutomaton. (Patrick Zhai)
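
To make the idea of leaf reader partitions in GITHUB#13542 more concrete, the sketch below splits the doc ID space of one segment into consecutive ranges that could be searched concurrently. It is plain Java written under assumptions: the class, the method and the fixed-size splitting rule are hypothetical and are not the slicing logic of IndexSearcher.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the slicing logic used by IndexSearcher. */
final class SegmentPartitionSketch {

  /**
   * Splits the doc ID space [0, maxDoc) into consecutive [from, to) ranges that each hold at most
   * maxDocsPerPartition documents.
   */
  static List<int[]> partition(int maxDoc, int maxDocsPerPartition) {
    if (maxDocsPerPartition <= 0) {
      throw new IllegalArgumentException("maxDocsPerPartition must be positive");
    }
    List<int[]> ranges = new ArrayList<>();
    int from = 0;
    while (from < maxDoc) {
      // Compute the upper bound as a long so that it cannot overflow an int.
      int to = (int) Math.min(maxDoc, (long) from + maxDocsPerPartition);
      ranges.add(new int[] {from, to});
      from = to;
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (int[] range : partition(10, 4)) {
      System.out.println("[" + range[0] + ", " + range[1] + ")");
    }
  }
}
```

Each range could then be handed to a separate task, which is why a single large or force-merged segment no longer has to be searched by one thread.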
Improvements
---------------------
* GITHUB#13246: Simplify bytes comparison as long comparison in NumericComparator. (Guo feng)
* LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori.
(Uihyun Kim)
* LUCENE-10614: Properly support getTopChildren in RangeFacetCounts. (Yuting Gan)
* LUCENE-10652: Add a top-n range faceting example to RangeFacetsExample. (Yuting Gan)
* GITHUB#12447: Hunspell: speed up the dictionary enumeration on suggestion (Peter Gromov)
* GITHUB#12873: Expressions module now uses JEP 371 "Hidden Classes" with JEP 309
"Dynamic Class-File Constants" to implement Javascript expressions. (Uwe Schindler)
* GITHUB#11657, LUCENE-10621: Upgrade to OpenNLP 2.3.2. (Christine Poerschke, Eric Pugh)
* GITHUB#13209: Upgrade snowball to 26db1ab9. (Robert Muir)
* GITHUB#12172: Update Romanian stopwords list to include the modern unicode forms. (Trey Jones)
* GITHUB#13707: Improve Operations.isTotal() to work with non-minimal automata. (Dawid Weiss, Robert Muir)
Optimizations
---------------------
* GITHUB#11857, GITHUB#11859, GITHUB#11893, GITHUB#11909: Hunspell: improved suggestion performance (Peter Gromov)
* GITHUB#12825, GITHUB#12834: Hunspell: improved dictionary loading performance, allowed in-memory entry sorting.
(Peter Gromov)
* GITHUB#12372: Reduce allocation during HNSW construction (Jonathan Ellis)
* GITHUB#12552: Make FSTPostingsFormat load FSTs off-heap. (Tony X)
* GITHUB#13672: Leverage doc value skip lists in DocValuesRewriteMethod if indexed. (Greg Miller)
Bug Fixes
---------------------
* LUCENE-10599: LogMergePolicy is more likely to keep merging segments until
they reach the maximum merge size. (Adrien Grand)
* GITHUB#12220: Hunspell: disallow hidden title-case entries from compound middle/end. (Peter Gromov)
* GITHUB#12878: Fix the declared Exceptions of Expression#evaluate() to match those
of DoubleValues#doubleValue(). (Uwe Schindler)
* GITHUB#13498: Avoid performance regression by lazily constructing the PointTree in NumericComparator. (Ignacio Vera)
Changes in Runtime Behavior
---------------------
* GITHUB#13244, GITHUB#13264: IOContext now uses ReadAdvice#RANDOM by default for read
operations. An implication is that `MMapDirectory` will use POSIX_MADV_RANDOM
on POSIX systems. To fall back to OS default behaviour, pass system property via
`-Dorg.apache.lucene.store.defaultReadAdvice=normal`. This may be useful on systems
with lots of RAM as this increases read-ahead. (Adrien Grand, Uwe Schindler)
* GITHUB#13293: Auto I/O throttling is now disabled by default on ConcurrentMergeScheduler.
(Adrien Grand)
* GITHUB#13293: ConcurrentMergeScheduler now allows up to 50% of the threads of the host to be used
for merging. (Adrien Grand)
* GITHUB#13277: IndexWriter treats any java.lang.Error as tragic. (Robert Muir)
Changes in Backwards Compatibility Policy
-----------------------------------------
* GITHUB#12829: IndexWriter#addDocuments or IndexWriter#updateDocuments now require a parent field name to be
specified in IndexWriterConfig if document blocks are indexed and index time sorting is configured. (Simon Willnauer)
* GITHUB#13230: Remove the Kp and Lovins snowball algorithms which are not supported
or intended for general use. (Robert Muir)
* GITHUB#13602: SearchWithCollectorTask no longer supports the `collector.class` config parameter to load a custom
collector implementation. `collector.manager.class` allows users to load a collector manager instead. (Luca Cavanna)
Other
---------------------
* GITHUB#13459: Merges all immutable attributes in FieldInfos.FieldNumbers into one Hashmap saving
memory when writing big indices. Fixes an exotic bug when calling clear where not all attributes
were cleared. (Ignacio Vera)
* LUCENE-10376: Roll up the loop in VInt/VLong in DataInput. (Guo Feng)
* LUCENE-10253: The @BadApple annotation has been removed from the test
framework. (Adrien Grand)
* LUCENE-10393: Unify binary dictionary and dictionary writer in Kuromoji and Nori.
(Tomoko Uchida, Robert Muir)
* LUCENE-10475: Merge dictionary builders in `util` package into `dict` package in Kuromoji and Nori.
All classes in `org.apache.lucene.analysis.[ja|ko].util` were moved to `org.apache.lucene.analysis.[ja|ko].dict`.
(Tomoko Uchida)
* LUCENE-10493: Factor out Viterbi algorithm in Kuromoji and Nori to analysis-common. (Tomoko Uchida)
* GITHUB#977, LUCENE-9500: Remove the deflater hack introduced because of JDK-8252739 (Uwe Schindler)
* GITHUB#11960: Hunspell: supported empty dictionaries (Peter Gromov)
* GITHUB#12239: Hunspell: reduced suggestion set dependency on the hash table order (Peter Gromov)
* GITHUB#9049: Fixing bug in UnescapedCharSequence#toStringEscaped() (Jakub Slowinski)
* GITHUB#13001: Put Thread#sleep() on the list of forbidden APIs. (Shubham Chaudhary)
* GITHUB#12753: Bump minimum required Java version to 21
(Chris Hegarty, Robert Muir, Uwe Schindler)
* GITHUB#13296: Convert the FieldEntry, a static nested class, into a record. (Sanjay Dutt)
* GITHUB#13332: Improve MissingDoclet linter to check records correctly. (Uwe Schindler)
* GITHUB#13499: Remove usage of TopScoreDocCollector + TopFieldCollector deprecated methods (#create, #createSharedManager) (Jakub Slowinski)
Build
---------------------
* GITHUB#13649: Fix Eclipse IDE settings generation. (Uwe Schindler, Dawid Weiss)
* GITHUB#13698: Upgrade to gradle 8.10 (Dawid Weiss)
======================== Lucene 9.12.0 =======================
Security Fixes
---------------------
* Deserialization of Untrusted Data vulnerability in Apache Lucene Replicator - CVE-2024-45772
(Summ3r from Vidar-Team, Robert Muir, Paul Irwin)
API Changes
---------------------
* GITHUB#13806: Add TermInSetQuery#getBytesRefIterator to be able to iterate over query terms. (Christoph Büscher)
* GITHUB#13469: Expose FlatVectorsFormat as a first-class format; can be configured using a custom Codec. (Michael Sokolov)
* GITHUB#13612: Hunspell: add Suggester#proceedPastRep to avoid losing relevant suggestions. (Peter Gromov)
* GITHUB#13603: Introduced `IndexSearcher#searchLeaf(LeafReaderContext, Weight, Collector)` protected method to
facilitate customizing per-leaf behavior of search without having to override
`search(LeafReaderContext[], Weight, Collector)` which requires overriding the entire loop across the leaves (Luca Cavanna)
* GITHUB#13559: Add BitSet#nextSetBit(int, int) to get the index of the first set bit in range. (Egor Potemkin)
* GITHUB#13568: Add DoubleValuesSource#toSortableLongDoubleValuesSource and
MultiDoubleValuesSource#toSortableMultiLongValuesSource methods. (Shradha Shankar)
* GITHUB#13568, GITHUB#13750: Add DrillSideways#search method that supports any CollectorManagers for drill-sideways dimensions
or drill-down. (Egor Potemkin)
* GITHUB#13757: For similarities, provide default computeNorm implementation and remove remaining discountOverlaps setters.
(Christine Poerschke, Adrien Grand, Robert Muir)
New Features
---------------------
* GITHUB#13430: Allow configuring the search concurrency via
TieredMergePolicy#setTargetSearchConcurrency. This in turn instructs the
merge policy to try to have at least this number of segments on the highest
tier. (Adrien Grand, Carlos Delgado)
* GITHUB#13517: Allow configuring the search concurrency on LogDocMergePolicy
and LogByteSizeMergePolicy via a new #setTargetConcurrency setter.
(Adrien Grand)
* GITHUB#13568: Add sandbox facets module to compute facets while collecting. (Egor Potemkin, Shradha Shankar)
* GITHUB#13678: Add support for JDK 23 to the Panama Vectorization Provider. (Chris Hegarty)
* GITHUB#13689: Add a new faceting feature, dynamic range facets, which automatically picks a balanced set of numeric
ranges based on the distribution of values that occur across all hits. For use cases that have a highly variable
numeric doc values field, such as "price" in an e-commerce application, this facet method is powerful as it allows the
presented ranges to adapt depending on what hits the query actually matches. This is in contrast to existing range
faceting that requires the application to provide the specific fixed ranges up front. (Yuting Gan, Greg Miller,
Stefan Vodita)
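
The dynamic range facets described in GITHUB#13689 can be pictured with the simplified sketch below, which picks ranges holding roughly equal numbers of values. It is an assumption-laden illustration in plain Java: the class and method names are hypothetical, and Lucene's actual implementation may balance ranges differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; Lucene's dynamic range facets may balance ranges differently. */
final class BalancedRangesSketch {

  /** Returns up to k inclusive [min, max] ranges that each cover roughly the same number of values. */
  static List<long[]> balancedRanges(long[] values, int k) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    long[] sorted = values.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    // Never create more buckets than values, so every bucket holds at least one value.
    int buckets = Math.min(k, n);
    List<long[]> ranges = new ArrayList<>();
    for (int i = 0; i < buckets; i++) {
      int from = (int) ((long) i * n / buckets); // inclusive index
      int to = (int) ((long) (i + 1) * n / buckets); // exclusive index
      ranges.add(new long[] {sorted[from], sorted[to - 1]});
    }
    return ranges;
  }

  public static void main(String[] args) {
    long[] prices = {5, 1, 100, 3, 50, 2};
    for (long[] range : balancedRanges(prices, 3)) {
      System.out.println(range[0] + " to " + range[1]);
    }
  }
}
```

Because the boundaries come from the values that actually matched, the ranges adapt to each query's hits instead of being fixed up front. Note that repeated values can make adjacent ranges share an endpoint in this sketch.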
Improvements
---------------------
* GITHUB#13475: Re-enable intra-merge parallelism except for terms, norms, and doc values.
Related to GITHUB#13478. (Ben Trent)
* GITHUB#13548: Refactor and javadoc update for KNN vector writer classes. (Patrick Zhai)
* GITHUB#13562: Add Intervals.regexp and Intervals.range methods to produce IntervalsSource
for regexp and range queries. (Mayya Sharipova)
* GITHUB#13625: Remove BitSet#nextSetBit code duplication. (Greg Miller)
* GITHUB#13285: Early terminate graph searches of AbstractVectorSimilarityQuery to follow timeout set from
IndexSearcher#setTimeout(QueryTimeout). (Kaival Parikh)
* GITHUB#13633: Add ability to read/write knn vector values to a MemoryIndex. (Ben Trent)
* GITHUB#12627: patch HNSW graphs to improve reachability of all nodes from entry points
* GITHUB#13201: Better cost estimation on MultiTermQuery over few terms. (Michael Froh)
* GITHUB#13735: Migrate monitor package usage of deprecated IndexSearcher#search(Query, Collector)
to IndexSearcher#search(Query, CollectorManager). (Greg Miller)
* GITHUB#13746: Introduce ProfilerCollectorManager to parallelize search when using ProfilerCollector. (Luca Cavanna)
Optimizations
---------------------
* GITHUB#13439: Avoid unnecessary memory allocation in PackedLongValues#Iterator. (Zhang Chao)
* GITHUB#13425: Rewrite SortedNumericDocValuesRangeQuery to MatchNoDocsQuery when the upper bound is smaller than the
lower bound. (Ioana Tagirta)
* GITHUB#13322: Implement Weight#count for vector values in the FieldExistsQuery. (Pan Guixin)
* GITHUB#13454: MultiTermQuery returns null ScorerSupplier in cases where
no query terms are present in the index segment (Mayya Sharipova)
* GITHUB#13431: Replace TreeMap and use compiled Patterns in Japanese UserDictionary. (Bruno Roustant)
* GITHUB#12941: Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows. (Stefan Vodita)
* GITHUB#13175: Stop double-checking priority queue inserts in some FacetCount classes. (Jakub Slowinski)
* GITHUB#13538: Slightly reduce heap usage for HNSW and scalar quantized vector writers. (Ben Trent)
* GITHUB#12100: WordBreakSpellChecker.suggestWordBreaks now does a breadth first search, allowing it to return
better matches with fewer evaluations (hossman)
* GITHUB#13582: Stop requiring MaxScoreBulkScorer's outer window to have at
least INNER_WINDOW_SIZE docs. (Adrien Grand)
* GITHUB#13570, GITHUB#13574, GITHUB#13535: Avoid performance degradation with closing shared Arenas.
Closing many individual index files can potentially lead to a degradation in execution performance.
Index files are mmapped one-to-one with the JDK's foreign shared Arena. The JVM deoptimizes the top
few frames of all threads when closing a shared Arena (see JDK-8335480). We mitigate this situation
when running with JDK 21 and greater, by 1) using a confined Arena where appropriate, and 2) grouping
files from the same segment to a single shared Arena.
A system property has been added that controls the total maximum number of mmapped files
that may be associated with a single shared Arena. For example, to set the max number of permits to
256, pass the following on the command line
-Dorg.apache.lucene.store.MMapDirectory.sharedArenaMaxPermits=256. Setting a value of 1 associates
a single file to a single shared arena.
(Chris Hegarty, Michael Gibney, Uwe Schindler)
* GITHUB#13585: Lucene912PostingsFormat, the new default postings format, now
only has 2 levels of skip data, which are inlined into postings instead of
being stored at the end of postings lists. This translates into better
performance for queries that need skipping such as conjunctions.
(Adrien Grand)
* GITHUB#13581: OnHeapHnswGraph no longer allocates a lock for every graph node (Mike Sokolov)
* GITHUB#13636, GITHUB#13658: Optimizations to the decoding logic of blocks of
postings. (Adrien Grand, Uwe Schindler, Greg Miller)
* GITHUB#13644: Improve NumericComparator competitive iterator logic by comparing the missing value with the top
value even after the hit queue is full (Pan Guixin)
* GITHUB#13587: Use Max WAND optimizations with ToParentBlockJoinQuery when using ScoreMode.Max (Mike Pellegrini)
* GITHUB#13742: Reorder checks in LRUQueryCache#count (Shubham Chaudhary)
* GITHUB#13697: Add a bulk scorer to ToParentBlockJoinQuery, which delegates to the bulk scorer of the child query.
This should speed up query evaluation when the child query has a specialized bulk scorer, such as disjunctive queries.
(Mike Pellegrini)
Changes in runtime behavior
---------------------
* GITHUB#13472: When an executor is provided to the IndexSearcher constructor, the searcher now executes tasks on the
thread that invoked a search as well as its configured executor. Users should reduce the executor's thread-count by 1
to retain the previous level of parallelism. Moreover, it is now possible to start searches from the same executor
that is configured in the IndexSearcher without risk of deadlocking. A separate executor for starting searches is no
longer required. (Armin Braun)
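The behavior described in GITHUB#13472 can be illustrated with a minimal sketch that uses only
java.util.concurrent. The class CallerRunsInvoker and its invokeAll method are hypothetical names
chosen for illustration and are not Lucene's implementation. All tasks but one are offered to the
executor, and the calling thread then runs any task that has not started yet, so progress never
depends on a free executor thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

// Hypothetical helper for illustration only; not a Lucene class.
final class CallerRunsInvoker {
  private CallerRunsInvoker() {}

  static <T> List<T> invokeAll(Executor executor, List<Callable<T>> tasks) {
    List<FutureTask<T>> futures = new ArrayList<>();
    for (Callable<T> task : tasks) {
      futures.add(new FutureTask<>(task));
    }
    // Offer every task except the last one to the executor.
    for (int i = 0; i < futures.size() - 1; i++) {
      executor.execute(futures.get(i));
    }
    // The caller runs the last task, then any task the executor has not started.
    // FutureTask#run does nothing once a task is already running or done.
    for (int i = futures.size() - 1; i >= 0; i--) {
      futures.get(i).run();
    }
    // Collect results in submission order, waiting for tasks still running elsewhere.
    List<T> results = new ArrayList<>();
    for (FutureTask<T> future : futures) {
      try {
        results.add(future.get());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
    return results;
  }
}
```

In this sketch a pool of N threads plus the caller can work on up to N+1 tasks at once, which is
the reason the entry above suggests reducing the executor's thread count by 1.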
Bug Fixes
---------------------
* GITHUB#13384: Fix highlighter to use longer passages instead of shorter individual terms. (Zack Kendall)
* GITHUB#13463: Address bug in MultiLeafKnnCollector causing #minCompetitiveSimilarity to stay artificially low in
some corner cases. (Greg Miller)
* GITHUB#13553: Correct RamUsageEstimate for scalar quantized knn vector formats so that raw vectors are correctly
accounted for. (Ben Trent)
* GITHUB#13615: Correct scalar quantization when used in conjunction with COSINE similarity. Vectors are normalized
before quantization to ensure the cosine similarity is correctly calculated. (Ben Trent)
* GITHUB#13627: Fix race condition on flush for DWPT seqNo generation. (Ben Trent, Ao Li)
* GITHUB#13691: Fix incorrect exponent value in explain of SigmoidFunction. (Owais Kazi)
* GITHUB#13703: Fix bug in LatLonPoint queries where narrow polygons close to latitude 90 don't
match any points due to an integer overflow. (Ignacio Vera)
* GITHUB#13641: Unify how KnnFormats handle missing fields and correctly handle missing vector fields when
merging segments. (Ben Trent)
* GITHUB#13519: 8 bit scalar vector quantization is no longer
supported: it was buggy starting in 9.11 (GITHUB#13197). 4 and 7
bit quantization are still supported. Existing (9.x) Lucene indices
that previously used 8 bit quantization can still be read/searched
but the results from `KNN*VectorQuery` are silently buggy. Further
8 bit quantized vector indexing into such (9.11) indices is not
permitted, so your path forward if you wish to continue using the
same 9.11 index is to index additional vectors into the same field
with either 4 or 7 bit quantization (or no quantization), and ensure
all older (9.11 written) segments are rewritten either via
`IndexWriter.forceMerge` or
`IndexWriter.addIndexes(CodecReader...)`, or reindexing entirely.
* GITHUB#13799: Disable intra-merge parallelism for all structures but kNN vectors. (Ben Trent)
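As background for GITHUB#13615 above: once two vectors are scaled to unit length, their dot product
equals their cosine similarity, which is why normalizing before quantization keeps the cosine score
meaningful. The following is a minimal sketch of that identity using plain float arrays. The class
and method names are hypothetical, and no quantization step is shown.

```java
// Hypothetical helper for illustration only; not a Lucene class.
final class CosineViaNormalization {
  private CosineViaNormalization() {}

  /** Returns a copy of v scaled to unit Euclidean length. */
  static float[] normalize(float[] v) {
    double sumSq = 0;
    for (float x : v) {
      sumSq += (double) x * x;
    }
    double norm = Math.sqrt(sumSq);
    if (norm == 0) {
      throw new IllegalArgumentException("cannot normalize a zero vector");
    }
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = (float) (v[i] / norm);
    }
    return out;
  }

  /** Dot product accumulated in double precision. */
  static double dot(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (double) a[i] * b[i];
    }
    return sum;
  }

  /** Cosine similarity computed directly from the raw vectors. */
  static double cosine(float[] a, float[] b) {
    return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
  }
}
```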
Build
---------------------
* GITHUB#13695, GITHUB#13696: Fix Gradle build sometimes giving spurious "unreferenced license file" warnings.
(Uwe Schindler)
Other
---------------------
* GITHUB#13720: Add float comparison based on unit of least precision and use it to stop test failures caused by float
summation not being associative in IEEE 754. (Alex Herbert, Stefan Vodita)
* Remove code triggering forbidden-apis regarding Java serialization. (Uwe Schindler, Robert Muir)
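GITHUB#13720 above refers to comparing floats by unit of least precision (ULP). The following is a
minimal sketch of that idea using only the JDK. The class UlpEquality and the method
equalWithinUlps are hypothetical names and are not the utility added to Lucene. The sketch maps
IEEE 754 bit patterns onto a monotonic integer line and compares the distance between them.

```java
// Hypothetical helper for illustration only; not Lucene's test utility.
final class UlpEquality {
  private UlpEquality() {}

  /** Returns true if a and b are at most maxUlps representable floats apart. */
  static boolean equalWithinUlps(float a, float b, int maxUlps) {
    if (Float.isNaN(a) || Float.isNaN(b)) {
      return false; // NaN never compares equal
    }
    if (a == b) {
      return true; // also covers +0.0f vs -0.0f and equal infinities
    }
    if (Float.isInfinite(a) || Float.isInfinite(b)) {
      return false; // an infinity is only equal to itself
    }
    return Math.abs(toOrdered(a) - toOrdered(b)) <= maxUlps;
  }

  /** Maps a float to a long so that adjacent floats map to adjacent longs. */
  private static long toOrdered(float f) {
    int bits = Float.floatToIntBits(f);
    // Negative floats have the sign bit set; mirror them below zero.
    return bits < 0 ? (long) Integer.MIN_VALUE - bits : bits;
  }
}
```

A test that sums the same floats in two different orders could then assert
equalWithinUlps(sumA, sumB, n) for a small n instead of exact equality.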
======================== Lucene 9.11.1 =======================
Bug Fixes
---------------------
* GITHUB#13498: Avoid performance regression by lazily constructing the PointTree in NumericComparator. (Ignacio Vera)
* GITHUB#13501, GITHUB#13478: Remove intra-merge parallelism for everything except HNSW graph merges. (Ben Trent)
* GITHUB#13498, GITHUB#13340: Allow adding a parent field to an index with no fields (Michael Sokolov)
* GITHUB#12431: Fix IndexOutOfBoundsException thrown in DefaultPassageFormatter
by unordered matches. (Stephane Campinas)
* GITHUB#13493: StringValueFacetCounts stops throwing NPE when faceting over an empty match-set. (Grebennikov Roman,
Stefan Vodita)
======================== Lucene 9.11.0 =======================
API Changes
---------------------
* GITHUB#13145: Deprecate ByteBufferIndexInput as it will be removed in Lucene 10.0. (Uwe Schindler)
* GITHUB#13422: An explicit dependency on the HPPC library is removed in favor of an internal repackaged copy in
oal.internal.hppc. If you relied on HPPC as a transitive dependency, you'll have to add it to your project explicitly.
The HPPC classes now bundled in Lucene core are internal and will have restricted access in future releases, please do
not use them. (Bruno Roustant, Dawid Weiss, Uwe Schindler, Chris Hegarty)
New Features
---------------------
* GITHUB#13125: Recursive graph bisection is now supported on indexes that have blocks, as long as
they configure a parent field via `IndexWriterConfig#setParentField`. (Adrien Grand)
* GITHUB#12915: Add new token filters for Japanese sutegana (捨て仮名). This introduces JapaneseHiraganaUppercaseFilter
and JapaneseKatakanaUppercaseFilter. (Dai Sugimori)
* GITHUB#13196, GITHUB#13222: Add support for posix_madvise to MMapDirectory: If running on
Linux/macOS and Java 21 or later, MMapDirectory uses IOContext to pass suitable MADV flags to the
operating system kernel. In particular, merging now passes POSIX_MADV_SEQUENTIAL to the readers
that are being merged, and searching passes POSIX_MADV_RANDOM to vector data files - including
quantized vector data files, HNSW graphs, stored fields data files and term vectors data files.
This may improve paging logic especially when working with large indexes under memory pressure.
(Uwe Schindler, Chris Hegarty, Robert Muir, Adrien Grand)
* GITHUB#13197: Expand support for new scalar bit levels for HNSW vectors. This includes 4-bit vectors and an option
to compress them to gain a 50% reduction in memory usage. (Ben Trent)
* GITHUB#13268: Add ability for UnifiedHighlighter to highlight a field based on combined matches from multiple fields.
(Mayya Sharipova, Jim Ferenczi)
* GITHUB#13288: Make HNSW and Flat storage vector formats easier to extend with new FlatVectorScorer interface. Add
new Hnsw format for binary quantized vectors. (Ben Trent)
* GITHUB#13181: Add new VectorScorer interface to vector value iterators. This allows for vector codecs to supply
simpler and more optimized vector scoring when iterating vector values directly. (Ben Trent)
* GITHUB#13414: Counts are always available in the result when using taxonomy facets. (Stefan Vodita)
* GITHUB#13445: Add new option when calculating scalar quantiles. The new option of setting `confidenceInterval` to
`0` will now dynamically determine the quantiles through a grid search over multiple quantiles calculated
by multiple intervals. (Ben Trent)
Improvements
---------------------
* GITHUB#13092: `static final Map` constants have been made immutable (Dmitry Cherniachenko)
* GITHUB#13041: TokenizedPhraseQueryNode code cleanup (Dmitry Cherniachenko)
* GITHUB#13087: Changed `static final Set` constants to be immutable. Among others it affected
ScandinavianNormalizer.ALL_FOLDINGS set with public access. (Dmitry Cherniachenko)
* GITHUB#13155: Hunspell: allow ignoring exceptions on duplicate ICONV/OCONV mappings (Peter Gromov)
* GITHUB#13156: Hunspell: don't proceed with other suggestions if we found good REP ones (Peter Gromov)
* GITHUB#13066: Support getMaxScore of DisjunctionSumScorer for non top level scoring clause (Shintaro Murakami)
* GITHUB#13124: MergeScheduler can now provide an executor for intra-merge parallelism. The first
implementation is the ConcurrentMergeScheduler and the Lucene99HnswVectorsFormat will use it if no other
executor is provided. (Ben Trent)
* GITHUB#13239: Upgrade icu4j to version 74.2. (Robert Muir)
* GITHUB#13202: Early terminate graph and exact searches of AbstractKnnVectorQuery to follow timeout set from
IndexSearcher#setTimeout(QueryTimeout). (Kaival Parikh)
* GITHUB#12966: Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself.
This reduces code duplication and enables future development. (Stefan Vodita)
* GITHUB#13362: Add sub query explanations to DisjunctionMaxQuery, if the overall query didn't match. (Tim Grein)
* GITHUB#13385: Add Intervals.noIntervals() method to produce an empty IntervalsSource.
(Aniketh Jain, Uwe Schindler, Alan Woodward)
* GITHUB#13276: UnifiedHighlighter: new 'passageSortComparator' option to allow sorting other than offset order. (Seunghan Jung)
* GITHUB#13429: Hunspell: speed up "compress"; minimize the number of the generated entries; don't even consider "forbidden" entries anymore (Peter Gromov)
Optimizations
---------------------
* GITHUB#13306: Use RWLock to access LRUQueryCache to reduce contention. (Boice Huang)
* GITHUB#13252: Replace handwritten comparison loops with Arrays.compareUnsigned in SegmentTermsEnum. (zhouhui)
* GITHUB#12996: Reduce ArrayUtil#grow in decompress. (Zhang Chao)
* GITHUB#13115: Short circuit queued flush check when flush on update is disabled (Prabhat Sharma)
* GITHUB#13085: Remove unnecessary toString() / substring() calls to save some String allocations (Dmitry Cherniachenko)
* GITHUB#13121: Speed up multi-segment HNSW graph search for diversifying child kNN queries. Builds on GITHUB#12962.
(Ben Trent)
* GITHUB#13184: Make the HitQueue size more appropriate for KNN exact search (Pan Guixin)
* GITHUB#13199: Speed up dynamic pruning by breaking point estimation when the threshold is exceeded. (Guo Feng)
* GITHUB#13203: Speed up writeGroupVInts (Zhang Chao)
* GITHUB#13224: Use singleton for all-zeros DirectMonotonicReader.Meta (Armin Braun)
* GITHUB#13232: Introduce singleton for PackedInts.NullReader of size 256 (Armin Braun)
* GITHUB#11888: Binary search the BlockTree terms dictionary entries when all suffixes have the same length
in a leaf block, speeding up cases like primary key lookup on an id field when all ids are the same length. (zhouhui)
* GITHUB#13149: Made PointRangeQuery faster, for some segment sizes, by reducing the number of virtual calls to
IntersectVisitor::visit(int). (Anton Hägerstrand)
* GITHUB#12966: FloatTaxonomyFacets can now collect values into a sparse structure, like IntTaxonomyFacets already
could. (Stefan Vodita)
* GITHUB#13284: Per-field doc values and knn vectors readers now use a HashMap internally instead of
a TreeMap. (Adrien Grand)
* GITHUB#13321: Improve compressed int4 quantized vector search by utilizing SIMD inline with the decompression
process. (Ben Trent)
* GITHUB#12408: Lazy initialization improvements for Facets implementations when there are segments with no hits
to count. (Greg Miller)
* GITHUB#13327: Reduce memory usage of field maps in FieldInfos and BlockTree TermsReader. (Bruno Roustant, David Smiley)
* GITHUB#13339: Add a MemorySegment Vector scorer - for scoring without copying on-heap (Chris Hegarty)
* GITHUB#13368: Replace Map<Integer, Object> by primitive IntObjectHashMap. (Bruno Roustant)
* GITHUB#13392: Replace Map<Long, Object> by primitive LongObjectHashMap. (Bruno Roustant)
* GITHUB#13400: Replace Set<Integer> by IntHashSet and Set<Long> by LongHashSet. (Bruno Roustant)
* GITHUB#13406: Replace List<Integer> by IntArrayList and List<Long> by LongArrayList. (Bruno Roustant)
* GITHUB#13420: Replace Map<Character> by CharObjectHashMap and Set<Character> by CharHashSet. (Bruno Roustant)
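Two entries above lend themselves to a small illustration: GITHUB#11888 (binary search when all
suffixes in a leaf block have the same length) and GITHUB#13252 (Arrays.compareUnsigned instead of
handwritten comparison loops). The following is a minimal sketch using only the JDK. It assumes a
hypothetical layout in which sorted, fixed-length entries are packed back to back in one byte
array; it does not reproduce the BlockTree on-disk format or SegmentTermsEnum.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Lucene class.
final class FixedLengthSuffixSearch {
  private FixedLengthSuffixSearch() {}

  /**
   * Searches count sorted entries of len bytes each, packed in block.
   * Returns the entry index if found, otherwise -(insertionPoint + 1).
   */
  static int search(byte[] block, int count, int len, byte[] target) {
    int lo = 0;
    int hi = count - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int start = mid * len; // fixed length makes every entry directly addressable
      int cmp = Arrays.compareUnsigned(block, start, start + len, target, 0, target.length);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }
}
```

Because every entry has the same length, entry i starts at offset i * len, so no linear scan
through length-prefixed entries is needed to locate the midpoint.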
Bug Fixes
---------------------
* GITHUB#13105: Fix ByteKnnVectorFieldSource & FloatKnnVectorFieldSource to work correctly when a segment does not contain
any docs with vectors (hossman)
* GITHUB#13017: Fix DV update files referenced by a merge being deleted by a concurrent flush. (Jialiang Guo)
* GITHUB#13145: Detect MemorySegmentIndexInput correctly in NRTSuggester. (Uwe Schindler)
* GITHUB#13154: Hunspell GeneratingSuggester: ensure there are never more than 100 roots to process (Peter Gromov)
* GITHUB#13162: Fix NPE when LeafReader returns null VectorValues (Pan Guixin)
* GITHUB#13169: Fix potential race condition in DocumentsWriter & DocumentsWriterDeleteQueue (Ben Trent)
* GITHUB#13204: Fix equals/hashCode of IOContext. (Uwe Schindler, Robert Muir)
* GITHUB#13206: Subtract deleted file size from the cache size of NRTCachingDirectory. (Jean-François Boeuf)
* GITHUB#12966: Aggregation facets no longer assume that aggregation values are positive. (Stefan Vodita)
* GITHUB#13356: Ensure negative scores are not returned from scalar quantization scorer. (Ben Trent)
* GITHUB#13366: Disallow NaN and Inf values in scalar quantization and better handle extreme cases. (Ben Trent)
* GITHUB#13369: Fix NRT opening failure when soft deletes are enabled and the document fails to index before a point
field is written (Ben Trent)
* GITHUB#13378: Fix points writing with no values (Chris Hegarty)
* GITHUB#13374: Fix bug in SQ when just a single vector present in a segment (Chris Hegarty)
* GITHUB#13376: Fix integer overflow exception in postings encoding as group-varint. (Zhang Chao, Guo Feng)
* GITHUB#13421: Fixes TestOrdinalMap.testRamBytesUsed for multiple default PackedInts.NullReader instances. (Amir Raza)
Build
---------------------
* Upgrade forbiddenapis to version 3.7 and ASM for APIJAR extraction to 9.7. (Uwe Schindler)
Other
---------------------
* GITHUB#13068: Replace numerous `brToString(BytesRef)` copies with a `ToStringUtils` method (Dmitry Cherniachenko)
* GITHUB#13077: Add public getter for SynonymQuery#field (Andrey Bozhko)
* GITHUB#13393: Add support for reloading the SPI for KnnVectorsFormat class (Navneet Verma)
======================== Lucene 9.10.0 =======================
API Changes
---------------------
* GITHUB#12243: Mark TermInSetQuery ctors with varargs terms as @Deprecated. SortedSetDocValuesField#newSlowSetQuery,
SortedDocValuesField#newSlowSetQuery, KeywordField#newSetQuery now take a collection of terms as a param. (Jakub Slowinski)
* GITHUB#11041: Deprecate IndexSearcher#search(Query, Collector) in favor of
IndexSearcher#search(Query, CollectorManager) for TopFieldCollectorManager
and TopScoreDocCollectorManager. (Zach Chen, Adrien Grand, Michael McCandless, Greg Miller, Luca Cavanna)
* GITHUB#12854: Mark DrillSideways#createDrillDownFacetsCollector as @Deprecated. (Greg Miller)
* GITHUB#12624, GITHUB#12831: Allow FSTCompiler to stream to any DataOutput while building, and
make compile() only return the FSTMetadata. For on-heap (default) use case, please use
FST.fromFSTReader(fstMetadata, fstCompiler.getFSTReader()) to create the FST. (Anh Dung Bui)
New Features
---------------------
* GITHUB#12679: Add support for similarity-based vector searches using [Byte|Float]VectorSimilarityQuery. Uses a new
VectorSimilarityCollector to find all vectors scoring above a `resultSimilarity` while traversing the HNSW graph till
better-scoring nodes are available, or the best candidate is below a score of `traversalSimilarity` in the lowest
level. (Aditya Prakash, Kaival Parikh)
* GITHUB#12829: For indices newly created as of 9.10.0 onwards, IndexWriter preserves document blocks indexed via
IndexWriter#addDocuments or IndexWriter#updateDocuments even when index sorting is configured. Document blocks are
maintained alongside their parent documents during sort and merge. IndexWriterConfig accepts a parent field that is used
to maintain block orders if index sorting is used. Note, this is fully optional in Lucene 9.x but will be mandatory for
indices that use document blocks together with index sorting as of 10.0.0. (Simon Willnauer)
* GITHUB#12336: Index additional data per facet label in the taxonomy. (Shai Erera, Egor Potemkin, Mike McCandless,
Stefan Vodita)
* GITHUB#12706: Add support for the final release of Java foreign memory API in Java 22 (and later).
Lucene's MMapDirectory will now mmap Lucene indexes in chunks of 16 GiB (instead of 1 GiB) starting
from Java 19. Indexes closed while queries are running can no longer crash the JVM.
Support for vectorized implementations of VectorUtil based on jdk.incubator.vector APIs was added
for exactly Java 22. Therefore, applications started with command line parameter
"java --add-modules jdk.incubator.vector" will automatically use the new vectorized implementations
if running on a supported platform (Java 20/21/22 on x86 CPUs with AVX2 or later or ARM NEON CPUs).
This is an opt-in feature and requires explicit Java command line flag! When enabled, Lucene logs
a notice using java.util.logging. Please test thoroughly and report bugs/slowness to Lucene's mailing
list. (Uwe Schindler, Chris Hegarty)
Improvements
---------------------
* GITHUB#12870: Tighten synchronized loop in DirectoryTaxonomyReader#getOrdinal. (Stefan Vodita)
* GITHUB#12812: Avoid overflows and false negatives in int slice buffer filled-with-zeros assertion. (Stefan Vodita)
* GITHUB#12910: Refactor around NeighborArray to make it more self-contained. (Patrick Zhai)
* GITHUB#12999: Use Automaton for SurroundQuery prefix/pattern matching (Michael Gibney)
* GITHUB#13043: Support getMaxScore of ConjunctionScorer for non top level scoring clause. (Shintaro Murakami)
* GITHUB#13055: Make DEFAULT_STOP_TAGS in KoreanPartOfSpeechStopFilter immutable (Dmitry Cherniachenko)