-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGES.txt
1007 lines (784 loc) · 47 KB
/
CHANGES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
2.6.0 - May 6, 2024
framework: Fix issue when restarting with multiple threads as it might only use one thread.
Change xmalloc() to take additional parameters so it is easier to identify
what it is trying to allocate if not enough memory.
lifsieve/lifsievecl: 1.4
New sieve for x^x-y^y and x^x+y^y.
2.5.8 - April 3, 2024
srsieve2/srsieve2cl: version 1.8.4
If d = p^n for some integer n, then allow p as a factor of (k*b^n-c)/d.
Print k in the ABC/ABCD file even if 1 as llr requires it.
fkbnsieve: version 1.6.2
Remove even c or odd c depending upon k and b as half of the terms will be even.
2.5.7 - March 17, 2024
srsieve2/srsieve2cl: version 1.8.3
Add support for GRUs (generalized repunits) in the form (b^n-1)/(b-1).
2.5.6 - March 8, 2024
framework:
Add method that can be used to output additional details after sieving completes.
srsieve2/srsieve2cl: version 1.8.2
Output expected number of primes when sieving is complete.
2.5.5 - January 30, 2024
framework:
Added more output when using -H with OpenCL builds
fkbnsieve: version 1.6.1
Replaced x86 asm with Montgomery logic so it can run on ARM CPUs.
Fix factor validation.
srsieve2/srsieve2cl: version 1.8.1
Got GPU variant of abc(c) = 1 code with multiple sequences working.
Change -b to -F and how it is used because it makes more sense to adjust
giant steps than it does to adjust baby steps.
Added -c to force generic logic even if abs(c) = 1 logic is possible. This is
useful when the nubmer of sequences when running srsieve2cl is slower with the
abs(c) = 1 logic than with the generic logic.
Correctly handle removal of sequences when all n have a factor.
2.5.4 - September 20, 2023
framework:
Fix issue where small primes might not be tested because the initial worker
stops processing its chunk so that another worker can continue.
fkbnsieve: version 1.6
Remove x86 asm in factor validation.
srsieve2/srsieve2cl: version 1.7.7
Fix a performance issue when starting with -i with tens of thousands of sequences.
If you are sieving tens of thousands of sequences avoid using input files where
k's are not in ascending sequence. Performance for loading the sequences will
take a noticeable hit if the input is not sorted by ascending k.
Fix memory usage upon startup when searching for square free part of k.
Remove x86 asm use so that it can run on ARM factors.
Enforce generic sieving for k > 2^63.
2.5.3 - August 15, 2023
framework:
Fix removal rate calculation to exclude time used prior to the start of sieving.
Do not allow mixing of GPU and CPU threads due to difficulties in computing
factor removal rate.
Do not include CPU time when computing removal rate with the GPU.
srsieve2/srsieve2cl: version 1.7.5
Removed -C command line option as -g does the same time, but in a different way.
Sort sequences by ascending k in output files.
Improved speed when loading large numbers of sequences when using -i.
Fix issue where factors are ignored when d > 1.
2.5.2 - July 3, 2023
framework:
Fixed factor rate calculation to exclude time used prior to sieving.
gcwsieve/gcwsivecl: version 1.5.3
Fix bug where the second term is skipped when testing small primes.
srsieve2/srsieve2cl: version 1.7.4
Added -a to output algebraic factorizations and terminate without sieving.
Add support for 'p' at the end of the -n argument to limit n to primes
between min and max n.
2.5.1 - June 29, 2023
twinsieve: version 1.6.3
Fix issue when continuing a sieve from an input file.
xyyxsieve/xyyxsievecl: version 2.2
Fix issue when using sparse terms where terms are not counted correctly.
srsieve2/srsieve2cl: version 1.7.3
Refactored algebraic factorization code to avoid issues that plagued previous builds.
2.5.0 - June 22, 2023
framework:
Fix how max p tested is computed as it could be wrong when running many workers.
srsieve2/srsieve2cl: version 1.7.2
Reverted part of the change for algebraic factorizations as it excluded too many.
Fix issue when using Legendre tables where factors would be missed.
twinsieve: version 1.6.2
Fixed issue with base 2 where even k are factored even though even k are not used.
The same issue can occur with odd bases.
xyyxsieve/xyyxsievecl: version 2.1
Added -Z to allow user to force sparse logic, which can be faster or slower
depending upon the density of x and y.
Fixed an issue when starting with -i that can lead to a crash.
2.4.9 - June 7, 2023
framework:
Added command line switch -6. When the removal rate is computed as "seconds
per factor", this is the number of minutes to includes in that rate. It can
exceed that number of minutes, but only if no factors are found.
fnbcsieve: version 1.7.1
For base 2 do not adjust mink or maxk when they are even. End user can use -r
to remove those terms.
srsieve2/srsieve2cl: version 1.7.1
Fix two issues with generating Legendre tables that can cause program to crash.
Fix issue where some sequences are removed due to incorrect algebraic factorizations.
2.4.8 - June 1, 2023
fbncsieve: version 1.7
No longer remove even k for base 2.
Ensure that odd bases only report even k that are removed.
Ignore factors for terms where k is outside of specified range.
xyyxsieve/xyyxsievecl: version 2.0
If the range of x or the range of y is greater than the number of candidates,
then different logic is used to manage the candidates and iterate over them.
2.4.7 - May 24, 2023
framework:
Added command line switch -4. Sieving will stop when the factors per
second rate falls below the specified rate. The rates only only evaluated
when the factor rate is computed, which is once per minute.
Added command line switch -5. Sieving will stop when the seconds per
factor rate falls is above specified rate. The rates only only evaluated
when the factor rate is computed, which is once per minute.
fbncsieve: version 1.6.2:
If max p is overridden due to the ability to factor out all composite terms,
then all remaining prime terms will be written the the output file as opposed
to some being written to the <xx>_primes.txt file.
gcwsieve/gcwsievecl: version 1.5.2
Modify output format to gcw_gfn.txt and gcw_mersenne.txt to make more sense.
srsieve2/srsieve2cl: version 1.7
Can now support hundreds of thousands of sequences per execution.
Added command line switch -S to split the sequences into separate files by
best Q for tha sequence. The resulting files should then be run with -q to
ensure that it uses the best Q for that set of sequences.
The software only factors k*b^n+/-c, even if d > 1. When d > 1 a couple
of other checks are in play.
When sieving sequences (k*b^n+/-c)/d where d > 1, terms where d > factor and
gcd(d, factor) > 0 will be not be removed when (k*b^n+/-c) mod factor = 0
because there is no easy way to verify that if factor divides (k*b^n+/-c)
that it also divides (k*b^n+/-c)/d.
Added command line switch -r for use with (k*b^n+/-c)/d sequences where d > 1.
When specified this will remove terms where k*b^n+/-c mod d != 0. If you do not
use -r then pfgw will use floor((k*b^n+/-c)/d) before doing the PRP test.
2.4.6 - April 14, 2023
framework:
Support 'f' or 'F' at the end of the -w argument. This will "fix" the
number of primes per CPU workunit and not resize the workunit.
twinsieve: version 1.6.1
Do not apply -r logic to base 2 since even k are already excluded.
fbncsieve: version 1.6.1
Fix issue when generating ABCD file as it counts terms incorrectly.
2.4.5 - March 24, 2023
framework:
Replace vsprintf with vsnprintf.
srsieve2/srsieve2cl: version 1.6.9
Fix an issue that occurs when logging factors and using multiple threads.
gcwsieve/gcwsievecl: version 1.5.1
Log terms of GFN or Mersenne forms as they are removed.
fbncsieve: version 1.6
Implement different logic (which is 5x slower) for larger bsaes to avoid invalid factors.
Only verify first factor for the first k for each prime.
Reduce memory usage for odd bases since we only track even k.
Reduce memory usage for base 2 since we only track odd k.
Output primes to a separate file.
twinsieve: version 1.6
Implement different logic (which is 5x slower) for larger bsaes to avoid invalid
factors. This only applies to b^n forms.
Add support to sieve for factorial/primorial twins.
Reduce memory usage for odd bases since we only track even k.
Reduce memory usage for base 2 since we only track odd k.
Only verify first factor for the first k for each prime.
ccsieve: version 1.2
Implement different logic (which is up to 2x faster) for b^n forms.
2.4.4 - March 14, 2023
framework:
Fixed calculation in GetCurrentMicrosecond() as it can lead to work sizes of 0.
pixsieve/pixsievecl:
Build with -O2 on Windows as the CPU code misses factors when built with -O3.
2.4.3 - February 27, 2023
framework:
Notify workers when the "primes per chunk" changes as the worker might need
to allocate/re-allocate memory based upon that.
psieve/psievecl: version 1.6
Ignore primorials < 100 and primorials > 1e9 if starting from an input file
as values outside of the supported range lead to missing factors.
twinsieve: version 1.5
Handle changes to "primes per chunk".
fbncsieve: version 1.5
Handle changes to "primes per chunk".
2.4.2 - January 19, 2023
framework:
For all OpenCL programs built on the framework except srsieve2cl, fix issue
where the buffer of primes for the GPU is not full. In some cases this can
cause a crash. In others it can lead to missing factors.
Replaced sprintf with snprintf in all sources and headers.
mfsievecl: version 2.2
Fix issue in kernel with missing factors.
Fix issue in kernel with invalid factors.
2.4.1 - January 13, 2023
twinsieve: version 1.4
Fix issue where no terms are removed.
gfndsieve/gfndsievecl: version 2.3
For each prime, only verify the first k is factored for that prime.
Avoid concurrency issues with term removal at same time output file is being written.
ccsieve: version 1.1
Added -N to allow splitting of k across multiple output files.
Fixed resuming from previous sieve (CC format).
For type 1 searches, only track odd k for base 2 and even k for odd bases.
Avoid concurrency issues with term removal at same time output file is being written.
2.4.0 - January 02, 2023
xyyxsieve/xyysievecl: version 1.8.1
Changed -D to -V for disabling AVX as -D is already used.
srsieve2/srsieve2cl: version 1.6.8
Fix issue where a single abs(c)=1 sequence finds an invalid factor. This
is caused by code that an unnecessary cast of a 32-bit value to a 16-bit value.
ccsieve: version 1.0
Initial release. This is sieve that can be used to assist in Cunningham Chain
searches. There are some differences from newpgen including:
ccsieve is slower then newpgen for primorials (because it validates factors).
ccsieve supports larger files.
ccsieve supports factorial type Cunningham Chains.
ccsieve can generate the CC file format. This is a new format supported
by pfgw which is used specifically for Cunningham Chains. This format
is far easier to read than the newpgen format.
These are the parameters specific to ccsieve:
-c 1 = first kind (first term end with -1), 2 = second kind (first term end with +1)
-t 1 = b^n, 2 = primorial, 3 = factorial
-k Minimum k to search
-K Maximum k to search
-l Cunningham Chain length
-b Base for b^n type searches
-n n of b^n, n# for primorial, n! for factorial
-f Format of output file (C=CC (default), N=NEWPGEN)
The maximum chain length supported by ccsieve is 30, although the longest chain
you are likely to find with help from ccsieve is 10 as the first k in the chain
is limited to 62 bits and records for longer chains have much larger k.
When using the CC format, pfgw can support longer Cunningham Chains up with up to 30
terms. pfgw will not test past the fifth term when using the newpgen format.
The newpgen output format support has not been tested.
Cunningham Chain records can be found at:
https://primes.utm.edu/top20/page.php?id=19
https://primes.utm.edu/top20/page.php?id=20
https://www.pzktupel.de/JensKruseAndersen/cc.htm
2.3.9 - December 17, 2022
srsieve2/srsieve2cl: version 1.6.7
Fix segfault with multiple c=1 sequences when using Legendre tables.
Fix logic behind -q and -Q for c=1 sequences.
2.3.8 - December 12, 2022
mfsieve/mfsievecl: version 2.1
Build a list of terms with powers prior to sieving so that computing minn! is faster.
For factorial, this improves the calculation of minn! by 30% with minn=1e6 when using
the GPU and by 40% when using the CPU.
Multi-factorials where minn < 1e6 will see less of a boost in performance.
srsieve2/srsieve2cl: version 1.6.6
Added -Q which will output estimated work for each possible q.
Added -q which can be used to specify the q to use (if that q is possible),
overriding the computed best q.
To use -Q, first sieve your sequence(s) to at least 1e6. This will ensure that
subsequent runs are using the correct sieving subroutines. Starting with the
output file, run that output file with the -Q flag then stop immediately after
it outputs a group of lines looking like this:
q = 45 with 162 subseq yields bs = 445, gs = 2, work = 793
work is an estimated cost for that q with lower costs implying higher throughput.
To use -q, take each of the q values output from -Q and run a range of at least
1e9 to determine the actual amount of time it takes for that q. Using the
Run a range of at least 1e9 using -q to specify which q to run by selecting
the q output from using the -Q flag. Although you can run all q, you can limit
to those q withing 20% of the lowest cost. For each run observe the total time
for that run to determine which q required the least amount of time. You should
then run the entire range with that q. This will not necessarily be the q with
the lowest cost.
2.3.7 - November 29, 2022
gcwsieve/gcwsievecl: version 1.5
Added support for non-x86 CPUs. FPU or AVX is still used on x86 CPUs.
Added -A to enable AVX on x86 CPUs. AVX code can be faster than the FPU,
but you will have to test ranges (for p > max n) to see which is faster.
Updated invmod method in the GPU and FPU code to gain about 2%.
2.3.6 - November 28, 2022
cksieve/cksievecl: version 1.4
Initial release of cksievecl.
cksieve will now run on non-x86 CPUs. It is 25% faster than the previous version.
cksievecl is about 5x faster than cksieve when comparing i9-11950H vs NVIDIA RTX A5000
2.3.5 - November 22, 2022
framework:
Added code to destuctors to free allocated memory.
Performance updates from Seth Troisi for the HashTable class
srsieve2/srieve2cl: version 1.6.5
Fixed HashTable usage (thanks to Seth Troisi) to get up to 10% better performance.
2.3.4 - October 10, 2022
gfndsieve/gfndsievecl: version 2.2
Always lock when reading/writing terms counter so that new factors cannot be applied
while the terms file is being written.
srsieve2/srsieve2cl: verison 1.6.4
Added -b. This parameter is used to calculate the number of baby steps and giant steps
for the discrete log. The default is 1.0. As this value increases, so does the number
of baby steps while decreasing the number of giant steps.
The parameter has an impact on CL_KERNEL_PRIVATE_MEM_SIZE in GPU kernels (output when
using -H). If that value is too high, then srsieve2cl will terminate. The reasons
for termination vary depending on CPU.
An example of usage would be when you are sieving thousands of sequences and have to use -K.
In one test I had to use -K5 (the GPU has about 4 GB of memory). When using -K3 -b0.2 there
is a nearly 40% performance boost. I am unaware (at this time) of how to detect the maximum
amount of memory available to a kernel so there is no way to auto-set -K and -b to the
optimial values at runtime. It will require trial and error for the end user.
The impact of changing from the default value has not been determined for CPU workers.
2.3.3 - July 7, 2022
framework:
Updated primesieve to 7.9. This addresses an issue in primesieve that can crash
the mtsieve applications.
Changed usage of primesieve (thanks to hints from Kim Wallisch, its creator) to
improve sieving peformance. As a result the worker threads now use an array instead
of a vector.
Broke the main sieving loop into two loops, one for when limited to a single thread
or for sieving prior to switching to the GPU. The second loop handles multiple worker
threads much better.
Reduced the initial CPU worksize from 1e6 to 96e3. When the largest prime tested
reaches 1e6, the worker thread can automatically adjust the worksize in an effort
to have each chunk take between 1 and 5 seconds to process. This has two affects.
First, it will allow the CPU workers to stop more quickly when the user hits ^C.
Second, it will do a better job at keeping CPU workers busy when using mulitple workers.
fkbnsieve: version 1.5
Only verify the first few factors found for each prime to speed up initial sieve.
sgsieve: version 1.3
Changed to support p/2p+1 where p is of the form k*b^n-1. This makes it compatible
with newpgen. As of 1.2 p was of the form k*b^n+1.
srsieve2: vesion 1.6.3
Default to not use Legendre tables for multiple sequences unless -l is used.
2.3.2 - June 3, 2022
Switched to llvm-mingw compiler in Windows. I have tried three other compilers, I cannot
use gdb with the latest msys2 or mingw64 compilers. gdb gives an error when I start the
programs, but only if they are built with -g. The latest cygwin compiler fails to
link the code. Others have seen the same error I have seen with other sofwtware and
nobody has provided a solution.
There were some spurious reports of bugs with the 2.3.0/2.3.1 release. This bugs
seem to have gone away now that I switched to llvm-mingw.
framework:
Updated primesieve to 7.8.
All remaining SSE code has been removed due to a bug resetting the rounding mode.
kbbsieve was the only program using it and is now faster without it.
Created abstract Device and Kernel classes and extend OpenCL and Metal classes
from those. This means that none of the GpuWorker classes need to include
or need to rely on logic specific to OpenCL or Metal as the implementation is
hidden from those workers.
kbbsieve: version 1.1
Switched to Montgomery logic. This is about 20% faster than the SSE logic.
2.3.1 - May 10, 2022
Removed the embedded ASM logic for NVIDIA from all GPU kernels. The OpenCL compiler
generates code that is about 10% faster. This is likely due to hard-coded register
usage in the ASM.
srsieve2, srsieve2cl: version 1.6.2
Add support for -C to generic sequence sieving which should be faster when one
has few sequences to test.
Change -S to -K so that less guessing is needed when one requires multiple GPU
kernels due to having a lot of sequences. The program will now create groups of
sequences that are approximately the same size for each call to the kernel.
2.3.0 - April 22, 2022
framework:
Updated to msys2 gcc version 11.2.0, which required changing string to std:string
and vector to std::vector instead of using the std namespace.
Due to a compiler bug, now compiling xyyxsieve and afsieve with -O2 instead of -O3.
Refactored the OpenCL implementation so that Metal implementation can use the same
interface as OpenCL. As a result KernelArgument.cpp no longer exists. The Kernel
now manages all CPU and GPU memory used by the Kernel.
Added Metal support (for Apple hardware) since Apple has deprecated OpenCL on their
hardware. As a result, all sieves that can use OpenCL on Apple hardware can now use
Metal on that same hardware.
Added ARM support for sieves that do not require x86 ASM functions.
Updated factor validation of mfsieve, gcwsieve, etc. to no longer require x86 asm.
Moved cltoh.pl to the main directory. make will now build the xxx.gpu.h file used
by the GpuWorkers as part of compiling the GpuWorker object files.
afsieve, afsievecl: version 1.2:
The GPU code uses the Montgomery logic for the mulmod.
All factors, found by either CPU or GPU are now validated.
psieve, psievecl: version 1.5
Replaced x86 ASM FPU code with Montgomery logic for the mulmod as the x86 ASM FPU
code was missing factors. There is no speed difference between the two. The x86
AVX code worked and has not been changed.
srsieve2, srsieve2cl: version 1.6.1
Changed default value for -g from to to 16. GPUs tend to like powers of 2 due
to their architecture.
Added -C for single sequence sieving with the GPU. This reduces threading overhead
which has a noticable impact on sieving speed. Using -C5 can improve speed by 50%.
2.2.3 - January 03, 2022
framework:
Modified sieving and factor rate calculations to account for long-running sieving
processes which lead to those rates fluctuating a lot.
Added xmallocNew() which can return NULL if not enough memory can be allocated
which allows callers to decide how to handle that condition.
For OpenCL sievers, added a call to clFinish() immediately after the call to
clEnqueueNDRangeKernel() so that the GPU time is more accurately calculated.
fkbnsieve: version 1.4
Do not create output file if there are no terms to write to it.
Exit sieving early if there are no remaining terms.
smsieve, smsievecl: version 1.0
Initial release for sieving of Smarandache numbers between Sm(100000) and Sm(9999999).
Cannot be used at this time for sieving p < 10000, but that is okay since a project
has a pre-sieved input file to use. It is impractical at this time to sieve values
larger than Sm(9999999) due to the time needed for a PRP test with pfgw.
srsieve2, srsieve2cl: version 1.6.0
Changed the calculation of factor density so that the GPU workers can use a
lot less GPU memory.
Added -S to split sequences into groups when feeding the GPU.
The following only apply to the abs(c) = 1 logic:
Added sr2sieve logic which uses Legendre checks in support of sieving when there
are multiple sequences.
Modified -L to represent a directory to hold a back up of the Legendre tables.
There will be one file per unique b/k/c.
Modified -l to specify the amount of memory to allocate to Legendre tsbles. -l0
disables use of Legendre tables.
Added -U to specify a multiplier to compute BASE_MULTIPLE. The specified value
is multiplied by 2 to compute BASE_MULTIPLE.
Added -V to specify a multiplier to compute POWER_RESIDUE_LCM. The specified value
is multiplied by BASE_MULTIPLE to compute POWER_RESIDUE_LCM.
Added -X to specify a multiplier to compute LIMIT_BASE. The specified value
is multiplied by POWER_RESIDUE_LCM to compute LIMIT_BASE.
BASE_MULTIPLE: least exponent Q for which sieving in subsequence base b^Q will be allowed.
POWER_RESIDUE_LCM: For a prime p that satisfies p=1 (mod r), an "r-th power residue test"
checks whether a subsequence of k*b^n+c can possibly contain any terms of
the form x^r (mod p). If there are none then that subsequence can be
omitted from the BSGS step.
To conduct r-th power residue tests for each r in a set R of prime
powers, set POWER_RESIDUE_LCM to lcm(R).
LIMIT_BASE: Allow sieving in base b^Q for Q chosen from the divisors of LIMIT_BASE.
When sieving with a single sequence the defaults are:
bmMultiplier = 15, prlMultiplier = 24, lbMultiplier = 1
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
When sieving with multiple sequences the defaults are:
bmMultiplier = 1, prlMultiplier = 360, lbMultiplier = 1
BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
It is recommend to play with these multipliers to find the optimal values as the provided
defaults are from sr1sieve and sr2sieve and some values can provide faster sieving.
The recommendation is to sieve a range of 1e9 and see how long it takes (use srsieve2.log
to see how long it actually takes). Vary -U, -V, and-X then evaluate the results. With a
script and multiple cores, this could done relatively quickly for most inputs.
2.2.2 - October 22, 2021
framework:
Added __attribute__ to method declarations that accept variable arguments.
srsieve2, srsieve2cl: version 1.5.3
Modified to not remove terms that are prime as that defeats the purpose of
Sierpinski/Riesel searches.
Fixed bug where maxn for a sequence has a small factor, but it is not found.
gnfdsieve, gfndsievecl: version 2.1
Fixed bug where code can find invalid factors.
2.2.1 - February 16, 2021
framework: no changes
gfndsieve, gfndsievecl: version 2.0
Moved GFN divisor testing to GFNDivisorTester class so that GFNDivisorApp is smaller
and so that future support of non-x86 is easier since GFNDivisorTester
calls a number of x86 asm methods directly.
For gfndsievecl do not report any terms with factors < 50. This reduces the size
needed for the buffer that is used to report factors.
Added -r and -R options to support functionality similar to ppsieve.
-r will not generate a bitmap for tracking terms. It will only generate an
output file of factors.
-R is used with -r. If a term has a factor below 32767 (the default value),
then the program will not output any factors for the term.
-r and -x are mutually exclusive with -r overriding -x.
Added various speed improvements.
srsieve2, srsieve2cl: version 1.5.2
Fixed issue with CisOne logic as it tries to rebuild sequences when there are
multiple sequences as that is not yet supported.
2.2.0 - February 9, 2021
framework:
Updated OpenCL on Windows. See makefile for details.
Updated primesieve to 7.6.
psieve, psievecl: version 1.4
Some refactoring to support OpenCL worker.
First release of psievecl.
Verify factors from -I input file
srsieve2, srsieve2cl: version 1.5.1
Fixed bug that was introduced in the refactoring of 1.5 that impacts generic sieving
while using multiple threads.
Added -R to remove sequences. Use -Rk*b^n+c format to remove a single sequence or
use -R with a file that has multiple sequences. This is not tested yet.
2.1.6 - February 3, 2021
framework:
Add largestPrimeTested parameter to NotifyAppToRebuild() as the app cannot rely
on accurately determining that value.
srsieve2, srsieve2cl: version 1.5
Fixed remaining known issues with CisOne logic (sequences where abs(c) = 1) for
a single CisOne sequence (sr1sieve).
Added OpenCL code for CisOne logic.
Added Legendre table lookups for CisOne logic.
2.1.5 - January 26, 2021
framework:
Added MpArith.h (non-vectorized) and changed class names in MpArithVector.h.
Overloaded HashTable constructor as needed for srsieve2.
srsieve2, srsieve2cl: version 1.4
Lots of refactoring to support special sieving logic for c=1 sequences.
Implemented sr1sieve logic using Montgomery mulmod logic (CPU only).
Change array of sequences to a linked list to avoid compiler warnings.
Add support for pmin= line in input file (as generated by srsieve/srfile).
2.1.4 - January 9, 2021
framework:
Fixed an issue with creating GPU kernels on OS X.
srseive2cl: new release
Finally an OpenCL version of srsieve2. srsieve2cl is at least 3x faster than srsieve2,
On my GPU it is limited to about 5000 sequences due to GPU memory limitations. I do not
know what the limits are for other GPUs. It will switch to the GPU at p>1e6.
2.1.3 - January 6, 2021
framework:
Improve determination of "largest prime" tested for per minute stats by ignoring
workers that haven't done any work.
dmdsieve: version 1.3
Added working factor validation logic with -I.
Modify factor validation logic to only verify the first 5 factors for each small
prime. If there is a problem it will reveal itself immediately. This improves
the speed when starting a new sieve.
gnfdsieve, gfndsievecl: version 1.9
Added working factor validation logic with -I.
Modify factor validation logic to only verify the first 5 factors for each small
prime. If there is a problem it will reveal itself immediately. This improves
the speed when starting a new sieve.
Allow the GPU to start sieving at (kMax-kMin)/2 instead of kMax.
Switched to Montgomery mulitplication in the GPU.
Changed GPU code to handle "too many factors" similar to other GPU sievers in the
framework as opposed to crashing if not enough GPU memory.
GPU code is about 9x faster than CPU code, but GPU code is only of value if one
needs to sieve more deeply than (kMax-kMin)/2, which is somwhere over n=1000.
mfsieve: version 1.9
Used vectorized Montgomery logic to get a 1% to 5% speed boost.
srsieve2: version 1.3.1
Modify factor validation logic to only verify the first 5 factors for each small
prime. If there is a problem it will reveal itself immediately. This improves
the speed when starting a new sieve.
2.1.2 - December 30, 2020
framework:
Retain factor counts when workers are rebuilt.
srsieve2: version 1.3
Added factor validation logic when using -I.
Switched to Montgomery mulitplication, which is about 20% faster than the previous
logic. This also gives a 20% speed bump and adds support for p up to 2^62.
2.1.1 - December 29, 2020
framework:
Fixed an infinite loop that occurs with the special CPU worker in GPU builds.
Removed the factor validation logic added in 2.1.0 do to an issue with how I
implemented it.
gcwsieve, gcwsievecl: version 1.5
Added working factor validation logic with -I.
Improved GPU speed by another 10%. Thanks to Yves Gallot for the code.
mfsieve, mfsievecl: version 1.8
Added working factor validation logic with -I.
Fix bug triggering invalid factor in the CPU code when minN is odd with factorials.
2.1.0 - December 27, 2020
framework:
On Windows, switched to using gcc from msys2 instead of gcc from mingw64. -O3
gives a nice performance bump to some of the non-ASM code.
Exit after showing help when using -h swtich instead of giving fatal error.
Add logic to support validation of factors passed with -I, but not all sieves are
coded to do this validation.
On GPU builds, default -W to 0 and -G to 1. A "special" CPU worker will be created
as necessary to test ranges of p that are to small for the GPU code.
For GPU executables, add -H to show some GPU details when each kernel is created.
Improve factor rate calculation when using GPU workers.
Created FUTURE.txt for items I would like to get working.
cksieve: version 1.3
Added logic to validate factors passed with -I.
gcwsieve, gcwsievecl: version 1.4
Added logic to validate factors passed with -I.
Add -f option to gcwsieve so that it can generate ABC output that is supported by LLR.
Added -M option and fixed -S for gcwsievecl.
Improved speed of GPU code of gcwsievecl by about 25%.
The GPU code is more than 20x faster than the CPU for the same amount of work.
mfsieve, mfsievecl: version 1.7
Added logic to validate factors passed with -I.
Replaced all ASM routines with non-ASM routines based upon Montgomery mulitplication. This
provides a speed bump of at least 30% over the previous release.
Fixed the GPU code and replaced magic number logic with Montgomery multiplcation as it is
faster. The new GPU code is more then 25% faster than the old GPU code.
The GPU code is more than 20x faster than the CPU for the same amount of work.
sgsieve: version 1.2
Added logic to validate factors passed with -I.
twinsieve: version 1.3
Added logic to validate factors passed with -I.
xyyxsieve, xyyxsievecl: version 1.8
Added logic to validate factors passed with -I.
2.0.6 - September 8, 2020
Fixed crash with xyyxsievecl with overly large ranges for the GPU.
More speed enhancements for xyyxsieve, mainly due to reduced memory requirements.
Do not show ETA or percent done if less than 1% done and -P not used.
Modify factor rate calculation to handle uneven factor distribution better.
Fix srsieve2 as it was counting factors of terms that had already been removed.
2.0.5 - July 13, 2020
Added support for newpgen format to sgsieve.
2.0.4 - July 10, 2020
Fixed an issue with factor rate calculation if all factors are found in the
first minute and none after.
Added sgsieve for Sophie-Germain searches.
2.0.3 - June 13, 2020
Fixed an issue where sieving would stop after rebuilding workers.
2.0.2 - June 8, 2020
Ensure that -p always overrides "last prime" from input file.
Fixed fkbnsieve as it can sometimes miss factors for small p.
Fixed issues with k1b2sieve.
Fixed ^C handling as sometimes sieving wouldn't stop.
2.0.1 - June 1, 2020
Added k1b2sieve to sieve 2^n+c for a range of n and c.
Fixed runtime issue with gcwsieve.
2.0.0 - May 20, 2020
Fixed an issue with the framework where it could stop sieving before reaching the max
prime specified on the command line.
Fixed a link issue with gfndsieve and dmdsieve with GMP on OS X.
Fixed a compiler issue with the .align ASM instruction on OS X.
1.9.8 - May 18, 2020
Fixed issue in gfndsieve where it accesses memory beyond the bounds of a vector.
1.9.7 - January 15, 2020
Fixed issue in srsieve2 where it crashes when starting a new sieve.
Fixed issue in srsieve2 where it finds an invalid factor and terminates.
1.9.6 - January 10, 2020
Refactored logic for "single worker" and "rebuild" logic so they work more nicely together.
Fixed calculation of p/sec which is wrong after workers are recreated.
Reset factor stats whenever workers are created as the rate is inflated for lower p as
far more terms are removed.
Fix issue with multiple threads which can cause program to hang.
Add support for ABC format to srsieve2.
1.9.5 - August 19, 2019
Fixed errors factor rate calculation.
1.9.4 - August 9, 2019
Fix a segfault with srsieve2 when rebuilding subsequences.
Fix a segfault with AVX logic on non-Windows OSes.
Fixed an issue with gfndsieve and dmdsieve where a large range of k and small range of x
cause it to not sieve subsequent subranges.
Added a table to gfndsieve to pre-set pmax if using -x and -P is not specified.
Add -D flag to xyyxsieve to disable AVX as it crashes in the AVX routines on Windows OS
on AMD CPUs. The flag will be removed when that issue is resolved.
Added -fP to srsieve2. This will generate a more verbose ABC output file with each line
specificing the values of k, n, and c. It will also add the number_primes switch which is
supported by pfgw to ensure only a single prime for each distinct k of the file.
Added -fB to srsieve2. This will generate a format that can be used to load BOINC. This
is equivalent to the "PRP" format used by srsieve/srfile
1.9.3 - July 25, 2019
Re-implemented a single makefile. It will now build separate exes for OpenCL enabled binaries.
Fix ABC format line in xyyxsieve output file.
Adjust reported factoring rate to account for less than 1 full CPU or more than 1 full CPU.
Handle race condition that can trigger an infinite loop.
Allow the first chunk to be smaller so that a rebuild of terms can be triggered earlier.
srsieve2 will compile, but requires more testing and only has srsieve functions.
1.9.2 - June 17, 2019
Split makefile so only GPU-enabled code will compile and link with OpenCL libraries
with intention of a single makefile in the future as having two nearly identical
makefiles is difficult to maintain.
Made changes to the framework to support sieving in chunks as needed by gfndsieve
and dmdsieve.
Added -x and -X switches to gfndsieve. With these switches one can execute Fermat
divisibility checks after sieving.
Allow gfndsieve to sieve smaller n, but report if k*2^n+1 is a prime in the range
being sieved. It will not report if k*2^n+1 is prime when k*2^n+1 > maxp, but less than maxp^2.
Added -x and -X switches to dmdsieve. With these switches one can execute MMP
divisibility checks after sieving.
1.9.1 - May 31, 2019
Added srsieve2, which will eventually replace srsieve, sr2sieve, and sr1sieve. It has
internal logic to determine how to sieve the input sequences, thus is combined into a
single executable. It does not support Legendre table logic from sr2sieve and sr1sieve,
but should be faster than both when not using Legendre tables.
If ib_BlockWhenProcessingFirstChunk is set to true for the sieve, the factor rate will now
be reported if the first chunk requires more than 60 seconds to process.
Sieves now report intermediate results when processing chunks. This allows for more
accurate factor rate calculations when it takes a long time to process each chunk.
Continue to report status if sieving is done, but workers are still processing their chunk.
1.9.0 - May 21, 2019
Changes were made to the framework to support srsieve2 (which isn't ready yet). No other
sieves use thse features (yet).
Modify FactorApp in an effort to more accurately compute the factor rate. When starting
a new sieve, the factor rate would be inflated because it would calculate the rate based
upon all factors removed. This change will exclude factors removed in the the first 30
minutes that the program has run. This logic will be executed when fewer than 60 terms
have been removed in the previous hour.
Fixed an issue in fbncsieve and twinsieve where the output file would contain composites
that are divisible by p that had already been sieved.
1.8.5 - January 17, 2019
Upgrade from primesieve 6.2 to primesieve 7.3.
Change calculation of factor rate to be based upon CPU utilization instead of number of workers.
Fix to twinsieve to correctly remove terms when using -r when restarting a sieve.
1.8.4 - December 23, 2018
Added the new Mersenne Prime to dmdsieve.
Fixed an issue in twinsieve and fbncsieve where it can exit with an error when
testing primes above sqrt(max term).
Fixed an issue (impacting all sieves) in computing the factor rate. The issue will manifest itself by
outputting a large factor removal rate. It only occurs after running the sieve continuously for 5 days.
1.8.3 - October 7, 2018
Added dmdsieve. This is used to sieve for divisors of Double-Mersenne numbers.
Output terms from dmdsieve have form 2*k*M+1 where M is a Mersenne Prime.
Fixed twinsieve so that it properly handles input factor files.
1.8.2 - October 2, 2018
Fixed twinsieve to properly check for file format when using -s.
1.8.1 - September 27, 2018
Fixed a bug that causes all sieves to crash when trying to compute factor rate.
twinsieve has duplicate -i switch, changed to -s.
1.8.0 - September 25, 2018
Added twinsieve. This is more than 3x faster than newpgen's twin sieve.
Modified OpenCL code to change calculation for default workunits to improve GPU throughput.
Modified "start sieving" message to include expected factors, but only if -P is not the default value.
Modified all sieves to have custom "start sieving message" so it each show more detail specific to that sieve.
1.7.5 - August 14, 2018
Fixed a crash when reading multiple empty lines in a row from an input file.
Added -r option to fbncsieve to remove terms where k % base = 0.
Various updates for newpgen output from fbncsieve:
use the .npg extension instead of .pfgw extension
change third parameter of first line to 1 for srsieve/srfile compatibility
change last parameter of first line to 1/2 since 1/2 is used for fixed newpgen
sieves and 257/258 are used for fixed k sieves.
1.7.4 - August 10, 2018
Modify pixsieve to report primes.
Modify pixsieve to output search string to console and log when sieve starts.
Fixed a crash in xyyxsieve when sieving only one sign.
Generate default filename for mfsieve if not specified on the command line.
Fix issue in psieve if it finds a factor for the last term of the input.
mfsieve supports AVX and is about 40% faster than previously.
1.7.3 - August 3, 2018
Fixed a memory exception that affects GPU workers for all sieves.
Re-enable AVX support with psieve as accidentally disabled in 1.7.1.
Do not output factor rate if no factors found.
Fixed another issue in non-AVX psieve code that causes it to crash.
1.7.2 - July 27, 2018
Fixed issue with reading ABCD files as input lines with 1 character would be ignored.
Fixed a crash upon exit of fbncsieve.
Allow override with -p when starting fbncsieve and fkbnsieve from an input file.
1.7.1 - July 25, 2018
Fixed output for number of terms remaaining to support values > 2^32.
Modified psieve to support an input file created by fpsieve.
Fixed a bug in the non-AVX primorial ASM code that causes it to crash.
1.7.0 - July 4, 2018
Added a timestamp to lines written to the log.
Changed usage of some registers in AVX code to avoid ymm0-ymm3 being passed
between calls to AVX routines.
Added psieve for primorials.
psieve supports AVX and is about 30% faster than fpsieve.
1.6.0 - June 25, 2018
Fixed an error with factor rate calculation when less than 1 per second.
Fixed an issue with gfndsieve when continuing a sieve and k < n.
For kbbsieve, added some checks for algebraic factorizations.
Added gcwsieve for Cullens and Woodalls. This sieve is GPU enabled.
Renamed all ASM routines to easily distinguish FPU/SSE/AVX.
Added AVX asm code for use by the Worker classes.
Added a mini-chunk mode that can be used when the worker classes
handles primes in chunks, such as AVX mode, which is chunks of 16 primes.
gcwsieve supports AVX. The CPU-only code is about 30% faster than Geoff Reynold's version.
xyyxsieve supports AVX. The CPU-only code is about 2.5x faster than the previous version.
1.5.0 - April 10, 2018
kbbsieve is more fully tested. It now uses a powmod that is limited to
52 bits (switching from extended floating point to SSE), which should be
at least 10% faster.
1.4.0 - April 9, 2018
Some common functionality for GPU sieving has been moved to Worker.cpp.
All GPU workers validate factors found by the GPU.
The xyyxsieve GPU sieving issue has been resolved.
The pixsieve GPU sieving code has been tested.
GPU sieving has been added to mfsieve. It has been tested.
GPU sieving has been added to gfndsieve. It has been tested.
Add kbbsieve, for the form k*b^b+/-1 for fixed k and variable b. It has been
partially tested.
1.3.0 - March 11, 2018
Ensure that "ENABLE_GPU=no" in makefile builds all programs without error.
cksieve no longer gives a fatal error if the computed root is not an actual
root. This condition rarely happens, but is okay when it does.
Overriding -p from the command line should now work when starting with
an input file.
Added GPU workers to xyyxsieve. When using GPU workers, an overflow with
collecting factors can cause xyyxsieve to crash. If that happens override
-S and/or -g or sieve more deeply with the CPU before adding GPU workers.
This will be addressed in a future release.
Added GPU workers to pixsieve. It has not been tested yet.
1.2.0 - February 23, 2018
fkbnsieve is now working.
Modify cksieve to detect candidates that are prime and to log them.
Fixed an asm bug that at worst causes factors to be missed by
fbncsieve and gfndsieve. It will nor result in invalid factors and
if it did, they would be caught at runtime due to built-in factor
checking that relies on completely different code.
Added -A option to apply factors (or reformat candidate file) and
exit immdiately without sieving.
Added GPU classes. This adds the following command line options:
-D - to select the GPU platform
-d - to select the GPU device
-G - to specify the number of GPU workers
-g - to set multiple of workgroupsize which is used to compute
the number of primes per GPU worker
Added GPU workers to afsieve.
1.1.0 - February 21, 2018
Add an internal flag that guarantee that suspends all but one Worker when
processing the first chunk of primes. This is used to improve performance
when there is a high factor density for low primes. This will also suppress
any on screen reporting or checkpointing until that chunk is processed.
Fix issue in computing CPU utilization.
Changed -c (chunksize) option to -w (worksize).
Change output to use shorter notation for min and max primes.