CHANGES.txt

2.6.0 - May 6, 2024
   framework:  Fix issue when restarting with multiple threads as it might only use one thread.
               Change xmalloc() to take additional parameters so it is easier to identify
               what it is trying to allocate if not enough memory.
   
   lifsieve/lifsievecl: 1.4
      New sieve for x^x-y^y and x^x+y^y.

2.5.8 - April 3, 2024
   srsieve2/srsieve2cl: version 1.8.4
      If d = p^n for some integer n, then allow p as a factor of (k*b^n-c)/d.
      Print k in the ABC/ABCD file even if 1 as llr requires it.
   
   fkbnsieve: version 1.6.2
      Remove even c or odd c depending upon k and b as half of the terms will be even.
      
2.5.7 - March 17, 2024
   srsieve2/srsieve2cl: version 1.8.3
      Add support for GRUs (generalized repunits) in the form (b^n-1)/(b-1).

2.5.6 - March 8, 2024
   framework:
      Add method that can be used to output additional details after sieving completes.
   
   srsieve2/srsieve2cl: version 1.8.2
      Output expected number of primes when sieving is complete.

2.5.5 - January 30, 2024
   framework:
      Added more output when using -H with OpenCL builds
      
   fkbnsieve: version 1.6.1
      Replaced x86 asm with Montgomery logic so it can run on ARM CPUs.
      Fix factor validation.

   srsieve2/srsieve2cl: version 1.8.1
      Got GPU variant of abc(c) = 1 code with multiple sequences working.
      Change -b to -F and how it is used because it makes more sense to adjust
      giant steps than it does to adjust baby steps.
      Added -c to force generic logic even if abs(c) = 1 logic is possible.  This is
      useful when the nubmer of sequences when running srsieve2cl is slower with the
      abs(c) = 1 logic than with the generic logic.
      Correctly handle removal of sequences when all n have a factor.
   
2.5.4 - September 20, 2023
   framework:
      Fix issue where small primes might not be tested because the initial worker
      stops processing its chunk so that another worker can continue.

   fkbnsieve: version 1.6
      Remove x86 asm in factor validation.

   srsieve2/srsieve2cl: version 1.7.7
      Fix a performance issue when starting with -i with tens of thousands of sequences.
      If you are sieving tens of thousands of sequences avoid using input files where
      k's are not in ascending sequence.  Performance for loading the sequences will
      take a noticeable hit if the input is not sorted by ascending k.
      Fix memory usage upon startup when searching for square free part of k.
      Remove x86 asm use so that it can run on ARM factors.
      Enforce generic sieving for k > 2^63.
      
2.5.3 - August 15, 2023
   framework:
      Fix removal rate calculation to exclude time used prior to the start of sieving.
      Do not allow mixing of GPU and CPU threads due to difficulties in computing
      factor removal rate.
      Do not include CPU time when computing removal rate with the GPU. 

   srsieve2/srsieve2cl: version 1.7.5
      Removed -C command line option as -g does the same time, but in a different way.
      Sort sequences by ascending k in output files.
      Improved speed when loading large numbers of sequences when using -i.
      Fix issue where factors are ignored when d > 1.

2.5.2 - July 3, 2023
   framework:
      Fixed factor rate calculation to exclude time used prior to sieving.
      
   gcwsieve/gcwsivecl: version 1.5.3
      Fix bug where the second term is skipped when testing small primes.

   srsieve2/srsieve2cl: version 1.7.4
      Added -a to output algebraic factorizations and terminate without sieving.
      Add support for 'p' at the end of the -n argument to limit n to primes
      between min and max n.

2.5.1 - June 29, 2023
   twinsieve: version 1.6.3
      Fix issue when continuing a sieve from an input file.

   xyyxsieve/xyyxsievecl: version 2.2
      Fix issue when using sparse terms where terms are not counted correctly.

   srsieve2/srsieve2cl: version 1.7.3
      Refactored algebraic factorization code to avoid issues that plagued previous builds.
      
2.5.0 - June 22, 2023
   framework:
      Fix how max p tested is computed as it could be wrong when running many workers.

   srsieve2/srsieve2cl: version 1.7.2
      Reverted part of the change for algebraic factorizations as it excluded too many.
      Fix issue when using Legendre tables where factors would be missed.

   twinsieve: version 1.6.2
      Fixed issue with base 2 where even k are factored even though even k are not used.
      The same issue can occur with odd bases.

   xyyxsieve/xyyxsievecl: version 2.1
      Added -Z to allow user to force sparse logic, which can be faster or slower
      depending upon the density of x and y.
      Fixed an issue when starting with -i that can lead to a crash.
      
2.4.9 - June 7, 2023
   framework:
      Added command line switch -6.  When the removal rate is computed as "seconds
      per factor", this is the number of minutes to includes in that rate.  It can
      exceed that number of minutes, but only if no factors are found.
      
   fnbcsieve: version 1.7.1
      For base 2 do not adjust mink or maxk when they are even.  End user can use -r
      to remove those terms.
      
   srsieve2/srsieve2cl: version 1.7.1
      Fix two issues with generating Legendre tables that can cause program to crash.
      Fix issue where some sequences are removed due to incorrect algebraic factorizations.

2.4.8 - June 1, 2023
   fbncsieve: version 1.7
      No longer remove even k for base 2.
      Ensure that odd bases only report even k that are removed.
      Ignore factors for terms where k is outside of specified range.
      
   xyyxsieve/xyyxsievecl: version 2.0
      If the range of x or the range of y is greater than the number of candidates,
      then different logic is used to manage the candidates and iterate over them.
      
2.4.7 - May 24, 2023
   framework:
      Added command line switch -4.  Sieving will stop when the factors per
      second rate falls below the specified rate.   The rates only only evaluated 
      when the factor rate is computed, which is once per minute.
      
      Added command line switch -5.  Sieving will stop when the seconds per
      factor rate falls is above specified rate.  The rates only only evaluated 
      when the factor rate is computed, which is once per minute.
   
   fbncsieve: version 1.6.2:
      If max p is overridden due to the ability to factor out all composite terms,
      then all remaining prime terms will be written the the output file as opposed
      to some being written to the <xx>_primes.txt file.
      
   gcwsieve/gcwsievecl: version 1.5.2
      Modify output format to gcw_gfn.txt and gcw_mersenne.txt to make more sense.

   srsieve2/srsieve2cl: version 1.7
      Can now support hundreds of thousands of sequences per execution.

      Added command line switch -S to split the sequences into separate files by
      best Q for tha sequence.  The resulting files should then be run with -q to
      ensure that it uses the best Q for that set of sequences.

      The software only factors k*b^n+/-c, even if d > 1.  When d > 1 a couple
      of other checks are in play.
      
      When sieving sequences (k*b^n+/-c)/d where d > 1, terms where d > factor and
      gcd(d, factor) > 0 will be not be removed when (k*b^n+/-c) mod factor = 0
      because there is no easy way to verify that if factor divides (k*b^n+/-c)
      that it also divides (k*b^n+/-c)/d.

      Added command line switch -r for use with (k*b^n+/-c)/d sequences where d > 1.
      When specified this will remove terms where k*b^n+/-c mod d != 0.  If you do not
      use -r then pfgw will use floor((k*b^n+/-c)/d) before doing the PRP test.

2.4.6 - April 14, 2023
   framework:
      Support 'f' or 'F' at the end of the -w argument.  This will "fix" the
      number of primes per CPU workunit and not resize the workunit.
      
   twinsieve: version 1.6.1
      Do not apply -r logic to base 2 since even k are already excluded.
      
   fbncsieve: version 1.6.1
      Fix issue when generating ABCD file as it counts terms incorrectly.
      
2.4.5 - March 24, 2023
   framework:
      Replace vsprintf with vsnprintf.
      
   srsieve2/srsieve2cl: version 1.6.9
      Fix an issue that occurs when logging factors and using multiple threads.
   
   gcwsieve/gcwsievecl: version 1.5.1
      Log terms of GFN or Mersenne forms as they are removed.

   fbncsieve: version 1.6
      Implement different logic (which is 5x slower) for larger bsaes to avoid invalid factors.
      Only verify first factor for the first k for each prime.
      Reduce memory usage for odd bases since we only track even k.
      Reduce memory usage for base 2 since we only track odd k.
      Output primes to a separate file.
   
   twinsieve: version 1.6
      Implement different logic (which is 5x slower) for larger bsaes to avoid invalid
      factors.  This only applies to b^n forms.
      Add support to sieve for factorial/primorial twins.
      Reduce memory usage for odd bases since we only track even k.
      Reduce memory usage for base 2 since we only track odd k.
      Only verify first factor for the first k for each prime.
      
   ccsieve: version 1.2
      Implement different logic (which is up to 2x faster) for b^n forms.
      
2.4.4 - March 14, 2023
   framework:
      Fixed calculation in GetCurrentMicrosecond() as it can lead to work sizes of 0.
   
   pixsieve/pixsievecl:
      Build with -O2 on Windows as the CPU code misses factors when built with -O3.
   
2.4.3 - February 27, 2023
   framework:
      Notify workers when the "primes per chunk" changes as the worker might need
      to allocate/re-allocate memory based upon that.
      
   psieve/psievecl: version 1.6
      Ignore primorials < 100 and primorials > 1e9 if starting from an input file
      as values outside of the supported range lead to missing factors.
   
   twinsieve: version 1.5
      Handle changes to "primes per chunk".
         
   fbncsieve: version 1.5
      Handle changes to "primes per chunk".
      
2.4.2 - January 19, 2023
   framework:
      For all OpenCL programs built on the framework except srsieve2cl, fix issue
      where the buffer of primes for the GPU is not full.  In some cases this can
      cause a crash.  In others it can lead to missing factors.
      
      Replaced sprintf with snprintf in all sources and headers.
      
   mfsievecl: version 2.2
      Fix issue in kernel with missing factors.
      Fix issue in kernel with invalid factors.
      
2.4.1 - January 13, 2023
   twinsieve: version 1.4
      Fix issue where no terms are removed.
   
   gfndsieve/gfndsievecl: version 2.3
      For each prime, only verify the first k is factored for that prime.
      Avoid concurrency issues with term removal at same time output file is being written.
      
   ccsieve: version 1.1
      Added -N to allow splitting of k across multiple output files.
      Fixed resuming from previous sieve (CC format).
      For type 1 searches, only track odd k for base 2 and even k for odd bases.
      Avoid concurrency issues with term removal at same time output file is being written.
      
2.4.0 - January 02, 2023
   xyyxsieve/xyysievecl: version 1.8.1
      Changed -D to -V for disabling AVX as -D is already used.
      
   srsieve2/srsieve2cl: version 1.6.8
      Fix issue where a single abs(c)=1 sequence finds an invalid factor.  This
      is caused by code that an unnecessary cast of a 32-bit value to a 16-bit value.
      
   ccsieve: version 1.0
      Initial release.  This is sieve that can be used to assist in Cunningham Chain
      searches.  There are some differences from newpgen including:
          ccsieve is slower then newpgen for primorials (because it validates factors).
          ccsieve supports larger files.
          ccsieve supports factorial type Cunningham Chains.
          ccsieve can generate the CC file format.  This is a new format supported
          by pfgw which is used specifically for Cunningham Chains.  This format
          is far easier to read than the newpgen format.

      These are the parameters specific to ccsieve:
         -c    1 = first kind (first term end with -1), 2 = second kind (first term end with +1)
         -t    1 = b^n, 2 = primorial, 3 = factorial
         -k    Minimum k to search
         -K    Maximum k to search
         -l    Cunningham Chain length
         -b    Base for b^n type searches
         -n    n of b^n, n# for primorial, n! for factorial
         -f    Format of output file (C=CC (default), N=NEWPGEN)
      
      The maximum chain length supported by ccsieve is 30, although the longest chain 
      you are likely to find with help from ccsieve is 10 as the first k in the chain
      is limited to 62 bits and records for longer chains have much larger k.
      
      When using the CC format, pfgw can support longer Cunningham Chains up with up to 30
      terms.  pfgw will not test past the fifth term when using the newpgen format.
      
      The newpgen output format support has not been tested.
      
      Cunningham Chain records can be found at:
         https://primes.utm.edu/top20/page.php?id=19
         https://primes.utm.edu/top20/page.php?id=20
         https://www.pzktupel.de/JensKruseAndersen/cc.htm

2.3.9 - December 17, 2022
   srsieve2/srsieve2cl: version 1.6.7
      Fix segfault with multiple c=1 sequences when using Legendre tables.
      Fix logic behind -q and -Q for c=1 sequences.

2.3.8 - December 12, 2022
   mfsieve/mfsievecl: version 2.1
      Build a list of terms with powers prior to sieving so that computing minn! is faster.
      For factorial, this improves the calculation of minn! by 30% with minn=1e6 when using
      the GPU and by 40% when using the CPU.
      Multi-factorials where minn < 1e6 will see less of a boost in performance.
      
   srsieve2/srsieve2cl: version 1.6.6   
      Added -Q which will output estimated work for each possible q.
      Added -q which can be used to specify the q to use (if that q is possible),
      overriding the computed best q.
      
      To use -Q, first sieve your sequence(s) to at least 1e6.  This will ensure that
      subsequent runs are using the correct sieving subroutines.  Starting with the
      output file, run that output file with the -Q flag then stop immediately after
      it outputs a group of lines looking like this:
      
      q =   45 with    162 subseq yields bs =   445, gs =     2, work =    793
      
      work is an estimated cost for that q with lower costs implying higher throughput.
      
      To use -q, take each of the q values output from -Q and run a range of at least
      1e9 to determine the actual amount of time it takes for that q.  Using the
      Run a range of at least 1e9 using -q to specify which q to run by selecting
      the q output from using the -Q flag.  Although you can run all q, you can limit
      to those q withing 20% of the lowest cost.  For each run observe the total time
      for that run to determine which q required the least amount of time.  You should
      then run the entire range with that q.  This will not necessarily be the q with
      the lowest cost.
      
2.3.7 - November 29, 2022
   gcwsieve/gcwsievecl: version 1.5
      Added support for non-x86 CPUs.  FPU or AVX is still used on x86 CPUs.
      Added -A to enable AVX on x86 CPUs.  AVX code can be faster than the FPU,
      but you will have to test ranges (for p > max n) to see which is faster.
      Updated invmod method in the GPU and FPU code to gain about 2%.

2.3.6 - November 28, 2022
   cksieve/cksievecl: version 1.4
      Initial release of cksievecl.
      cksieve will now run on non-x86 CPUs.  It is 25% faster than the previous version.
      cksievecl is about 5x faster than cksieve when comparing i9-11950H vs NVIDIA RTX A5000
      
2.3.5 - November 22, 2022
   framework:
      Added code to destuctors to free allocated memory.
      Performance updates from Seth Troisi for the HashTable class
      
   srsieve2/srieve2cl: version 1.6.5
      Fixed HashTable usage (thanks to Seth Troisi) to get up to 10% better performance.

2.3.4 - October 10, 2022
   gfndsieve/gfndsievecl: version 2.2
      Always lock when reading/writing terms counter so that new factors cannot be applied
      while the terms file is being written.

   srsieve2/srsieve2cl: verison 1.6.4
      Added -b.  This parameter is used to calculate the number of baby steps and giant steps
      for the discrete log.  The default is 1.0.  As this value increases, so does the number
      of baby steps while decreasing the number of giant steps.  

      The parameter has an impact on CL_KERNEL_PRIVATE_MEM_SIZE in GPU kernels (output when
      using -H).  If that value is too high, then srsieve2cl will terminate.  The reasons
      for termination vary depending on CPU.  

      An example of usage would be when you are sieving thousands of sequences and have to use -K. 
      In one test I had to use -K5 (the GPU has about 4 GB of memory).  When using -K3 -b0.2 there
      is a nearly 40% performance boost.  I am unaware (at this time) of how to detect the maximum 
      amount of memory available to a kernel so there is no way to auto-set -K and -b to the
      optimial values at runtime.  It will require trial and error for the end user.

      The impact of changing from the default value has not been determined for CPU workers.

2.3.3 - July 7, 2022
   framework:
      Updated primesieve to 7.9.  This addresses an issue in primesieve that can crash
      the mtsieve applications.
      
      Changed usage of primesieve (thanks to hints from Kim Wallisch, its creator) to
      improve sieving peformance.  As a result the worker threads now use an array instead
      of a vector.
      
      Broke the main sieving loop into two loops, one for when limited to a single thread
      or for sieving prior to switching to the GPU.  The second loop handles multiple worker
      threads much better.
      
      Reduced the initial CPU worksize from 1e6 to 96e3.  When the largest prime tested
      reaches 1e6, the worker thread can automatically adjust the worksize in an effort
      to have each chunk take between 1 and 5 seconds to process.  This has two affects.
      First, it will allow the CPU workers to stop more quickly when the user hits ^C.
      Second, it will do a better job at keeping CPU workers busy when using mulitple workers.
 
   fkbnsieve: version 1.5
      Only verify the first few factors found for each prime to speed up initial sieve.

   sgsieve: version 1.3
      Changed to support p/2p+1 where p is of the form k*b^n-1.  This makes it compatible
      with newpgen.  As of 1.2 p was of the form k*b^n+1.

   srsieve2: vesion 1.6.3
      Default to not use Legendre tables for multiple sequences unless -l is used.

2.3.2 - June 3, 2022
   Switched to llvm-mingw compiler in Windows.  I have tried three other compilers, I cannot
   use gdb with the latest msys2 or mingw64 compilers.  gdb gives an error when I start the
   programs, but only if they are built with -g.  The latest cygwin compiler fails to
   link the code.  Others have seen the same error I have seen with other sofwtware and
   nobody has provided a solution.
   
   There were some spurious reports of bugs with the 2.3.0/2.3.1 release.  This bugs
   seem to have gone away now that I switched to llvm-mingw.
   
   framework:
      Updated primesieve to 7.8.
      
      All remaining SSE code has been removed due to a bug resetting the rounding mode.
      kbbsieve was the only program using it and is now faster without it.

      Created abstract Device and Kernel classes and extend OpenCL and Metal classes
      from those.  This means that none of the GpuWorker classes need to include 
      or need to rely on logic specific to OpenCL or Metal as the implementation is
      hidden from those workers.
   
   kbbsieve: version 1.1
      Switched to Montgomery logic.  This is about 20% faster than the SSE logic.

2.3.1 - May 10, 2022
   Removed the embedded ASM logic for NVIDIA from all GPU kernels.  The OpenCL compiler
   generates code that is about 10% faster.  This is likely due to hard-coded register
   usage in the ASM.
   
   srsieve2, srsieve2cl: version 1.6.2      
      Add support for -C to generic sequence sieving which should be faster when one
      has few sequences to test.
      
      Change -S to -K so that less guessing is needed when one requires multiple GPU
      kernels due to having a lot of sequences.  The program will now create groups of
      sequences that are approximately the same size for each call to the kernel.

2.3.0 - April 22, 2022
   framework:
      Updated to msys2 gcc version 11.2.0, which required changing string to std:string
      and vector to std::vector instead of using the std namespace.

      Due to a compiler bug, now compiling xyyxsieve and afsieve with -O2 instead of -O3.
      
      Refactored the OpenCL implementation so that Metal implementation can use the same
      interface as OpenCL.  As a result KernelArgument.cpp no longer exists.  The Kernel
      now manages all CPU and GPU memory used by the Kernel.
            
      Added Metal support (for Apple hardware) since Apple has deprecated OpenCL on their
      hardware.  As a result, all sieves that can use OpenCL on Apple hardware can now use
      Metal on that same hardware.
      
      Added ARM support for sieves that do not require x86 ASM functions.
      
      Updated factor validation of mfsieve, gcwsieve, etc. to no longer require x86 asm.
      
      Moved cltoh.pl to the main directory.  make will now build the xxx.gpu.h file used
      by the GpuWorkers as part of compiling the GpuWorker object files.
 
      afsieve, afsievecl: version 1.2:
         The GPU code uses the Montgomery logic for the mulmod.
         All factors, found by either CPU or GPU are now validated.

      psieve, psievecl: version 1.5
         Replaced x86 ASM FPU code with Montgomery logic for the mulmod as the x86 ASM FPU
         code was missing factors.  There is no speed difference between the two.  The x86
         AVX code worked and has not been changed.
         
      srsieve2, srsieve2cl: version 1.6.1
         Changed default value for -g from to to 16.  GPUs tend to like powers of 2 due
         to their architecture.
         
         Added -C for single sequence sieving with the GPU.  This reduces threading overhead
         which has a noticable impact on sieving speed.  Using -C5 can improve speed by 50%.
         
2.2.3 - January 03, 2022
   framework:
      Modified sieving and factor rate calculations to account for long-running sieving
      processes which lead to those rates fluctuating a lot.

      Added xmallocNew() which can return NULL if not enough memory can be allocated
      which allows callers to decide how to handle that condition.

      For OpenCL sievers, added a call to clFinish() immediately after the call to
      clEnqueueNDRangeKernel() so that the GPU time is more accurately calculated.

   fkbnsieve:  version 1.4
      Do not create output file if there are no terms to write to it.

      Exit sieving early if there are no remaining terms.
   
   smsieve, smsievecl: version 1.0
      Initial release for sieving of Smarandache numbers between Sm(100000) and Sm(9999999).
      Cannot be used at this time for sieving p < 10000, but that is okay since a project
      has a pre-sieved input file to use.  It is impractical at this time to sieve values
      larger than Sm(9999999) due to the time needed for a PRP test with pfgw.
      
   srsieve2, srsieve2cl:  version 1.6.0
      Changed the calculation of factor density so that the GPU workers can use a
      lot less GPU memory.
      
      Added -S to split sequences into groups when feeding the GPU.
      
      The following only apply to the abs(c) = 1 logic:
         Added sr2sieve logic which uses Legendre checks in support of sieving when there
         are multiple sequences.
         
         Modified -L to represent a directory to hold a back up of the Legendre tables.
         There will be one file per unique b/k/c.

         Modified -l to specify the amount of memory to allocate to Legendre tsbles. -l0
         disables use of Legendre tables.  

         Added -U to specify a multiplier to compute BASE_MULTIPLE.  The specified value
         is multiplied by 2 to compute BASE_MULTIPLE.
         
         Added -V to specify a multiplier to compute POWER_RESIDUE_LCM.  The specified value
         is multiplied by BASE_MULTIPLE to compute POWER_RESIDUE_LCM.

         Added -X to specify a multiplier to compute LIMIT_BASE.  The specified value
         is multiplied by POWER_RESIDUE_LCM to compute LIMIT_BASE.
         
         BASE_MULTIPLE: least exponent Q for which sieving in subsequence base b^Q will be allowed.
         
         POWER_RESIDUE_LCM: For a prime p that satisfies p=1 (mod r), an "r-th power residue test"
                            checks whether a subsequence of k*b^n+c can possibly contain any terms of
                            the form x^r (mod p). If there are none then that subsequence can be
                            omitted from the BSGS step.
                            
                            To conduct r-th power residue tests for each r in a set R of prime
                            powers, set POWER_RESIDUE_LCM to lcm(R).
         
         LIMIT_BASE:  Allow sieving in base b^Q for Q chosen from the divisors of LIMIT_BASE.

         When sieving with a single sequence the defaults are:
            bmMultiplier = 15, prlMultiplier = 24, lbMultiplier = 1
            BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
         
         When sieving with multiple sequences the defaults are:
            bmMultiplier = 1, prlMultiplier = 360, lbMultiplier = 1
            BASE_MULTIPLE = 2, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
            
         It is recommend to play with these multipliers to find the optimal values as the provided
         defaults are from sr1sieve and sr2sieve and some values can provide faster sieving.
         The recommendation is to sieve a range of 1e9 and see how long it takes (use srsieve2.log
         to see how long it actually takes).  Vary -U, -V, and-X then evaluate the results.  With a
         script and multiple cores, this could done relatively quickly for most inputs.
            
2.2.2 - October 22, 2021
   framework:
      Added __attribute__ to method declarations that accept variable arguments.

   srsieve2, srsieve2cl:  version 1.5.3
      Modified to not remove terms that are prime as that defeats the purpose of
      Sierpinski/Riesel searches.
      Fixed bug where maxn for a sequence has a small factor, but it is not found.

   gnfdsieve, gfndsievecl:  version 2.1
      Fixed bug where code can find invalid factors. 
      
2.2.1 - February 16, 2021
   framework:  no changes
   
   gfndsieve, gfndsievecl:  version 2.0
      Moved GFN divisor testing to GFNDivisorTester class so that GFNDivisorApp is smaller
      and so that future support of non-x86 is easier since GFNDivisorTester
      calls a number of x86 asm methods directly.
      
      For gfndsievecl do not report any terms with factors < 50.  This reduces the size
      needed for the buffer that is used to report factors.
      
      Added -r and -R options to support functionality similar to ppsieve.
         -r will not generate a bitmap for tracking terms.  It will only generate an
         output file of factors.
         -R is used with -r.  If a term has a factor below 32767 (the default value),
         then the program will not output any factors for the term.
         -r and -x are mutually exclusive with -r overriding -x.

      Added various speed improvements.

   srsieve2, srsieve2cl:  version 1.5.2
      Fixed issue with CisOne logic as it tries to rebuild sequences when there are 
      multiple sequences as that is not yet supported.   

2.2.0 - February 9, 2021
   framework:
      Updated OpenCL on Windows.  See makefile for details.
      Updated primesieve to 7.6.
   
   psieve, psievecl:  version 1.4
      Some refactoring to support OpenCL worker.
      First release of psievecl.
      Verify factors from -I input file

   srsieve2, srsieve2cl:  version 1.5.1
      Fixed bug that was introduced in the refactoring of 1.5 that impacts generic sieving
      while using multiple threads.
      Added -R to remove sequences.  Use -Rk*b^n+c format to remove a single sequence or
      use -R with a file that has multiple sequences.  This is not tested yet.
   
2.1.6 - February 3, 2021
   framework:
      Add largestPrimeTested parameter to NotifyAppToRebuild() as the app cannot rely
      on accurately determining that value.
   
   srsieve2, srsieve2cl:  version 1.5
      Fixed remaining known issues with CisOne logic (sequences where abs(c) = 1) for
      a single CisOne sequence (sr1sieve).
      Added OpenCL code for CisOne logic.
      Added Legendre table lookups for CisOne logic.

2.1.5 - January 26, 2021
   framework:
      Added MpArith.h (non-vectorized) and changed class names in MpArithVector.h.
      Overloaded HashTable constructor as needed for srsieve2.
   
   srsieve2, srsieve2cl:  version 1.4
      Lots of refactoring to support special sieving logic for c=1 sequences.
      Implemented sr1sieve logic using Montgomery mulmod logic (CPU only).
      Change array of sequences to a linked list to avoid compiler warnings.
      Add support for pmin= line in input file (as generated by srsieve/srfile).

2.1.4 - January 9, 2021
   framework:
      Fixed an issue with creating GPU kernels on OS X.

   srseive2cl:  new release
      Finally an OpenCL version of srsieve2.  srsieve2cl is at least 3x faster than srsieve2,
      On my GPU it is limited to about 5000 sequences due to GPU memory limitations.  I do not
      know what the limits are for other GPUs.  It will switch to the GPU at p>1e6.

2.1.3 - January 6, 2021
   framework:
      Improve determination of "largest prime" tested for per minute stats by ignoring
      workers that haven't done any work.
      
   dmdsieve:  version 1.3
      Added working factor validation logic with -I.
      Modify factor validation logic to only verify the first 5 factors for each small
      prime.  If there is a problem it will reveal itself immediately.  This improves 
      the speed when starting a new sieve.
      
   gnfdsieve, gfndsievecl:  version 1.9
      Added working factor validation logic with -I.
      Modify factor validation logic to only verify the first 5 factors for each small
      prime.  If there is a problem it will reveal itself immediately.  This improves 
      the speed when starting a new sieve.
      Allow the GPU to start sieving at (kMax-kMin)/2 instead of kMax.
      Switched to Montgomery mulitplication in the GPU.
      Changed GPU code to handle "too many factors" similar to other GPU sievers in the
      framework as opposed to crashing if not enough GPU memory.
      GPU code is about 9x faster than CPU code, but GPU code is only of value if one
      needs to sieve more deeply than (kMax-kMin)/2, which is somwhere over n=1000.
      
   mfsieve:  version 1.9
      Used vectorized Montgomery logic to get a 1% to 5% speed boost.
      
   srsieve2:  version 1.3.1
      Modify factor validation logic to only verify the first 5 factors for each small
      prime.  If there is a problem it will reveal itself immediately.  This improves 
      the speed when starting a new sieve.
      

2.1.2 - December 30, 2020
   framework:
      Retain factor counts when workers are rebuilt.

   srsieve2:  version 1.3
      Added factor validation logic when using -I.
      Switched to Montgomery mulitplication, which is about 20% faster than the previous
      logic.  This also gives a 20% speed bump and adds support for p up to 2^62.

2.1.1 - December 29, 2020
   framework:
      Fixed an infinite loop that occurs with the special CPU worker in GPU builds.
      
      Removed the factor validation logic added in 2.1.0 do to an issue with how I
      implemented it.
   
   gcwsieve, gcwsievecl:  version 1.5
      Added working factor validation logic with -I.
      Improved GPU speed by another 10%.  Thanks to Yves Gallot for the code.
   
   mfsieve, mfsievecl:  version 1.8
      Added working factor validation logic with -I.
      Fix bug triggering invalid factor in the CPU code when minN is odd with factorials.
   
2.1.0 - December 27, 2020
   framework:
      On Windows, switched to using gcc from msys2 instead of gcc from mingw64.  -O3
      gives a nice performance bump to some of the non-ASM code.
      
      Exit after showing help when using -h swtich instead of giving fatal error.
      
      Add logic to support validation of factors passed with -I, but not all sieves are
      coded to do this validation.
      
      On GPU builds, default -W to 0 and -G to 1.  A "special" CPU worker will be created
      as necessary to test ranges of p that are to small for the GPU code.
   
      For GPU executables, add -H to show some GPU details when each kernel is created.
   
      Improve factor rate calculation when using GPU workers.
      
      Created FUTURE.txt for items I would like to get working.
   
   cksieve:  version 1.3
      Added logic to validate factors passed with -I.

   gcwsieve, gcwsievecl:  version 1.4
      Added logic to validate factors passed with -I.
      Add -f option to gcwsieve so that it can generate ABC output that is supported by LLR.
      Added -M option and fixed -S for gcwsievecl.
      Improved speed of GPU code of gcwsievecl by about 25%.
      The GPU code is more than 20x faster than the CPU for the same amount of work.
   
   mfsieve, mfsievecl:  version 1.7
      Added logic to validate factors passed with -I.
      Replaced all ASM routines with non-ASM routines based upon Montgomery mulitplication.  This
      provides a speed bump of at least 30% over the previous release.
      Fixed the GPU code and replaced magic number logic with Montgomery multiplcation as it is
      faster.  The new GPU code is more then 25% faster than the old GPU code.
      The GPU code is more than 20x faster than the CPU for the same amount of work.
   
    sgsieve:  version 1.2
      Added logic to validate factors passed with -I.
      
    twinsieve:  version 1.3
      Added logic to validate factors passed with -I.
      
    xyyxsieve, xyyxsievecl:  version 1.8
      Added logic to validate factors passed with -I.
    
2.0.6 - September 8, 2020
   Fixed crash with xyyxsievecl with overly large ranges for the GPU.
   More speed enhancements for xyyxsieve, mainly due to reduced memory requirements.
   Do not show ETA or percent done if less than 1% done and -P not used.
   Modify factor rate calculation to handle uneven factor distribution better.
   Fix srsieve2 as it was counting factors of terms that had already been removed.
   
2.0.5 - July 13, 2020
   Added support for newpgen format to sgsieve.

2.0.4 - July 10, 2020
   Fixed an issue with factor rate calculation if all factors are found in the 
   first minute and none after.
   Added sgsieve for Sophie-Germain searches.

2.0.3 - June 13, 2020
   Fixed an issue where sieving would stop after rebuilding workers.

2.0.2 - June 8, 2020
   Ensure that -p always overrides "last prime" from input file.
   Fixed fkbnsieve as it can sometimes miss factors for small p.
   Fixed issues with k1b2sieve.
   Fixed ^C handling as sometimes sieving wouldn't stop.
   
2.0.1 - June 1, 2020
   Added k1b2sieve to sieve 2^n+c for a range of n and c.
   Fixed runtime issue with gcwsieve.
   
2.0.0 - May 20, 2020
   Fixed an issue with the framework where it could stop sieving before reaching the max
   prime specified on the command line.
   Fixed a link issue with gfndsieve and dmdsieve with GMP on OS X.
   Fixed a compiler issue with the .align ASM instruction on OS X.

1.9.8 - May 18, 2020
   Fixed issue in gfndsieve where it accesses memory beyond the bounds of a vector.
   
1.9.7 - January 15, 2020
   Fixed issue in srsieve2 where it crashes when starting a new sieve.
   Fixed issue in srsieve2 where it finds an invalid factor and terminates.
   
1.9.6 - January 10, 2020
   Refactored logic for "single worker" and "rebuild" logic so they work more nicely together.
   Fixed calculation of p/sec which is wrong after workers are recreated.
   Reset factor stats whenever workers are created as the rate is inflated for lower p as
   far more terms are removed.
   Fix issue with multiple threads which can cause program to hang.
   Add support for ABC format to srsieve2.
   
1.9.5 - August 19, 2019
   Fixed errors factor rate calculation.

1.9.4 - August 9, 2019
   Fix a segfault with srsieve2 when rebuilding subsequences.
   Fix a segfault with AVX logic on non-Windows OSes.
   Fixed an issue with gfndsieve and dmdsieve where a large range of k and small range of x
   cause it to not sieve subsequent subranges.
   Added a table to gfndsieve to pre-set pmax if using -x and -P is not specified.
   Add -D flag to xyyxsieve to disable AVX as it crashes in the AVX routines on Windows OS
   on AMD CPUs.  The flag will be removed when that issue is resolved.
   Added -fP to srsieve2.  This will generate a more verbose ABC output file with each line
   specificing the values of k, n, and c.  It will also add the number_primes switch which is
   supported by pfgw to ensure only a single prime for each distinct k of the file.
   Added -fB to srsieve2.  This will generate a format that can be used to load BOINC.  This
   is equivalent to the "PRP" format used by srsieve/srfile
   
1.9.3 - July 25, 2019
   Re-implemented a single makefile.  It will now build separate exes for OpenCL enabled binaries.
   Fix ABC format line in xyyxsieve output file.
   Adjust reported factoring rate to account for less than 1 full CPU or more than 1 full CPU.
   Handle race condition that can trigger an infinite loop.
   Allow the first chunk to be smaller so that a rebuild of terms can be triggered earlier.
   srsieve2 will compile, but requires more testing and only has srsieve functions.
   
1.9.2 - June 17, 2019
   Split makefile so only GPU-enabled code will compile and link with OpenCL libraries
   with intention of a single makefile in the future as having two nearly identical
   makefiles is difficult to maintain.
   Made changes to the framework to support sieving in chunks as needed by gfndsieve
   and dmdsieve. 
   Added -x and -X switches to gfndsieve.  With these switches one can execute Fermat
   divisibility checks after sieving.
   Allow gfndsieve to sieve smaller n, but report if k*2^n+1 is a prime in the range
   being sieved.  It will not report if k*2^n+1 is prime when k*2^n+1 > maxp, but less than maxp^2.
   Added -x and -X switches to dmdsieve.  With these switches one can execute MMP
   divisibility checks after sieving.

1.9.1 - May 31, 2019
   Added srsieve2, which will eventually replace srsieve, sr2sieve, and sr1sieve.  It has
   internal logic to determine how to sieve the input sequences, thus is combined into a
   single executable.  It does not support Legendre table logic from sr2sieve and sr1sieve,
   but should be faster than both when not using Legendre tables.
   If ib_BlockWhenProcessingFirstChunk is set to true for the sieve, the factor rate will now
   be reported if the first chunk requires more than 60 seconds to process.
   Sieves now report intermediate results when processing chunks.  This allows for more
   accurate factor rate calculations when it takes a long time to process each chunk.
   Continue to report status if sieving is done, but workers are still processing their chunk.

1.9.0 - May 21, 2019
   Changes were made to the framework to support srsieve2 (which isn't ready yet).  No other
   sieves use thse features (yet).
   Modify FactorApp in an effort to more accurately compute the factor rate.  When starting
   a new sieve, the factor rate would be inflated because it would calculate the rate based 
   upon all factors removed.  This change will exclude factors removed in the the first 30
   minutes that the program has run.  This logic will be executed when fewer than 60 terms
   have been removed in the previous hour.
   Fixed an issue in fbncsieve and twinsieve where the output file would contain composites
   that are divisible by p that had already been sieved.
   
1.8.5 - January 17, 2019
   Upgrade from primesieve 6.2 to primesieve 7.3.
   Change calculation of factor rate to be based upon CPU utilization instead of number of workers.
   Fix to twinsieve to correctly remove terms when using -r when restarting a sieve.

1.8.4 - December 23, 2018
   Added the new Mersenne Prime to dmdsieve.
   Fixed an issue in twinsieve and fbncsieve where it can exit with an error when
   testing primes above sqrt(max term).
   Fixed an issue (impacting all sieves) in computing the factor rate.  The issue will manifest itself by
   outputting a large factor removal rate.  It only occurs after running the sieve continuously for 5 days.
   
1.8.3 - October 7, 2018
   Added dmdsieve.  This is used to sieve for divisors of Double-Mersenne numbers.
   Output terms from dmdsieve have form 2*k*M+1 where M is a Mersenne Prime.

   Fixed twinsieve so that it properly handles input factor files.
   
1.8.2 - October 2, 2018
   Fixed twinsieve to properly check for file format when using -s.
   
1.8.1 - September 27, 2018
   Fixed a bug that causes all sieves to crash when trying to compute factor rate.
   twinsieve has duplicate -i switch, changed to -s.
   
1.8.0 - September 25, 2018
   Added twinsieve.  This is more than 3x faster than newpgen's twin sieve.

   Modified OpenCL code to change calculation for default workunits to improve GPU throughput.
   Modified "start sieving" message to include expected factors, but only if -P is not the default value.
   Modified all sieves to have custom "start sieving message" so it each show more detail specific to that sieve.
   
1.7.5 - August 14, 2018
   Fixed a crash when reading multiple empty lines in a row from an input file.
   Added -r option to fbncsieve to remove terms where k % base = 0.
   Various updates for newpgen output from fbncsieve:
       use the .npg extension instead of .pfgw extension
       change third parameter of first line to 1 for srsieve/srfile compatibility
       change last parameter of first line to 1/2 since 1/2 is used for fixed newpgen
       sieves and 257/258 are used for fixed k sieves.

   
1.7.4 - August 10, 2018
   Modify pixsieve to report primes.
   Modify pixsieve to output search string to console and log when sieve starts.
   Fixed a crash in xyyxsieve when sieving only one sign.
   Generate default filename for mfsieve if not specified on the command line.
   Fix issue in psieve if it finds a factor for the last term of the input.
   
   mfsieve supports AVX and is about 40% faster than previously.
   
1.7.3 - August 3, 2018
   Fixed a memory exception that affects GPU workers for all sieves.
   Re-enable AVX support with psieve as accidentally disabled in 1.7.1.
   Do not output factor rate if no factors found.
   Fixed another issue in non-AVX psieve code that causes it to crash.

1.7.2 - July 27, 2018
   Fixed issue with reading ABCD files as input lines with 1 character would be ignored.
   Fixed a crash upon exit of fbncsieve.
   Allow override with -p when starting fbncsieve and fkbnsieve from an input file.
   
1.7.1 - July 25, 2018
   Fixed output for number of terms remaaining to support values > 2^32.
   Modified psieve to support an input file created by fpsieve.
   Fixed a bug in the non-AVX primorial ASM code that causes it to crash.
   
1.7.0 - July 4, 2018
   Added a timestamp to lines written to the log.
   Changed usage of some registers in AVX code to avoid ymm0-ymm3 being passed
   between calls to AVX routines.
   Added psieve for primorials.
   
   psieve supports AVX and is about 30% faster than fpsieve.
   
1.6.0 - June 25, 2018
   Fixed an error with factor rate calculation when less than 1 per second.
   Fixed an issue with gfndsieve when continuing a sieve and k < n.
   For kbbsieve, added some checks for algebraic factorizations.
   Added gcwsieve for Cullens and Woodalls.  This sieve is GPU enabled.   
   Renamed all ASM routines to easily distinguish FPU/SSE/AVX.

   Added AVX asm code for use by the Worker classes.
   Added a mini-chunk mode that can be used when the worker classes
   handles primes in chunks, such as AVX mode, which is chunks of 16 primes.

   gcwsieve supports AVX.  The CPU-only code is about 30% faster than Geoff Reynold's version.
   xyyxsieve supports AVX.  The CPU-only code is about 2.5x faster than the previous version.
   
1.5.0 - April 10, 2018
   kbbsieve is more fully tested.  It now uses a powmod that is limited to
   52 bits (switching from extended floating point to SSE), which should be
   at least 10% faster.

1.4.0 - April 9, 2018
   Some common functionality for GPU sieving has been moved to Worker.cpp.
   All GPU workers validate factors found by the GPU.

   The xyyxsieve GPU sieving issue has been resolved.
   The pixsieve GPU sieving code has been tested.
   GPU sieving has been added to mfsieve.  It has been tested.
   GPU sieving has been added to gfndsieve.  It has been tested.
   
   Add kbbsieve, for the form k*b^b+/-1 for fixed k and variable b.  It has been
   partially tested.
   
1.3.0 - March 11, 2018
   Ensure that "ENABLE_GPU=no" in makefile builds all programs without error.
   cksieve no longer gives a fatal error if the computed root is not an actual
   root.  This condition rarely happens, but is okay when it does.
   Overriding -p from the command line should now work when starting with 
   an input file.
   
   Added GPU workers to xyyxsieve.  When using GPU workers, an overflow with
   collecting factors can cause xyyxsieve to crash.  If that happens override
   -S and/or -g or sieve more deeply with the CPU before adding GPU workers.
   This will be addressed in a future release.
   
   Added GPU workers to pixsieve.  It has not been tested yet.

1.2.0 - February 23, 2018
   fkbnsieve is now working.
   Modify cksieve to detect candidates that are prime and to log them.
   Fixed an asm bug that at worst causes factors to be missed by
   fbncsieve and gfndsieve.  It will nor result in invalid factors and
   if it did, they would be caught at runtime due to built-in factor
   checking that relies on completely different code.
   
   Added -A option to apply factors (or reformat candidate file) and
   exit immdiately without sieving.
   Added GPU classes.  This adds the following command line options:
      -D - to select the GPU platform
      -d - to select the GPU device
      -G - to specify the number of GPU workers
      -g - to set multiple of workgroupsize which is used to compute
           the number of primes per GPU worker
   Added GPU workers to afsieve.
   
1.1.0 - February 21, 2018
   Add an internal flag that guarantee that suspends all but one Worker when
   processing the first chunk of primes.  This is used to improve performance
   when there is a high factor density for low primes.  This will also suppress
   any on screen reporting or checkpointing until that chunk is processed.
   
   Fix issue in computing CPU utilization.
   Changed -c (chunksize) option to -w (worksize).
   Change output to use shorter notation for min and max primes.

   cksieve   - Fixed.
   gfndsieve - Enable the flag mentioned above.
   fbncsieve - Enable the flag mentioned above.
   fkbnsieve - Added, but not tested.
   
1.0.0 - February 10, 2018
   Initial Release