
--mfaktc and --mfakto prototype #24

Merged: 60 commits into tdulcet:primenet-2.2 on Aug 23, 2024
Conversation

@brubsby (Contributor) commented Jul 17, 2024

Very much a work in progress, but creating this PR mostly as a jumping off point for discussion and feedback. Probably won't be ready for v2.2, and definitely shouldn't hold up the release of CERT support.

Current TODO:

  • finish mfaktc changes
    • add os info to json
    • build on linux
    • investigate changing checkpoint time to entire bitlevel instead of class
  • switch to new worktype once implemented by James George
  • tune default work fetch options around TF assignments
  • nudge results.txt to be results.json.txt (not sure whether raising an error or doing this automatically is better), maybe see what the p95/mprime case does
  • mfakto
  • Iterative work fetch (Teal TODO)
  • Workfetch status line at each iteration
  • TF bitrange in completion date messages
  • Additional line after setup wizard
  • Optparse default usage line
  • Default tf days of work = 1

Nice to have:

  • TF1G integration
  • nag user if they have StopAfterFactor=2 (suggest Stages=1 and StopAfterFactor=1)

Testing areas:

  • new setup default parameter experience
  • multiple bitlevel ini settings
  • other mfaktc .ini configuration combinations

Bugs:

  • "stall detection" if mfaktc not making progress since last checkin
  • work estimation for multiple bitlevel assignments (checkpoints only for single bitlevel)
  • don't send d=1 if exponent still in assignments
  • fix percentage calculation for Stages=1 and multiple bitlevel assignments

Server-side changes wishlist:

  • have work preference 12 respect max and min arguments to ga
  • have work preference 12 send highest ROI assignment in the 1m bins that each category is currently giving PRP out from.

Probably won't do:

  • automatically generate and run TF worktodo lines for PRP or P-1 assignments that are underfactored (very long term stretch goal)
  • create configuration options for TF work preferences that are passed to Manual_GPU_Assignment
  • delete worktodo lines that fail to get AID via selfassign (current behavior just puts N/A, deleting instead of poaching (even if just an option) might be nice)
  • GPU72 integration

@brubsby (Contributor, Author) commented Jul 17, 2024

Ended up forking mfaktc to add json results that include the AID, will probably add default logging soon too, so we can parse the logs and use those for work estimation.
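As a rough illustration of how the script could consume that output, here is a sketch that reads a JSON-lines results file; the file name and every field name ("aid", "exponent", "bitlo", "bithi") are hypothetical placeholders, not the format the mfaktc fork actually emits:

import json

# Hypothetical file name and schema, purely for illustration.
def read_tf_results(path="results.json.txt"):
    results = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except ValueError:
                continue  # skip any non-JSON lines mixed into the file
            # Hypothetical fields: the assignment ID plus the factored range.
            results.append((entry.get("aid"), entry.get("exponent"),
                            entry.get("bitlo"), entry.get("bithi")))
    return results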

@tdulcet added the "enhancement" (New feature or request) label on Jul 18, 2024
primenet.py (outdated), review thread on:

@@ -4795,14 +4795,18 @@ def get_assignments(adapter, adir, cpu_num, progress, tasks):
    time_left = timedelta(seconds=time_left)
    days_work = timedelta(days=options.days_of_work)
    if time_left <= days_work:
-       num_cache += 1
+       num_cache = math.ceil(num_existing * days_work / time_left)
brubsby (Contributor, Author):
This is perhaps the "exact" calculation of how big our cache should be, assuming the time_left estimate is correct and all jobs are similarly sized (and it should work for every worktype).

One drawback of doing it this way is that a fresh instance has no time estimate yet, so we have to pick some number of assignments for the first run when the user hasn't set num_cache. I've chosen 10 for TF and 1 for other worktypes (1 was the previous behavior in this case, I believe).

The other drawback is that we might accidentally fetch the user a large number of assignments if their time_left is underestimated or something. Some guard rails would probably be prudent, but I can't think of an easy approach that wouldn't be annoying when someone is trying to fetch 10 days of work right before taking their computer offline.

After all, the user could then --unreserve-all if we accidentally fetch too much.

brubsby (Contributor, Author):
10 assignments would hold over my GPU for roughly two hours, so anything above that for --timeout would result in mfaktc closing before more work is received. Perhaps in the "fresh" case, we want to get a small number of assignments and wake every minute until we have a good time estimate, and then get more work?

Not sure what the most general solution to this is, I don't necessarily want to have a bunch of carveouts for TF if we don't have to.

tdulcet (Owner):
If the --num-cache option is larger than what is needed for the --days-work option, this will select the smaller of the two values. I think you want something like this instead (which also eliminates the math.ceil):

Suggested change:
-   num_cache = math.ceil(num_existing * days_work / time_left)
+   num_cache = max(num_cache, -(days_work * num_existing // -time_left))
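For reference, the negated floor division in that suggestion is the usual way to get a ceiling division without math.ceil, and it works on timedelta objects too, since timedelta // timedelta floors to an int. A quick standalone check with made-up numbers (not values from the script):

from datetime import timedelta
import math

num_existing = 4                          # hypothetical: assignments already queued
time_left = timedelta(days=1, hours=10)   # hypothetical: estimated work remaining
days_work = timedelta(days=3)             # hypothetical: --days-work target

# Original form
a = math.ceil(num_existing * days_work / time_left)

# Suggested form: ceiling division via negated floor division
b = -(days_work * num_existing // -time_left)

print(a, b)  # both print 9

The outer max() in the suggestion then keeps whichever is larger: the user's --num-cache value or the days-of-work estimate.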

brubsby (Contributor, Author):
Good catch. I've redone the debug lines relating to this so they make a bit more sense in the case where num_cache is bigger than what the days_of_work calculation says is necessary.

@brubsby (Contributor, Author) commented Jul 25, 2024

--max-exp (and presumably --min-exp) aren't working with work preference 12.

connectionpool.py: [MainThread 2024-07-25 11:45:42,315]  DEBUG: http://v5.mersenne.org:80 "GET /v5server/?px=GIMPS&v=0.95&t=ga&g=[guid]=0&a=&max=138000000&ss=19191919&sh=ABCDABCDABCDABCDABCDABCDABCDABCD HTTP/11" 200 179
primenet.py: [MainThread 2024-07-25 11:45:42,316]  INFO: PrimeNet success code with additional info:
primenet.py: [MainThread 2024-07-25 11:45:42,316]  INFO: Server assigned trial factoring work.
primenet.py: [MainThread, Worker #1 2024-07-25 11:45:42,316]  INFO: Got assignment D5BBDEFA9AA42BBBC44F3AD5CFA8DA31: Trial factor M146204119
primenet.py: [MainThread, Worker #1 2024-07-25 11:45:42,316]  DEBUG: Fetching using v5 API
primenet.py: [MainThread, Worker #1 2024-07-25 11:45:42,316]  INFO: Getting assignment from server

We can maybe ask George or James to add support for these, as it looks correct from our side, but either way it's not extremely pressing. I just wanted to test them out since you mentioned them.

primenet.py (outdated), review thread on lines 4816 to 4822:

if config.has_option(SEC.PrimeNet, "MaxExponents"):
    amax = config.getint(SEC.PrimeNet, "MaxExponents")
    if amax < num_cache:
        adapter.debug(
            "Config option MaxExponents ({0:n}) is smaller than num_cache ({1:n}), num_cache decreased to {0:n}".format(
                amax, num_cache, workfile))
        num_cache = amax
tdulcet (Owner):
If only to prevent a bug in the program from causing it to request too many assignments, I would prefer to keep a default value for the MaxExponents config file option, even if it is significantly larger when the user is doing TF assignments.

brubsby (Contributor, Author) commented Jul 28, 2024:

On one level I agree, as it will surely happen that someone gets way too many assignments due to a bug or misunderstanding if we don't include some safeguard, but I just personally found the experience of having to discover the default MaxExponents option in the code to be a bad user experience.

There are several factors that contribute to the confusion in the previous version:

  1. there is no cli option for max exponents, so -h doesn't lead you in the right direction
  2. it's not mentioned in --setup
  3. there's a cli option for "max exponent" but not max exponents (two different options that are confusingly similar)
  4. in the previous configuration, even if you have no MaxExponents in the config (which happens by default), you're still limited to 15, so you have to search through the actual code to figure out what's going on. (even if we raise this, it will still happen when a user puts in a high days_of_work for certain worktypes)
  5. throughout this whole process, you're not actually told about MaxExponents in the log

I have added some debug messages to hopefully make it clearer when this is happening to someone in the future (5), but they would need to have debugging on. If a user has to enable debug logging to figure out why the program isn't behaving the way they told it to (via days of work), I think we've already failed some test of the user experience.

10 days of TF work for my machine can be ~1500 assignments at small (but still FTC-wavefront) bitlevels, which is an enormous number of exponents, and the maximum number of days of work to fetch is 90, which would be ~9470 assignments of more average size for me.

I think I heard 10,000 was the maximum number of assignments you can have in PrimeNet, though if I grabbed 10,000 wavefront TF assignments that would likely have a significant negative impact on the project. We surely want the limit to be below this, but still respect days_of_work.

TL;DR: I think we should have safeguards but they should be upfront.

Proposal:

  • We could add it to --setup with an appropriately sized default for the worktype.
    • (People become aware of its existence and set it according to their comfort level)
  • When creating the .ini file, we put that default in
    • do this even if the user doesn't use --setup
    • when they're editing the .ini they will see it and realize that's what's keeping the script from getting more assignments.
  • We can use the previously mentioned default if they delete the line in the .ini, but display a warning that doesn't require debug verbosity to see, saying that we are using max_exponents=15 or 100 (or something) as a default.
    • (Alternatively, just make all the num_cache related debugs into info messages)

tdulcet (Owner):

Thanks for the detailed write up.

The MaxExponents config file option comes from Prime95/MPrime, which also defaults to 15 assignments. Note that this limit is per worker, so with multiple workers one can have many more assignments. I was hesitant at first to add support for this option, but over time I became concerned that a bug either in the script or on the server could cause the program to request too many assignments, so I decided to add this as a safeguard. Prime95/MPrime of course goes a step further and will actually automatically unreserve assignments, which I would never do.

Anyway, at least with CPUs, 15 wavefront assignments is typically a lot of work (days or weeks), especially with primality tests, so most users should never hit this limit. However, GPUs can obviously be faster, so feel free to increase the limit for TF work to whatever value makes sense, even if that is hundreds or thousands of assignments. Ideally, only the most advanced users should ever need to change this option. I believe TF (and P-1) assignments are only valid for 30 days, so requesting 90 days of work would not be applicable here.

Another problem I am aware of is that there is no exhaustive list of all supported config file options (without reading my release notes for each version). It would be easy enough to create such a list, but I am not sure the best place to store it. Maybe on the README? Long term, we probably need to write more extensive documentation on how to use the program.

Regarding your proposal, adding a log message when the MaxExponents value is hit was a good improvement, so thanks for implementing that. Note that in general I do not want the script to be too verbose by default, as users may miss important messages. I wish the Python logging module supported multiple levels of debug messages instead of the critical level (which is currently unused), but alas all we have is debug. Feel free to put a default value for MaxExponents in the local.ini config file. In that case, you might also restore the results_file. I would prefer not to add a prompt to the --setup option, as users would then just set a large value, defeating the purpose of this feature, but I am open to being convinced otherwise.

brubsby (Contributor, Author):

> The MaxExponents config file option comes from Prime95/MPrime, which also defaults to 15 assignments.

Ah, didn't realize this was directly from p95, that makes a bit more sense then.

> I believe TF (and P-1) assignments are only valid for 30 days, so requesting 90 days of work would not be applicable here.

This is a good point. I think non-Cat 0-1 exponents are set to 60 days by default; mine mostly show 60 for all of my assignment types here: https://www.mersenne.org/manual_extension/
Though I manually extended some P-1 jobs that I'm doing jointly and they now say 218 days.

Either way, 1000 TF assignments is certainly enough for most people; if you have more than 100 assignments you get put on a watchlist anyway. (https://www.mersenne.org/assignments/topusers.php)

Though I think this list actually uses 1000 for the TF threshold, otherwise I would be on it.

> Feel free to put a default value for MaxExponents in the local.ini config file.

Can do!

> In that case, you might also restore the results_file as well.

I'm not sure what you mean by restoring the results_file, though it may come to me while I'm in the code.

> I would prefer to not add a prompt to the --setup option, as users would then just set a large value defeating the purpose of this feature, but I am open to being convinced otherwise.

That's fair. I suppose gating getting an obscene amount of assignments behind config editing is pretty good. I was envisioning it as a "launch control lock switch" sort of deal, where the user is allowed to do whatever they want with the program, but we make it difficult to do on accident.

brubsby (Contributor, Author):

Just spitballing, but maybe we have the MaxExponents default to some value based on their selected days of work and their worktype? This way we guard against wildly inaccurate/buggy time estimates causing lots of work to be fetched, but we stay out of the way of users who actually want that much work. For TF this could maybe be something like 150 assignments per day; I'm not sure what other worktypes should default to, though.
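To make that idea concrete, here is a rough sketch; the function name and the per-day rates are illustrative placeholders, not values used by primenet.py or settled on in this discussion:

# Hypothetical sketch of a days-of-work-scaled MaxExponents default.
# The per-day rates are made-up illustrations, not calibrated numbers.
ASSIGNMENTS_PER_DAY = {
    "TF": 150,   # a GPU can burn through TF assignments very quickly
    "P-1": 10,
    "PRP": 2,    # a wavefront primality test usually spans days
    "LL": 2,
}

def default_max_exponents(worktype, days_of_work, floor=15):
    """Scale the MaxExponents default by worktype and --days-work,
    never going below the historical Prime95/MPrime default of 15."""
    rate = ASSIGNMENTS_PER_DAY.get(worktype, 2)
    return max(floor, int(rate * days_of_work))

print(default_max_exponents("TF", 10))   # 1500
print(default_max_exponents("PRP", 10))  # 20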

tdulcet (Owner):

> Ah, didn't realize this was directly from p95, that makes a bit more sense then.

Yes, for just a general hint, the camel cased config file options are either the same or adapted from a similar option in Prime95/MPrime, while the snake cased ones are unique to the PrimeNet program.

> I'm not sure what you mean by restoring the results_file, though it may come to me while I'm in the code.

When we removed the default value for the --resultsfile option, this also removed it from the config file by default. I was just proposing that you manually add it back in the config file when you add the default value for the MaxExponents option.

> I was envisioning it as a "launch control lock switch" sort of deal, where the user is allowed to do whatever they want with the program, but we make it difficult to do on accident.

Yes, I agree, that was my original goal with this feature. I am all for empowering power users to do what they need to do.

> Just spitballing, but maybe we have the MaxExponents default to some value based on their selected days of work and their worktype?

This would be fine with me, but it would of course be more difficult to implement, as we could then no longer have a hard coded default in the config file. I was previously going for simplicity.

brubsby (Contributor, Author):

All these should be done now!

@brubsby (Contributor, Author) commented Jul 27, 2024

Starting another spot for discussion here, as it's not really mfaktx related per se, but primenet related, and I feel I've saturated my outgoing communications to folks for the time being :)

For my TODO post above:

> fix percentage calculation for Stages=1 and multiple bitlevel assignments

This raises some questions about what exactly we should report as STAGE and percentage in various cases.
Let's go through an example of an assignment that is TF from 2^75 to 2^77.

If mfaktc.ini has Stages=0, then my understanding is that it's running every bitrange in the assignment at once, so if we find a 76.5-"bit" factor, we aren't sure there isn't also a 75.5-"bit" factor to be found in the range as well. Given that, reporting a simple TF75 stage seems a bit misleading, as the percentage associated with this stage is actually the joint percentage of TF75-77.

Similarly if mfaktc.ini has Stages=1, and we're reporting progress on TF from 2^75 to 2^77, we will report progress stage TF75 (and a percentage) during 75-76, and then TF76 (and a percentage) during 76-77. What exactly these percentages should be is unclear to me. If we are reporting 0-100% for each stage, then TF75 (and a percentage) is indistinguishable from the Stages=0 case of TF75 (and a percentage).

Estimated completion date, I would think, should just be when we expect to release the exponent, not necessarily when the current stage is done. But I'm not sure whether percentage should work that way as well.

TL;DR: Should percentage be our progress on the current stage, or the progress of the whole assignment?
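One way to make the whole-assignment option concrete is to weight each bit level by its cost, using the usual rule of thumb that each successive bit level takes roughly twice as long as the previous one. A sketch with illustrative names, not code from this PR:

def assignment_percent(bit_lo, bit_hi, current_bit, stage_percent):
    """Whole-assignment completion for TF from 2^bit_lo to 2^bit_hi under Stages=1,
    where current_bit is the level being worked and stage_percent is its 0-100
    progress. Assumes each bit level costs roughly 2x the previous one."""
    weights = [2 ** i for i in range(bit_hi - bit_lo)]
    total = sum(weights)
    done = sum(weights[:current_bit - bit_lo])                     # finished levels
    done += weights[current_bit - bit_lo] * stage_percent / 100.0  # current level
    return 100.0 * done / total

# For the TF 2^75 to 2^77 example: halfway through 75-76 is only ~1/6 of the
# assignment, because 76-77 is roughly twice as much work as 75-76.
print(assignment_percent(75, 77, 75, 50.0))  # ~16.7
print(assignment_percent(75, 77, 76, 50.0))  # ~66.7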

@brubsby (Contributor, Author) commented Aug 3, 2024

As the number of changes in this PR keeps growing, merging upstream changes starts to get a bit more painful (I think it might've been better if I had started with a rebase, but my tooling kept telling me a merge was a better idea). Would you be open to merging this into the 2.2 branch at some point soon, and doing more changes as another PR? Or would you rather keep a version of the code with the CERT stuff separate from the TF stuff, in case we want to release them independently?

@brubsby (Contributor, Author) commented Aug 3, 2024

Here's the code I used to regenerate the merged table, on the chance it saves you any amount of manual effort:

import tables
tables.array(
	(
		(4,    "✔",  "✔",   "",   "",   "",  "✔"),
		(12,    "",   "",   "",   "",  "✔",   ""),
		(100,  "✔", "✔*",  "✔",  "✔",   "",   ""),
		(101,  "✔",   "",   "",  "✔",   "",   ""),
		(102,  "✔", "✔*",  "✔",  "✔",   "",   ""),
		(104,  "✔", "✔*",  "✔",  "✔",   "",   ""),
		(106,   "", "✔*",  "✔",  "",   "",   ""),
		(150,  "✔",  "✔",  "✔",   "",   "",   ""),
		(151,  "✔",  "✔",  "✔",   "",   "",   ""),
		(152,  "✔",  "✔",  "✔",   "",   "",   ""),
		(153,  "✔",  "✔",  "✔",   "",   "",   ""),
		(154,  "✔", "✔*",   "",   "",   "",   ""),
		(155,   "",  "✔",  "✔",   "",   "",   ""),
		(156,   "",   "",   "",   "",   "",   ""),
		(160,  "✔",   "",   "",   "",   "",   ""),
		(161,  "✔",   "",   "",   "",   "",   ""),
	),
	("Worktype", "Mlucas", "GpuOwl", "PRPLL", "CUDALucas", "mfaktc/mfakto", "CUDAPm1"),
	headerrow=True,
)

(I really wish the check glyph used had a monospaced glyph in most of the fonts used in editors and stuff, lol)

primenet.py (outdated), review thread on:

@@ -5686,6 +5686,10 @@ def ping_server(ping_type=1):
    return None


+def is_pyinstaller():
+    return getattr(sys, 'frozen', False) and hasattr(sys, '_MEIPASS')
brubsby (Contributor, Author):

This seems to be the canonical way to determine whether we're in a pyinstaller instance, so I've added it.

https://pyinstaller.org/en/stable/runtime-information.html
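For context, the usual reason for this check is to resolve paths to files bundled with the executable, since PyInstaller unpacks them under sys._MEIPASS rather than next to the script. A generic sketch, not code from this PR:

import os
import sys

def resource_path(relative):
    """Resolve a data file whether running from source or from a PyInstaller bundle."""
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        base = sys._MEIPASS  # temporary extraction directory of the one-file bundle
    else:
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, relative)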

tdulcet (Owner):

For future reference, I added this link to a comment in the function.

@tdulcet (Owner) commented Aug 4, 2024

> would you be open to merging this into the 2.2 branch at some point soon, and doing more changes as another PR?

Sorry about the hassle. Yes, we can merge this soon and then make any additional improvements in another PR. If we are going to support mfakto, it might be good to get that working with the script. For now, I will wait to push any more changes to the 2.2 branch until this is merged.

> I really wish the check glyph used had a monospaced glyph in most of the fonts used in editors and stuff, lol

Yeah, I liked the checkmark symbol, but I agree that it is annoying that this table does not align in editors, so feel free to use a different symbol. You could also change the appearance of the table by adjusting some of the options for my library.

@tdulcet merged commit a1ef975 into tdulcet:primenet-2.2 on Aug 23, 2024