-
-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(vacuum): add vacuum management for cleaning expired packages #3442
base: main
Are you sure you want to change the base?
Conversation
Thank you for your contribution! |
Thank you for your great work! According to the benchmark test, it looks like the overhead is really small. Command interfaceI don't think e.g. aqua vacuum show [--expired]
aqua vacuum run The command DB schema
{
"LastUsageTime": "2025 ...",
"PkgPath": [""]
} The key is not good because we can change package name freely. I think I think we can add metadata such as type, name, etc to the value. Code review
When we return an error or output error or warn log, definitely any operation fails.
CheckWe should confirm if the treat of locale has no problem. func (vc *Controller) isPackageExpired(pkg *PackageVacuumEntry) bool {
const secondsInADay = 24 * 60 * 60
threshold := int64(*vc.Param.VacuumDays) * secondsInADay
return time.Since(pkg.PackageEntry.LastUsageTime).Seconds() > float64(threshold)
} |
I’ll revisit my draft and get back to you, taking your feedback into account! |
Refactored code to exclusively use `StorePackage` as asynchronous functions. It is important to note that from now on, **you must ensure all tasks are finalized by calling the `Close` functions after invoking `StorePackage`.** Replaced the generated key with `pkgPath` to ensure uniqueness. Changed `pkgPath` type from an array of strings to a single string.
Rewrite is completed, take a look at this @suzuki-shunsuke. I think you'll prefer this implementation. Command interfaceCommands have been refund : aqua vacuum run
aqua vacuum list
aqua vacuum list --expired DB schemaPkgPath is know the key of the package in vacuum database. We can add/remove metadata of corresponding value if needed. type Package struct {
Type string // Type of package (e.g. "github_release")
Name string // Name of package (e.g. "cli/cli")
Version string // Version of package (e.g. "v1.0.0")
PkgPath string // Path to the install path without the rootDir/pkgs/ prefix
} Check
func (vc *Controller) isPackageExpired(pkg *PackageVacuumEntry) bool {
const secondsInADay = 24 * 60 * 60
threshold := int64(*vc.Param.VacuumDays) * secondsInADay
return time.Since(pkg.PackageEntry.LastUsageTime).Seconds() > float64(threshold)
} To : func (vc *Controller) isPackageExpired(pkg *PackageVacuumEntry) bool {
const secondsInADay = 24 * 60 * 60
threshold := vc.Param.VacuumDays * secondsInADay
lastUsageTime := pkg.PackageEntry.LastUsageTime
if lastUsageTime.Location() != time.UTC {
lastUsageTime = lastUsageTime.In(time.UTC)
}
timeSinceLastUsage := time.Since(lastUsageTime).Seconds()
return timeSinceLastUsage > float64(threshold)
} Added convertion to UTC trying to resolve location issue. Visualizing Managed PackagesCoverage
A high latence is observed for test du to tests of failling access to vacuum.db Performances :The rewrite of this implementation has a noticeably similar impact based on the hyperfine results. However, the benchmark highlights a significant improvement in the average operation duration and the number of allocations per operation.
|
Apologies for keeping you waiting. |
Don't worry, it's your tool, and I know you take care of it, so take the time you need to make sure everything's ok if you think it's necessary! It's an interesting feature in my opinion, but it's not going to revolutionize the use of this tool. It can still wait without any problems on my side. (It's just that if we could avoid having to redo all this work if there were ever any conflicts...) |
pkg/installpackage/installer.go
Outdated
if err != nil { | ||
return fmt.Errorf("get the package install path: %w", err) | ||
} | ||
|
||
if err := is.vacuum.StorePackage(logE, pkg, pkgPath); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstallPackage skips installing a package if the package has already been installed.
In that case, I think we should not update the timestamp.
Timestamp should be updated only when the package is actually downloaded and installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the principle, but as the install function is systematically called in the exec and install processes, I thought it would be simpler to position the timestamp update process here.
But yes, the risk would be to modify this behavior and break the operation without realizing it. It would be more coherent to carry the timestamp update action further upstream, during installation and execution.
I'm considering if we can simplify asynchronous process. |
pkg/cli/exec/command.go
Outdated
@@ -54,5 +56,6 @@ func (i *command) action(c *cli.Context) error { | |||
if err != nil { | |||
return fmt.Errorf("parse args: %w", err) | |||
} | |||
return ctrl.Exec(c.Context, i.r.LogE, param, exeName, args...) //nolint:wrapcheck | |||
defer ctrl.CloseVacuum(logE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I don't think this works here, because aqua exec can use: https://github.com/aquaproj/aqua/blob/main/pkg/controller/exec/exec.go#L126.
And this (on UNIX) uses the "Exec" function of syscall_unix, which, when executed, replaces the calling process with the called process.
This cuts short the rest of the code, and the functions called in defer are not played. Finally, asynchronous processes are not necessarily terminated either...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Thank you.
I've confirmed defer function isn't executed.
package main
import (
"log"
"os"
"golang.org/x/sys/unix"
)
func main() {
if err := core(); err != nil {
log.Fatal(err)
}
}
func core() error {
defer func() {
log.Println("defer")
if err := os.WriteFile("test.txt", []byte(`hello`), 0o644); err != nil {
log.Print(err)
}
}()
return exe()
}
func exe() error {
return unix.Exec("/usr/bin/curl", []string{"curl", "--version"}, os.Environ())
}
$ go run main.go
curl 8.7.1 (x86_64-apple-darwin24.0) libcurl/8.7.1 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.63.0
Release-Date: 2024-03-27
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL threadsafe UnixSockets
$ ls # test.txt isn't created.
go.mod go.sum main.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ AQUA_VACUUM_DAYS=5 hyperfine -N --warmup 3 '/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v' '/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v'
Benchmark 1: /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v
Time (mean ± σ): 50.5 ms ± 1.4 ms [User: 4.2 ms, System: 2.0 ms]
Range (min … max): 47.7 ms … 56.3 ms 60 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v
Time (mean ± σ): 40.4 ms ± 1.4 ms [User: 6.4 ms, System: 1.8 ms]
Range (min … max): 38.7 ms … 44.6 ms 73 runs
Summary
/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v ran
1.25 ± 0.05 times faster than /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v 1.25 is a bit slow. Without $ hyperfine -N --warmup 3 '/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v' '/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v'
Benchmark 1: /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v
Time (mean ± σ): 38.7 ms ± 1.2 ms [User: 5.1 ms, System: 1.8 ms]
Range (min … max): 36.9 ms … 42.1 ms 76 runs
Benchmark 2: /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v
Time (mean ± σ): 40.7 ms ± 1.3 ms [User: 6.1 ms, System: 1.8 ms]
Range (min … max): 39.2 ms … 45.7 ms 72 runs
Summary
/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v ran
1.05 ± 0.05 times faster than /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v $ git checkout abd0bf400a42f9a313324c0024fb6dec92b6ddcc
$ AQUA_VACUUM_DAYS=5 hyperfine -N --warmup 3 '/Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v' '/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v'
Benchmark 1: /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v
Time (mean ± σ): 56.3 ms ± 8.6 ms [User: 5.1 ms, System: 2.3 ms]
Range (min … max): 49.5 ms … 101.7 ms 55 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: /Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v
Time (mean ± σ): 44.3 ms ± 1.5 ms [User: 7.6 ms, System: 2.1 ms]
Range (min … max): 42.0 ms … 48.1 ms 66 runs
Summary
/Users/shunsukesuzuki/.local/share/aquaproj-aqua/internal/pkgs/github_release/github.com/aquaproj/aqua/v2.42.2/aqua_darwin_arm64.tar.gz/aqua exec -- cmdx -v ran
1.27 ± 0.20 times faster than /Users/shunsukesuzuki/go/bin/aqua exec -- cmdx -v |
I've done the performance test using some revision on my M3 Pro, but the performance is worse than the result you shared. #3442 (comment) Idea of new architectureMaybe we need to change the architecture. I came up with an idea.
This is just an idea. |
Probably |
Oh, this approach doesn't need database.
|
Check List
Require signed commits
, so all commits must be signedIntroduce Vacuum Feature for Expired Packages
Addresses #2942 and Discussion #3086.
This pull request introduces a new feature for vacuuming expired packages.
Workflow
Environment Variable Requirement
The feature activates only when the environment variable
AQUA_VACUUM_DAYS
is set with a positive integer greater than 0.Implementation in
InstallPackage
exec
orinstall
(but notaqua update
oraqua cp
, which currently useprovideNilVacuumController
in their controller initialization), package-related information is added or updated in a BoltDB (vacuum.db
) located inROOT_DIR
.Expiration Criteria
AQUA_VACUUM_DAYS
. Expired packages become eligible for "vacuum."Execution of
aqua vacuum
Commandvacuum
command performs the following operations:ROOT_DIR/pkgs/type.../pkg/version
.Optional - Not Implemented: Automatic Vacuum Execution
aqua install
command to trigger vacuum expiration automatically, enabling "auto-cleaning."Visualizing Managed Packages
The
aqua vacuum
command includes two flags,-list
and-expired
:-list
lists all packages.-expired
lists only expired packages.These flags enable the use of a fuzzy finder in the console to explore the
vacuum.db
database and retrieve package-related information:Coverage
A high latence is observed for test du to tests of failling access to vacuum.db
Performance Considerations
Significant effort has been made to minimize the overhead introduced by this feature on Aqua commands. Benchmark tests in the
exec
controller attest to this.Three methods for adding packages to the database were implemented (storePackage, storesPackages, and asyncStorePackage). However, the asynchronous method (asyncStorePackage) shows the least overhead and is preferred. I think i'll remove all others.
Benchmark when adding 100 packages at the same time :
The asynchronous method is the best option in all cases, especially when adding one package at a time (the typical use case for Aqua).
Benchmark of exec command via hyperfine :
Main impact of storing packages is the method PackageInfo.PkgPaths, but i think the best choice to have all Paths concerned by each package. There are used to remove package from system during vacuum operation.