Adding loggers into TunedModels #193

Merged
merged 2 commits into from
May 21, 2024

Conversation

pebeto
Member

@pebeto pebeto commented Sep 10, 2023

Details in JuliaAI/MLJ.jl#1029.

  • Adding parametric type L for loggers (detailed implementation in MLJBase.jl).
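
As a rough illustration of what a parametric logger type can look like, here is a minimal sketch. `SketchTunedModel`, `PrintLogger` and `log_event` are hypothetical stand-ins invented for this example, not the MLJTuning.jl or MLJBase.jl definitions; only the type parameter name `L` is taken from the description above.

```julia
# Hypothetical stand-in for a tuned-model wrapper; not the MLJTuning.jl struct.
struct SketchTunedModel{M,L}
    model::M
    logger::L   # `nothing` means "no logging"; anything else is a logger
end

# Hypothetical logger type used only for this illustration.
struct PrintLogger end

# Dispatch on the logger type parameter: a no-op when there is no logger.
log_event(::SketchTunedModel{<:Any,Nothing}, msg) = nothing
log_event(::SketchTunedModel{<:Any,PrintLogger}, msg) = "logged: $msg"

quiet = SketchTunedModel(:forest, nothing)
chatty = SketchTunedModel(:forest, PrintLogger())
```

Carrying the logger type in the struct's parameters lets the compiler specialize on it, so the no-logger case costs nothing at run time.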

@codecov

codecov bot commented Sep 10, 2023

Codecov Report

Attention: Patch coverage is 90.00%, with 1 line in your changes missing coverage. Please review.

Project coverage is 87.55%. Comparing base (bb59cae) to head (2b63fa8).

Files Patch % Lines
src/tuned_models.jl 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #193      +/-   ##
==========================================
+ Coverage   87.53%   87.55%   +0.01%     
==========================================
  Files          13       13              
  Lines         666      667       +1     
==========================================
+ Hits          583      584       +1     
  Misses         83       83              


@ablaom
Member

ablaom commented Sep 11, 2023

Looking good, thanks!

Does it all look good on the MLflow service when fitting a TunedModel(model, logger=MLFlowLogger(...), ...)?

@pebeto
Member Author

pebeto commented Sep 11, 2023

Looking good locally. I've just uploaded the TunedModel test case JuliaAI/MLJFlow.jl@2153b69

@ablaom
Member

ablaom commented Sep 11, 2023

Played around with this some more. Very cool, thanks!

However, there is a problem when running in multithreaded mode. It seems only one thread is logging:

using MLJ
using .Threads
using MLFlowClient
nthreads()
# 5

logger = MLFlowLogger("http://127.0.0.1:5000", experiment_name="horse")
X, y = make_moons()
model = (@load RandomForestClassifier pkg=DecisionTree)()

r = range(model, :sampling_fraction, lower=0.4, upper=1.0)

tmodel = TunedModel(
    model;
    range=r,
    logger,
    acceleration=CPUThreads(),
    n=100,
)

mach = machine(tmodel, X, y) |> fit!;
nruns = length(report(mach).history)
# 100

service = MLJFlow.service(logger)
experiment = MLFlowClient.getexperiment(service, "horse")
id = experiment.experiment_id
runs = MLFlowClient.searchruns(service, id);
length(runs)
# 20

@assert length(runs) == nruns
# ERROR: AssertionError: length(runs) == nruns
# Stacktrace:
#  [1] top-level scope
#    @ REPL[166]:1

@ablaom
Member

ablaom commented Sep 11, 2023

The problem is we are missing logger in the cloning of the resampling machine happening here:

https://github.com/pebeto/MLJTuning.jl/blob/6f295b7439a9884fa35c16841ded33db2d272227/src/tuned_models.jl#L590
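
To make the diagnosis above concrete, here is a minimal sketch of the failure mode. `SketchResampler` and both clone functions are hypothetical stand-ins, not the MLJTuning.jl code at the linked line. The idea is that if per-thread clones are rebuilt without forwarding `logger`, only the original machine logs, which would be consistent with 20 of 100 runs reaching the server on 5 threads.

```julia
# Hypothetical stand-in for a resampler; the field names are assumptions.
struct SketchResampler{L}
    model::Symbol
    logger::L
end

# Buggy clone: rebuilds the resampler but forgets to forward the logger.
clone_without_logger(r::SketchResampler) = SketchResampler(r.model, nothing)

# Fixed clone: every per-thread copy keeps the same logger.
clone_with_logger(r::SketchResampler) = SketchResampler(r.model, r.logger)

original = SketchResampler(:forest, "mlflow-logger")
n_clones = 4  # e.g. one clone for each thread beyond the first

buggy = [clone_without_logger(original) for _ in 1:n_clones]
fixed = [clone_with_logger(original) for _ in 1:n_clones]

# Count how many resamplers would actually log anything.
count_logging(rs) = count(r -> r.logger !== nothing, rs)
```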

@ablaom
Member

ablaom commented Sep 11, 2023

I think CPUProcesses should be fine, but we should add a test for this at MLJFlow.jl (and for CPUThreads).

@ablaom
Member

ablaom commented Sep 24, 2023

Thanks for the addition. Sadly, this is still not working for me. I'm getting three experiments on the server, with different ids but the same name, "horse". (I'm only expecting one.) One contains 20 evaluations, the other two contain only 1 each, and this complaint is thrown several times:

    {"error_code": "RESOURCE_ALREADY_EXISTS", "message": "Experiment 'horse' already exists."}

Do you have any idea what is happening?

ERROR: TaskFailedException

nested task error: HTTP.Exceptions.StatusError(400, "POST", "/api/2.0/mlflow/experiments/create", HTTP.Messages.Response:
"""
HTTP/1.1 400 Bad Request
Server: gunicorn
Date: Sun, 24 Sep 2023 19:40:45 GMT
Connection: close
Content-Type: application/json
Content-Length: 90

{"error_code": "RESOURCE_ALREADY_EXISTS", "message": "Experiment 'horse' already exists."}""")
Stacktrace:
  [1] mlfpost(mlf::MLFlow, endpoint::String; kwargs::Base.Pairs{Symbol, Union{Missing, Nothing, String}, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:name, :artifact_location, :tags), Tuple{String, Nothing, Missing}}})
    @ MLFlowClient ~/.julia/packages/MLFlowClient/Szkbv/src/utils.jl:74
  [2] mlfpost
    @ ~/.julia/packages/MLFlowClient/Szkbv/src/utils.jl:66 [inlined]
  [3] createexperiment(mlf::MLFlow; name::String, artifact_location::Nothing, tags::Missing)                                                                                    
    @ MLFlowClient ~/.julia/packages/MLFlowClient/Szkbv/src/experiments.jl:21
  [4] createexperiment
    @ ~/.julia/packages/MLFlowClient/Szkbv/src/experiments.jl:16 [inlined]
  [5] #getorcreateexperiment#7
    @ ~/.julia/packages/MLFlowClient/Szkbv/src/experiments.jl:103 [inlined]
  [6] log_evaluation(logger::MLFlowLogger, performance_evaluation::PerformanceEvaluation{MLJDecisionTreeInterface.RandomForestClassifier, Vector{LogLoss{Float64}}, Vector{Float64}, Vector{typeof(predict)}, Vector{Vector{Float64}}, Vector{Vector{Vector{Float64}}}, Vector{NamedTuple{(:forest,), Tuple{DecisionTree.Ensemble{Float64, UInt32}}}}, Vector{NamedTuple{(:features,), Tuple{Vector{Symbol}}}}, Holdout})
    @ MLJFlow ~/.julia/packages/MLJFlow/TqEtw/src/base.jl:2
[7] evaluate!(mach::Machine{MLJDecisionTreeInterface.RandomForestClassifier, true}, resampling::Vector{Tuple{Vector{Int64}, Vector{Int64}}}, weights::Nothing, class_weights::Nothing, rows::Nothing, verbosity::Int64, repeats::Int64, measures::Vector{LogLoss{Float64}}, operations::Vector{typeof(predict)}, acceleration::CPU1{Nothing}, force::Bool, logger::MLFlowLogger, user_resampling::Holdout)
@ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1314
[8] evaluate!(::Machine{MLJDecisionTreeInterface.RandomForestClassifier, true}, ::Holdout, ::Nothing, ::Nothing, ::Nothing, ::Int64, ::Int64, ::Vector{LogLoss{Float64}}, ::Vector{typeof(predict)}, ::CPU1{Nothing}, ::Bool, ::MLFlowLogger, ::Holdout)
@ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1335
[9] fit(::Resampler{Holdout, MLFlowLogger}, ::Int64, ::Tables.MatrixTable{Matrix{Float64}}, ::CategoricalArrays.CategoricalVector{Int64, UInt32, Int64, CategoricalArrays.CategoricalValue{Int64, UInt32}, Union{}})
@ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1494
[10] fit_only!(mach::Machine{Resampler{Holdout, MLFlowLogger}, false}; rows::Nothing, verbosity::Int64, force::Bool, composite::Nothing)
@ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:680
[11] fit_only!
@ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:606 [inlined]
[12] #fit!#63
@ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:778 [inlined]
[13] fit!
@ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:775 [inlined]
[14] event!(metamodel::MLJDecisionTreeInterface.RandomForestClassifier, resampling_machine::Machine{Resampler{Holdout, MLFlowLogger}, false}, verbosity::Int64, tuning::RandomSearch, history::Nothing, state::Vector{Tuple{Symbol, MLJBase.NumericSampler{Float64, Distributions.Uniform{Float64}, Symbol}}})
@ MLJTuning ~/MLJ/MLJTuning/src/tuned_models.jl:443
[15] #46
@ ~/MLJ/MLJTuning/src/tuned_models.jl:597 [inlined]
[16] iterate
@ ./generator.jl:47 [inlined]
[17] _collect(c::Vector{MLJDecisionTreeInterface.RandomForestClassifier}, itr::Base.Generator{Vector{MLJDecisionTreeInterface.RandomForestClassifier}, MLJTuning.var"#46#50"{Int64, RandomSearch, Nothing, Vector{Tuple{Symbol, MLJBase.NumericSampler{Float64, Distributions.Uniform{Float64}, Symbol}}}, Channel{Bool}, Vector{Machine{Resampler{Holdout, MLFlowLogger}, false}}, Int64}}, #unused#::Base.EltypeUnknown, isz::Base.HasShape{1})
@ Base ./array.jl:802
[18] collect_similar
@ ./array.jl:711 [inlined]
[19] map
@ ./abstractarray.jl:3261 [inlined]
[20] macro expansion
@ ~/MLJ/MLJTuning/src/tuned_models.jl:596 [inlined]
[21] (::MLJTuning.var"#45#49"{Vector{MLJDecisionTreeInterface.RandomForestClassifier}, Int64, RandomSearch, Nothing, Vector{Tuple{Symbol, MLJBase.NumericSampler{Float64, Distributions.Uniform{Float64}, Symbol}}}, Channel{Bool}, Vector{Any}, Vector{Machine{Resampler{Holdout, MLFlowLogger}, false}}, UnitRange{Int64}, Int64})()
@ MLJTuning ./threadingconstructs.jl:373
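
The trace above ends in a get-or-create call that fails on create, which suggests a check-then-create race between concurrent callers. The following is a minimal, deterministic sketch of that interleaving. `SketchServer`, `create!` and `exists` are hypothetical in-memory stand-ins, not MLFlowClient.jl functions, and the sketch does not reproduce the duplicate experiments seen on the real server.

```julia
# Hypothetical in-memory stand-in for the experiment store on the server.
struct SketchServer
    experiments::Set{String}
end

# Server-side create: rejects duplicate names, like RESOURCE_ALREADY_EXISTS.
function create!(s::SketchServer, name::String)
    name in s.experiments && error("Experiment '$name' already exists.")
    push!(s.experiments, name)
    return name
end

exists(s::SketchServer, name::String) = name in s.experiments

server = SketchServer(Set{String}())

# Two clients both check before either one creates (the problematic ordering).
a_saw_it = exists(server, "horse")
b_saw_it = exists(server, "horse")

# Client A creates first and succeeds.
a_result = a_saw_it ? "horse" : create!(server, "horse")

# Client B acts on its stale check and is rejected.
b_error = try
    b_saw_it ? "horse" : create!(server, "horse")
    nothing
catch err
    err.msg
end
```

Because both checks happen before either create, the second client acts on stale information. Serializing the whole get-or-create step, or treating the "already exists" reply as success and fetching the experiment again, are two common ways to close this kind of gap.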

@ablaom
Member

ablaom commented Sep 24, 2023

Interestingly, I'm getting the same kind of error with distributed processes, acceleration=CPUProcesses():

using Distributed
addprocs(2)

nprocs()
# 3

using MLJ
using MLFlowClient
logger = MLFlowLogger("http://127.0.0.1:5000", experiment_name="rock")

X, y = make_moons()
model = (@iload RandomForestClassifier pkg=DecisionTree)()

r = range(model, :sampling_fraction, lower=0.4, upper=1.0)

tmodel = TunedModel(
    model;
    range=r,
    logger,
    acceleration=CPUProcesses(),
    n=100,
)

mach = machine(tmodel, X, y) |> fit!;
[ Info: Training machine(ProbabilisticTunedModel(model = RandomForestClassifier(max_depth = -1, …), …), …).
[ Info: Attempting to evaluate 100 models.
      From worker 3:    ┌ Error: Problem fitting the machine machine(Resampler(model = RandomForestClassifier(max_depth = -1, …), …), …). 
      From worker 3:    └ @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:682
      From worker 3:    [ Info: Running type checks... 
      From worker 3:    [ Info: Type checks okay. 
Evaluating over 100 metamodels:  50%[============>            ]  ETA: 0:00:15
┌ Error: Problem fitting the machine machine(ProbabilisticTunedModel(model = RandomForestClassifier(max_depth = -1, …), …), …). 
└ @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:682
[ Info: Running type checks... 
[ Info: Type checks okay. 
ERROR: TaskFailedException
Stacktrace:
  [1] wait
    @ ./task.jl:349 [inlined]
  [2] fetch
    @ ./task.jl:369 [inlined]
  [3] preduce(reducer::Function, f::Function, R::Vector{MLJDecisionTreeInterface.RandomForestClassifier})                                          
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/macros.jl:274
  [4] macro expansion
    @ ~/MLJ/MLJTuning/src/tuned_models.jl:521 [inlined]
  [5] macro expansion
    @ ./task.jl:476 [inlined]
  [6] assemble_events!(metamodels::Vector{MLJDecisionTreeInterface.RandomForestClassifier}, resampling_machine::Machine{Resampler{Holdout, MLFlowLogger}, false}, verbosity::Int64, tuning::RandomSearch, history::Nothing, state::Vector{Tuple{Symbol, MLJBase.NumericSampler{Float64, Distributions.Uniform{Float64}, Symbol}}}, acceleration::CPUProcesses{Nothing})
    @ MLJTuning ~/MLJ/MLJTuning/src/tuned_models.jl:502
  [7] build!(history::Nothing, n::Int64, tuning::RandomSearch, model::MLJDecisionTreeInterface.RandomForestClassifier, model_buffer::Channel{Any}, state::Vector{Tuple{Symbol, MLJBase.NumericSampler{Float64, Distributions.Uniform{Float64}, Symbol}}}, verbosity::Int64, acceleration::CPUProcesses{Nothing}, resampling_machine::Machine{Resampler{Holdout, MLFlowLogger}, false})                                                           
    @ MLJTuning ~/MLJ/MLJTuning/src/tuned_models.jl:675
  [8] fit(::MLJTuning.ProbabilisticTunedModel{RandomSearch, MLJDecisionTreeInterface.RandomForestClassifier, MLFlowLogger}, ::Int64, ::Tables.MatrixTable{Matrix{Float64}}, ::CategoricalArrays.CategoricalVector{Int64, UInt32, Int64, CategoricalArrays.CategoricalValue{Int64, UInt32}, Union{}})         
    @ MLJTuning ~/MLJ/MLJTuning/src/tuned_models.jl:756
  [9] fit_only!(mach::Machine{MLJTuning.ProbabilisticTunedModel{RandomSearch, MLJDecisionTreeInterface.RandomForestClassifier, MLFlowLogger}, false}; rows::Nothing, verbosity::Int64, force::Bool, composite::Nothing)                                                            
    @ MLJBase ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:680
 [10] fit_only!
    @ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:606 [inlined]
 [11] #fit!#63
    @ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:778 [inlined]
 [12] fit!
    @ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:775 [inlined]
 [13] |>(x::Machine{MLJTuning.ProbabilisticTunedModel{RandomSearch, MLJDecisionTreeInterface.RandomForestClassifier, MLFlowLogger}, false}, f::typeof(fit!))
    @ Base ./operators.jl:907
 [14] top-level scope
    @ REPL[16]:1

    nested task error: On worker 3:
    HTTP.Exceptions.StatusError(400, "POST", "/api/2.0/mlflow/experiments/create", HTTP.Messages.Response:
    """
    HTTP/1.1 400 Bad Request
    Server: gunicorn
    Date: Sun, 24 Sep 2023 20:07:23 GMT
    Connection: close
    Content-Type: application/json
    Content-Length: 89
    
    {"error_code": "RESOURCE_ALREADY_EXISTS", "message": "Experiment 'rock' already exists."}""")
    Stacktrace:
      [1] #mlfpost#3
        @ ~/.julia/packages/MLFlowClient/Szkbv/src/utils.jl:74
      [2] mlfpost
        @ ~/.julia/packages/MLFlowClient/Szkbv/src/utils.jl:66 [inlined]
      [3] #createexperiment#6
        @ ~/.julia/packages/MLFlowClient/Szkbv/src/experiments.jl:21
      [4] createexperiment
        @ ~/.julia/packages/MLFlowClient/Szkbv/src/experiments.jl:16 [inlined]
      [5] #getorcreateexperiment#7
        @ ~/.julia/packages/MLFlowClient/Szkbv/src/experiments.jl:103 [inlined]
      [6] log_evaluation
        @ ~/.julia/packages/MLJFlow/TqEtw/src/base.jl:2
      [7] evaluate!
        @ ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1314
      [8] evaluate!
        @ ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1335
      [9] fit
        @ ~/.julia/packages/MLJBase/ByFwA/src/resampling.jl:1494
     [10] #fit_only!#57
        @ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:680
     [11] fit_only!
        @ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:606 [inlined]
     [12] #fit!#63
        @ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:778 [inlined]
     [13] fit!
        @ ~/.julia/packages/MLJBase/ByFwA/src/machines.jl:775 [inlined]
     [14] event!
        @ ~/MLJ/MLJTuning/src/tuned_models.jl:443
     [15] macro expansion
        @ ~/MLJ/MLJTuning/src/tuned_models.jl:522 [inlined]
     [16] #39
        @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/macros.jl:288
     [17] #invokelatest#2
        @ ./essentials.jl:816
     [18] invokelatest
        @ ./essentials.jl:813
     [19] #110
        @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285
     [20] run_work_thunk
        @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:70
     [21] macro expansion
        @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:285 [inlined]
     [22] #109
        @ ./task.jl:514
    Stacktrace:
     [1] remotecall_fetch(::Function, ::Distributed.Worker, ::Function, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})                      
       @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:465
     [2] remotecall_fetch(::Function, ::Distributed.Worker, ::Function, ::Vararg{Any})
       @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:454
     [3] #remotecall_fetch#162
       @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492 [inlined]
     [4] remotecall_fetch
       @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/remotecall.jl:492 [inlined]
     [5] (::Distributed.var"#175#176"{typeof(vcat), MLJTuning.var"#39#42"{Machine{Resampler{Holdout, MLFlowLogger}, false}, Int64, RandomSearch, Nothing, Vector{Tuple{Symbol, MLJBase.NumericSampler{Float64, Distributions.Uniform{Float64}, Symbol}}}, RemoteChannel{Channel{Bool}}}, Vector{MLJDecisionTreeInterface.RandomForestClassifier}, Vector{UnitRange{Int64}}, Int64, Int64})()
       @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/macros.jl:270

@ablaom
Member

ablaom commented Sep 24, 2023

Okay, see here for a MWE: JuliaAI/MLFlowClient.jl#40

@ablaom
Member

ablaom commented Jan 23, 2024

Revisiting this issue after a few months.

It looks like the multithreading issue is unlikely to be addressed soon. Perhaps we can proceed with this PR after strictly ruling out logging in the parallel modes. For example, if logger is not nothing, and either acceleration or acceleration_resampling is not CPU1(), then clean! resets the accelerations to CPU1() and issues a message saying what it has done and why. The clean! code is here.

@pebeto What do you think?
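
The proposal above could be sketched roughly as follows. All names here (`SketchTuned`, `SketchCPU1`, `SketchCPUThreads`, `sketch_clean!`) are hypothetical stand-ins so the example runs on its own; this is not the actual `clean!` method in MLJTuning.jl, and the message wording is invented.

```julia
# Hypothetical stand-ins for the acceleration resources.
abstract type SketchResource end
struct SketchCPU1 <: SketchResource end
struct SketchCPUThreads <: SketchResource end

# Hypothetical stand-in for the tuned-model wrapper.
mutable struct SketchTuned
    logger::Any
    acceleration::SketchResource
    acceleration_resampling::SketchResource
end

# Reset parallel accelerations when a logger is present, and say why.
function sketch_clean!(m::SketchTuned)
    message = ""
    m.logger === nothing && return message  # nothing to rule out
    if !(m.acceleration isa SketchCPU1)
        m.acceleration = SketchCPU1()
        message *= "Logging is unsupported in parallel modes: " *
                   "resetting acceleration to single-process. "
    end
    if !(m.acceleration_resampling isa SketchCPU1)
        m.acceleration_resampling = SketchCPU1()
        message *= "Logging is unsupported in parallel modes: " *
                   "resetting acceleration_resampling to single-process. "
    end
    return message
end

with_logger = SketchTuned("some-logger", SketchCPUThreads(), SketchCPU1())
no_logger = SketchTuned(nothing, SketchCPUThreads(), SketchCPU1())
warning = sketch_clean!(with_logger)
silent = sketch_clean!(no_logger)
```

Returning the message as a string, rather than printing it inside the function, leaves it to the caller to decide how the warning is surfaced.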

@pebeto
Member Author

pebeto commented Mar 7, 2024

The solution to this issue is not part of the mlflow plans (see mlflow/mlflow#11122). However, a workaround is presented here: JuliaAI/MLJFlow.jl#36 to ensure our process is thread-safe.
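
The linked workaround is not reproduced here. As a generic illustration only, one common way to make logging calls safe across threads is to serialize them behind a lock. `LOG_LOCK`, `LOGGED` and `sketch_log!` are hypothetical names, and the vector stands in for runs recorded on a server.

```julia
using Base.Threads: @threads

const LOG_LOCK = ReentrantLock()
const LOGGED = String[]   # stand-in for runs recorded on the server

# Hypothetical logging call; the lock lets one thread in at a time.
function sketch_log!(run_name::String)
    lock(LOG_LOCK) do
        push!(LOGGED, run_name)
    end
    return nothing
end

# Log 100 runs from however many threads Julia was started with.
@threads for i in 1:100
    sketch_log!("run-$i")
end
```

Without the lock, concurrent `push!` calls on the shared vector would be a data race. With it, every run is recorded exactly once, at the cost of logging calls no longer overlapping.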

@ablaom ablaom merged commit dc1d6d4 into JuliaAI:dev May 21, 2024
4 checks passed