Starting the process of API discussion #1
I'll let the others chime in on the API as @rejuvyesh and I probably cut some corners for convenience in the paper experiments. I'm putting a mini-checklist of things to do and/or think about:
src/fcmcts/fcmcts.jl
@@ -0,0 +1,307 @@
@with_kw mutable struct FCMCTSSolver <: AbstractMCTSSolver
Note to self: add documentation à la https://github.com/JuliaPOMDP/MCTS.jl/blob/a24e92d108853a0410ac0a61f64d4e53a8623e71/src/vanilla.jl#L1-L52
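As a sketch of the intended docstring style, following the MCTS.jl convention linked above (the fields shown here are representative placeholders, not the struct's confirmed field list):

```julia
using Parameters, Random

abstract type AbstractMCTSSolver end   # stands in for the MCTS.jl type

"""
    FCMCTSSolver()

Solver for fully centralized MCTS over the joint action space of a JointMDP.

Fields:
- `n_iterations::Int64`: number of simulations to run. default: 100
- `depth::Int64`: maximum simulation depth. default: 10
- `exploration_constant::Float64`: UCB exploration constant. default: 1.0
- `rng::AbstractRNG`: random number generator used by the planner
"""
@with_kw mutable struct FCMCTSSolver <: AbstractMCTSSolver
    n_iterations::Int64 = 100
    depth::Int64 = 10
    exploration_constant::Float64 = 1.0
    rng::AbstractRNG = Random.GLOBAL_RNG
end
```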
src/fvmcts/fv_mcts_vanilla.jl
struct VarEl <: AbstractCoordinationStrategy
end

@with_kw struct MaxPlus <: AbstractCoordinationStrategy
Critical: needs a docstring.
Agreed. Let's change it.
Nice, this seems cool. Are you looking for feedback on this PR or on MAPOMDPs?
A few notes on MAPOMDPs:
Thanks for the notes @zsunberg! Feedback on both would be great, but MAPOMDPs (or MultiAgentPOMDPs, I guess) is more critical, since it would be the problem template (versus this specific solver).
These are all good points.
That's reasonable. Do you think we should rename this to MultiAgentMCTS too, then?
Right now we were overloading this to return a vector of rewards. That's why we had to modify the rollout function.
It was just easy. We'll rename these.
I think we should make `POMDPs.reward(m::JointMDP, s, a, sp) = sum(agent_rewards(m, s, a, sp))` the default.
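For concreteness, a minimal sketch of that fallback, assuming the `JointMDP` abstract type and per-agent `agent_rewards` function discussed in this thread (not settled API):

```julia
import POMDPs

# As proposed in this thread: a joint-MDP abstract type and a
# per-agent reward function that problem writers implement.
abstract type JointMDP{S,A} <: POMDPs.MDP{S,A} end
function agent_rewards end   # agent_rewards(m, s, a, sp) -> per-agent reward vector

# Proposed fallback: the scalar joint reward is just the sum of the
# per-agent rewards.
POMDPs.reward(m::JointMDP, s, a, sp) = sum(agent_rewards(m, s, a, sp))
```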
@rejuvyesh from the
Not yet. But should do something about
src/fvmcts/fv_mcts_vanilla.jl
# MMDP reqs
@req get_agent_actions(::P, ::Int64)
@req get_agent_actions(::P, ::Int64, ::eltype(SV))
Change the `get_`-style naming.
@@ -315,7 +318,7 @@ function simulate(planner::FVMCTSPlanner, node::FVStateNode, depth::Int64)
     ucb_action = coordinate_action(mdp, planner.tree, s, planner.solver.exploration_constant, node.id)

     # Monte Carlo Transition
-    sp, r = gen(DDNOut(:sp, :r), mdp, s, ucb_action, rng)
+    sp, r = @gen(:sp, :r)(mdp, s, ucb_action, rng)
Is this what you meant by the new POMDPs.jl API? No more DDN??
Yes, as of POMDPs 0.9 we decided to get rid of the flexible DDN.
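For reference, a small sketch of the POMDPs.jl >= 0.9 generative call, using `SimpleGridWorld` from POMDPModels purely as a stand-in problem (not this repo's types):

```julia
using POMDPs, POMDPModels, Random

mdp = SimpleGridWorld()                    # stand-in MDP
rng = MersenneTwister(1)
s = rand(rng, initialstate(mdp))           # sample an initial state
a = rand(rng, collect(actions(mdp)))       # pick an arbitrary action

# Old (POMDPs < 0.9): sp, r = gen(DDNOut(:sp, :r), mdp, s, a, rng)
sp, r = @gen(:sp, :r)(mdp, s, a, rng)      # new @gen interface
```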
src/fcmcts/fcmcts.jl
@@ -15,26 +15,26 @@ end
 mutable struct FCMCTSTree{S,A}
     # To track if state node in tree already
     # NOTE: We don't strictly need this at all if no tree reuse...
-    state_map::Dict{AbstractVector{S},Int64}
+    state_map::Dict{S,Int64}
I see, so we're going to delegate the joint state / joint action aspect in FCMCTS to the domain?
Does beg the question of why we need FCMCTS at all. Can't we just use vanilla MCTS as the baseline then?
I think we might be able to do away with FCMCTS; need to check a few things with actions, I think.
src/fcmcts/fcmcts.jl
@@ -212,7 +212,7 @@ function insert_node!(tree::FCMCTSTree, planner::FCMCTSPlanner, s)
 end

     # NOTE: Doing state-dep actions here the JointMDP way
-    state_dep_jtactions = vec(map(collect, Iterators.product((get_agent_actions(planner.mdp, i, si) for (i, si) in enumerate(s))...)))
+    state_dep_jtactions = vec(map(collect, Iterators.product((agent_actions(planner.mdp, i, si) for (i, si) in enumerate(s))...)))
As far as I can tell, this is the only place we are deviating from vanilla MCTS on an unfactored MMDP. I suppose it is worth maintaining separately for this alone
Right! But shouldn't we just define:
POMDPs.actions(mdp::JointMDP, s) = vec(map(collect, Iterators.product((agent_actions(mdp, i, si) for (i, si) in enumerate(s))...)))
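As a standalone toy of what that product construction produces (the action sets are hypothetical, not this repo's types):

```julia
# Per-agent action sets -> vector of joint actions via the Cartesian
# product, exactly the construction in the one-liner above.
agent_action_sets = ([:up, :down], [:left, :right, :stay])  # hypothetical

joint_actions = vec(map(collect, Iterators.product(agent_action_sets...)))

@assert length(joint_actions) == 2 * 3
@assert joint_actions[1] == [:up, :left]  # each joint action holds one action per agent
```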
Actually, yeah, if we do this then we shouldn't need FCMCTS anymore at all!
What we can do for now is to remove FCMCTS from this repo so it does not hold up the rest of it. Later on, we can fix our own baseline scripts to use vanilla MCTS.
src/fcmcts/fcmcts.jl
 mdp = planner.mdp
 P = typeof(mdp)
 @assert P <: JointMDP # req does different thing?
-SV = statetype(P)
-@assert typeof(SV) <: AbstractVector # TODO: Is this correct?
+#SV = statetype(P)
Uncomment? Since we are using `::SV` further down?
But `::SV` need not be an `AbstractVector` and can have other global info too?
It makes sense to uncomment `SV` since it is used further down.
src/fcmcts/fcmcts.jl
-@req get_agent_actions(::P, ::Int64)
-@req get_agent_actions(::P, ::Int64, ::eltype(SV))
+@req agent_actions(::P, ::Int64)
+@req agent_actions(::P, ::Int64, ::eltype(SV)) # TODO should this be eltype?
I think this is fine, as we do use `enumerate(s)` above. We are implicitly requiring that the data structure be something we can apply `eltype` to.
Yeah, this is what I am unsure about regarding @zsunberg's comments. We are assuming that `S` for `JointMDP` is something we can enumerate into agent-specific things (which might just be copies of the same thing), but there could instead just be a single global state too.
I think this is fine. Worst case, it tells the problem writer that they need to implement `agent_actions(::P, ::Int64, ::Any)`.
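As a sketch of what those requirements ask a problem writer to implement (the problem type and action sets here are hypothetical):

```julia
# Hypothetical two-agent problem type; stands in for a JointMDP subtype.
struct ToyJointGrid
    n_agents::Int
end

# State-independent per-agent action set.
agent_actions(m::ToyJointGrid, i::Int64) = [:left, :right, :stay]

# State-dependent variant: si is agent i's component of the joint state
# (hence the ::eltype(SV) in the @req above).
agent_actions(m::ToyJointGrid, i::Int64, si) =
    si == 1 ? [:right, :stay] : [:left, :right, :stay]
```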
src/fcmcts/fcmcts.jl
@@ -292,6 +292,7 @@ POMDPLinter.@POMDP_require insert_node!(tree::FCMCTSTree, planner::FCMCTSPlanner
 S = eltype(SV)

+# TODO: Review IQ and IN
 # Should this be ::S or ::SV? We can have global state that's not a vector.
Wait, can we? Because we were doing `enumerate(s)` before when collecting the actions. Is `enumerate` possible on anything other than a vector?
Yeah, this is something I am not sure about. Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?
I think `SV`, but see my main review comment.
 comp_ac_idx = LinearIndices(comp_tup)[local_action_idxs...]

 # NOTE: NOW we can update stats. Could generalize incremental update more here
 lock(tree.lock) do
     tree.coordination_stats.n_component_stats[s][idx][comp_ac_idx] += 1
-    q_comp_value = sum(q[c] for c in comp)
+    q_comp_value = sum(q[c] for c in comp) # TODO!!!
What's the issue here?
# NOTE: NOW we can update stats. Could generalize incremental update more here
lock(tree.lock) do
    tree.coordination_stats.n_component_stats[s][idx][comp_ac_idx] += 1
    q_comp_value = q # TODO!!!
I see, the TODO here is warranted haha. The scalar equivalent of the `q::AbstractVector` case should be `q_comp_value = q * length(comp)`?
Because if we had a global reward vector where all q values were the same, then the line `q_comp_value = sum(q[c] for c in comp)` would yield the same result, right?
Yeah, that's the thing I wanted to confirm and seems alright.
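A quick sanity check of that equivalence (toy numbers, not the repo's update code):

```julia
# If every agent sees the same scalar return q, then summing a
# constant reward vector over a component's members is just
# q * length(comp).
q = 2.5
comp = [1, 3, 4]        # hypothetical component: agent indices
q_vec = fill(q, 5)      # global reward vector, all entries equal

@assert sum(q_vec[c] for c in comp) == q * length(comp)
```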
 end

-mp_stats = MaxPlusStatistics{eltype(init_state)}(adjmatgraph,
+mp_stats = MaxPlusStatistics{S}(adjmatgraph,
Good catch! Even in the previous version, this should have been `S` and not `eltype`.
Looks good! The only possible logical issue would have been the `update_statistics` where `q::Float64`, but I think my suggestion there should recover the same behavior.
@Shushman please take another look at the scalar global reward handling when you get the time.
@zsunberg Should we rename this to FactoredValueMCTS.jl?
The details that were commented on seem good to me, but @rejuvyesh's comment

> Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? In a `JointMDP`, do the agents get to make decisions based on `S` or `SV`?
@Shushman, @mykelk, correct me if I am wrong, but in a multi-agent MDP, all agents have access to the full global state information, but the agent state could differ so as to allow, say, rotations or relative positions or agent identity?
According to DMU, in an MMDP, each agent is able to observe the full true state, i.e., the states of all other agents in the problem.
Hmm. But then the issue for us is that we shouldn't require
Ah, OK, this makes sense to me. Whenever an agent needs to make a decision, it should be given access to the global state, and there is no reason the global state needs to be an `AbstractVector`.

Then is there a reason to have any concept of individual states? Perhaps individual states would be useful for the common suboptimal solution of having agents act based only on their own states or a smaller view of the state?

If the concept of individual states is needed, I think it would be reasonable to say that states in a `JointMDP`
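To make the two readings concrete, a toy sketch of the alternatives being debated (both state types are hypothetical):

```julia
# Option A: the joint state is a vector of per-agent states, so agent
# i's individual state is just s[i]; this is what enumerate(s) assumes
# in the code discussed above.
s_vector = [(x = 1, y = 2), (x = 4, y = 0)]   # one named tuple per agent

# Option B: a single global state carrying shared information; agents
# would need an explicit projection to get an individual view.
struct ToyGlobalState
    agent_positions::Vector{Tuple{Int,Int}}
    time_of_day::Int    # shared info not owned by any single agent
end
```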
src/MAMCTS.jl
step = 1

# TODO: doesn't this add unnecessary action search?
r = @gen(:r)(mdp, s, action(policy, s), sim.rng)
Eh under the circumstances it is the least bad option, I think.
I mean, I could do a random action as well?
src/fvmcts/fv_mcts_vanilla.jl
 #@assert typeof(SV) <: AbstractVector
 AV = actiontype(P)
 @assert typeof(A) <: AbstractVector
 @req discount(::P)
 @req isterminal(::P, ::SV)
 @subreq insert_node!(planner.tree, planner, s)
 @subreq estimate_value(planner.solved_estimate, mdp, s, depth)
-@req gen(::DDNOut{(:sp, :r)}, ::P, ::SV, ::A, ::typeof(planner.rng))
+@req gen(::P, ::SV, ::A, ::typeof(planner.rng)) # XXX this is not exactly right - it could be satisfied with transition
Not sure I see the issue here. Don't we require a generative model to be implemented?
MCTS.jl has the same thing 😛. @zsunberg might know the exact reason for the comment.
Scalar global rewards (and the others) look good to me
Renamed to FactoredValueMCTS.jl everywhere. Will figure out later if
MAPOMDPs.jl should be the minimal API-like package for defining multi-agent POMDP models and their solvers. This package will contain our factored-value solvers as well as a naive centralized solver using MCTS. We could potentially add decentralized MCTS here as well later.
If we look at the current API, we have a vector of states/observations for each agent and a vector of actions for each agent in `gen` (to be changed to `@gen`). We also return a vector of rewards corresponding to each agent. So right now the setup can support a POSG (partially observable stochastic game) if the rewards are different for each agent.
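To illustrate the shape of that interface, a toy generative step in which the joint state, joint action, and reward are all per-agent vectors (everything here is hypothetical scaffolding, not settled API):

```julia
# Hypothetical two-agent step matching the vector-of-everything setup
# described above: one state component, one action, and one reward
# entry per agent, which is what makes the setup POSG-capable.
function toy_joint_gen(s::Vector{Int}, a::Vector{Symbol})
    sp = [si + (ai == :right ? 1 : -1) for (si, ai) in zip(s, a)]  # next joint state
    r = Float64.(sp)    # one reward entry per agent
    return sp, r
end

sp, r = toy_joint_gen([0, 3], [:right, :left])
@assert sp == [1, 2] && r == [1.0, 2.0]   # rewards can differ across agents
```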