
Starting the process of API discussion #1

Merged · rejuvyesh merged 17 commits into master from impl on Jan 5, 2021
Conversation

@rejuvyesh (Member) commented Oct 29, 2020

MAPOMDPs.jl should be the minimal API-style package for defining multi-agent POMDP models and their solvers. This package (MAMCTS.jl) will contain our factored-value solvers as well as a naive centralized solver using MCTS. We could potentially add decentralized MCTS here later as well.

If we look at the current API, we have a vector of states/observations for each agent and a vector of actions for each agent in gen (to be changed to @gen). We also return a vector of rewards, one per agent. So the current setup can support a POSG (partially observed stochastic game) if the rewards differ across agents.
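For illustration, a minimal sketch of that per-agent interface for a hypothetical two-agent toy problem (MyJointMDP and its dynamics are made up here, not part of this PR):

using POMDPs
using Random

struct MyJointMDP <: MDP{Vector{Int}, Vector{Int}} end  # hypothetical toy problem

# Generative model in the style described above: joint state and joint action in,
# next joint state and a per-agent reward vector out.
function POMDPs.gen(m::MyJointMDP, s::Vector{Int}, a::Vector{Int}, rng::AbstractRNG)
    sp = s .+ a                 # each agent moves by its own action
    r  = [-abs(x) for x in sp]  # one reward entry per agent
    return (sp=sp, r=r)
end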

@Shushman (Collaborator):

I'll let the others chime in on the API as @rejuvyesh and I probably cut some corners for convenience in the paper experiments. I'm putting a mini-checklist of things to do and/or think about:

  • Add docstrings to the various functions
  • Should we call the centralized one FCMCTS? It's very close to FVMCTS and might cause confusion. I do realize we are using JointMCTSTree in fv_mcts_vanilla. So we could either change FCMCTS to JointMCTS and JointMCTSTree to FVMCTSTree or just change FCMCTS to MMCTS

src/MAMCTS.jl (outdated; resolved)
@@ -0,0 +1,307 @@


@with_kw mutable struct FCMCTSSolver <: AbstractMCTSSolver

struct VarEl <: AbstractCoordinationStrategy
end

@with_kw struct MaxPlus <:AbstractCoordinationStrategy
Collaborator:

Critical docstring

@rejuvyesh (Member Author):

  • Should we call the centralized one FCMCTS? It's very close to FVMCTS and might cause confusion. I do realize we are using JointMCTSTree in fv_mcts_vanilla. So we could either change FCMCTS to JointMCTS and JointMCTSTree to FVMCTSTree or just change FCMCTS to MMCTS

Agreed. Let's change FCMCTS to JointMCTS and current JointMCTS* to FVMCTS*

@zsunberg (Member) commented Oct 31, 2020

Nice - this seems cool - are you looking for feedback on this PR or on MAPOMDPs?

@zsunberg (Member) commented Oct 31, 2020

A few notes on MAPOMDPs:

  • It should probably be called MultiAgentPOMDPs if we think we are ever going to register it.
  • I think it should be JointMDP{S, A<:AbstractVector} <: MDP{S, A} because
    • the shared state might contain something more than just the individual states of the agents
    • it is good to let the problem writer specify the concrete action vector type otherwise MDP solvers might have trouble with type stability
  • POMDPs.reward should return a scalar so that MDP/POMDP solvers can solve the problems. How does the problem writer specify the vector of rewards for each agent?
  • Using get_-style names is not typical Julia style. What is the reason you decided to use it?
  • Docstrings do not normally contain the function keyword.
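A minimal sketch of the declaration suggested in the second bullet (only JointMDP itself is proposed here; the example problem name is a placeholder):

using POMDPs

abstract type JointMDP{S, A<:AbstractVector} <: MDP{S, A} end

# A problem writer then fixes the concrete joint state type and action vector type, e.g.
# struct MyProblem <: JointMDP{MyGlobalState, Vector{Int}} end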

@Shushman (Collaborator) commented Nov 1, 2020

Thanks for the notes, @zsunberg! Feedback on both would be great, but MAPOMDPs (or MultiAgentPOMDPs, I guess) is more critical since it would be the problem template (versus this specific solver).

@rejuvyesh (Member Author):

These are all good points.

  • It should probably be called MultiAgentPOMDPs if we think we are ever going to register it.

That's reasonable. Do you think we should rename this to MultiAgentMCTS too, then?

  • POMDPs.reward should return a scalar so that MDP/POMDP solvers can solve the problems. How does the problem writer specify the vector of rewards for each agent?

Right now we were overloading this to return a vector of rewards. That's why we had to modify the rollout function for JointMDP. We'll have to think about this.

  • Using get is not typical julia style. What is the reason you decided to use it?

It was just easy. We'll rename these.

@zsunberg (Member) commented Nov 1, 2020

I think making POMDPs.reward return a scalar is pretty important. The point of making a Joint(PO)MDP a (PO)MDP is so that you can demonstrate that multi-agent approaches outperform monolithic POMDP solvers. If POMDPs.reward returns a vector, the monolithic POMDP solvers will not be able to solve it. I think you should introduce MultiAgentPOMDPs.agent_rewards and have

POMDPs.reward(m::JointMDP, s, a, sp) = sum(agent_rewards(m, s, a, sp))

by default.
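A rough sketch of how that split could look (agent_rewards does not exist yet, and MyJointMDP is the illustrative toy problem from above):

# Problem-side implementation of the proposed per-agent rewards:
function MultiAgentPOMDPs.agent_rewards(m::MyJointMDP, s, a, sp)
    return [-abs(x) for x in sp]  # one reward per agent
end

# Default supplied by the interface package, so monolithic (PO)MDP solvers only ever see a scalar:
POMDPs.reward(m::JointMDP, s, a, sp) = sum(agent_rewards(m, s, a, sp))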

@Shushman (Collaborator) commented Nov 4, 2020

@rejuvyesh from the FVMCTSSolver type (https://github.com/JuliaPOMDP/MAMCTS.jl/blob/impl/src/fvmcts/fv_mcts_vanilla.jl#L21), do we ever use the reuse_tree or max_time arguments anywhere? In the domains?

@rejuvyesh (Member Author):

@rejuvyesh from the FVMCTSSolver type (https://github.com/JuliaPOMDP/MAMCTS.jl/blob/impl/src/fvmcts/fv_mcts_vanilla.jl#L21), do we ever use the reuse_tree or max_time arguments anywhere? In the domains?

Not yet. But we should do something about max_time at least.


# MMDP reqs
@req get_agent_actions(::P, ::Int64)
@req get_agent_actions(::P, ::Int64, ::eltype(SV))
Collaborator:

Change the get_ style naming

@@ -315,7 +318,7 @@ function simulate(planner::FVMCTSPlanner, node::FVStateNode, depth::Int64)
ucb_action = coordinate_action(mdp, planner.tree, s, planner.solver.exploration_constant, node.id)

# Monte Carlo Transition
sp, r = gen(DDNOut(:sp, :r), mdp, s, ucb_action, rng)
sp, r = @gen(:sp, :r)(mdp, s, ucb_action, rng)
Collaborator:

Is this what you meant by the new POMDPs.jl API? No more DDN??

Member:

Yes, as of POMDPs 0.9 we decided to get rid of the flexible DDN.

@@ -15,26 +15,26 @@ end
mutable struct FCMCTSTree{S,A}
# To track if state node in tree already
# NOTE: We don't strictly need this at all if no tree reuse...
state_map::Dict{AbstractVector{S},Int64}
state_map::Dict{S,Int64}
@Shushman (Collaborator) commented Jan 3, 2021:

I see, so we're going to delegate the joint state / joint action aspect in FCMCTS to the domain?

Collaborator:

This does beg the question of why we need FCMCTS at all. Can't we just use vanilla MCTS as the baseline then?

Member Author:

I think we might be able to do away with FCMCTS; I need to check a few things with actions first.

@@ -212,7 +212,7 @@ function insert_node!(tree::FCMCTSTree, planner::FCMCTSPlanner, s)
end

# NOTE: Doing state-dep actions here the JointMDP way
state_dep_jtactions = vec(map(collect, Iterators.product((get_agent_actions(planner.mdp, i, si) for (i, si) in enumerate(s))...)))
state_dep_jtactions = vec(map(collect, Iterators.product((agent_actions(planner.mdp, i, si) for (i, si) in enumerate(s))...)))
Collaborator:

As far as I can tell, this is the only place we are deviating from vanilla MCTS on an unfactored MMDP. I suppose it is worth maintaining separately for this alone

@rejuvyesh (Member Author) commented Jan 4, 2021:

Right! But shouldn't we just define:

POMDPs.actions(mdp::JointMDP, s) =  vec(map(collect, Iterators.product((agent_actions(mdp, i, si) for (i, si) in enumerate(s))...)))

Collaborator:

Actually, yeah if we do this then we shouldn't need FCMCTS anymore at all!
What we can do for now is to remove FCMCTS from this repo so it does not hold up the rest of it. Later on, we can fix our own baseline scripts to use vanilla MCTS.
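If that fallback goes into the interface package, the baseline could look roughly like this (a sketch; mdp and s stand for a concrete JointMDP instance and state not defined here, and the solver parameters are only illustrative):

using POMDPs, MCTS

solver  = MCTSSolver(n_iterations=1_000, depth=20, exploration_constant=5.0)
planner = solve(solver, mdp)  # mdp is some concrete JointMDP defined by the problem writer
a = action(planner, s)        # joint action, i.e. a Vector of per-agent actions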

mdp = planner.mdp
P = typeof(mdp)
@assert P <: JointMDP # req does different thing?
SV = statetype(P)
@assert typeof(SV) <: AbstractVector # TODO: Is this correct?
#SV = statetype(P)
Collaborator:

Uncomment? Since we are using ::SV further down?

Member Author:

But ::SV need not be AbstractVector and can have other global info too?

Member:

It makes sense to uncomment SV since it is used further down

@req get_agent_actions(::P, ::Int64)
@req get_agent_actions(::P, ::Int64, ::eltype(SV))
@req agent_actions(::P, ::Int64)
@req agent_actions(::P, ::Int64, ::eltype(SV)) # TODO should this be eltype?
Collaborator:

I think this is fine, as we do use enumerate(s) above. We are implicitly requiring that the data structure be something we can apply eltype to.

Member Author:

Yeah, this is what I am unsure about regarding @zsunberg's comments. We are assuming that S for JointMDP is something we can enumerate into agent-specific things (which might just be copies of the same thing), but there could instead just be a single global state too.

Member:

I think this is fine. Worst case it tells the problem writer that they need to implement agent_actions(::P, ::Int64, ::Any)
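For concreteness, the requirement would be satisfied by problem-side methods along these lines (illustrative toy problem and action sets):

agent_actions(m::MyJointMDP, i::Int) = [-1, 0, 1]                         # all actions of agent i
agent_actions(m::MyJointMDP, i::Int, si) = si == 0 ? [0, 1] : [-1, 0, 1]  # state-dependent subset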

@@ -292,6 +292,7 @@ POMDPLinter.@POMDP_require insert_node!(tree::FCMCTSTree, planner::FCMCTSPlanner
S = eltype(SV)

# TODO: Review IQ and IN
# Should this be ::S or ::SV? We can have global state that's not a vector.
@Shushman (Collaborator) commented Jan 3, 2021:

Wait, can we? Because we were doing enumerate(s) before when collecting the actions. Is enumerate possible on anything other than a vector?

Member Author:

Yeah, this is something I am not sure about. Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

Member:

I think SV, but see my main review comment.


comp_ac_idx = LinearIndices(comp_tup)[local_action_idxs...]

# NOTE: NOW we can update stats. Could generalize incremental update more here
lock(tree.lock) do
tree.coordination_stats.n_component_stats[s][idx][comp_ac_idx] += 1
q_comp_value = sum(q[c] for c in comp)
q_comp_value = sum(q[c] for c in comp) # TODO!!!
Collaborator:

What's the issue here?

# NOTE: NOW we can update stats. Could generalize incremental update more here
lock(tree.lock) do
tree.coordination_stats.n_component_stats[s][idx][comp_ac_idx] += 1
q_comp_value = q # TODO!!!
Collaborator:

I see. The TODO here is warranted, haha. For scalar q, the equivalent of the q::AbstractVector case should be q_comp_value = q * length(comp)?

Collaborator:

Because if we had a global reward vector where all q values were the same, then the line q_comp_value = sum(q[c] for c in comp) would yield the same result right?

Member Author:

Yeah, that's the thing I wanted to confirm and seems alright.
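Putting the two cases side by side, a small dispatch-based sketch (not the PR's actual code) that recovers the behavior discussed here:

component_q(q::AbstractVector, comp) = sum(q[c] for c in comp)  # per-agent reward vector
component_q(q::Real, comp)           = q * length(comp)         # scalar global reward

# ...and the locked statistics update would then use: q_comp_value = component_q(q, comp)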

end

mp_stats = MaxPlusStatistics{eltype(init_state)}(adjmatgraph,
mp_stats = MaxPlusStatistics{S}(adjmatgraph,
Collaborator:

Good catch! Even in the previous version, this should have been S and not eltype

@Shushman (Collaborator) left a review comment:

Looks good! The only possible logical issue would have been the update_statistics where q::Float64, but I think my suggestion there should recover the same behavior.

@rejuvyesh (Member Author):

@Shushman please take another look at the scalar global reward handling when you get the time.

@rejuvyesh (Member Author):

@zsunberg Should we rename this to MultiAgentMCTS.jl as well?

@zsunberg (Member) left a review comment:

The details that were commented on seem good to me, but @rejuvyesh's comment

Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? In a JointMDP, do the agents get to make decisions based on S or SV?


@rejuvyesh (Member Author):

The details that were commented on seem good to me, but @rejuvyesh 's comment

Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? in a JointMDP, do the agents get to make decisions based on S or SV?

@Shushman, @mykelk, correct me if I am wrong, but in a multi-agent MDP, all agents have access to the full global state information, but the agent state could differ so as to allow, say, rotations or relative positions or agent identity?

@Shushman (Collaborator) commented Jan 4, 2021

The details that were commented on seem good to me, but @rejuvyesh 's comment

Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? in a JointMDP, do the agents get to make decisions based on S or SV?

@Shushman , @mykelk correct me if I am wrong, but in MultiAgent MDP, all agents have access to the full global state information but the agent state could differ so as to allow say rotations or relative positions or agent identity?

According to DMU, in an MMDP, each agent is able to observe the full true state, i.e., the states of all other agents in the problem.
I don't think there needs to be a separate agent state besides the full global state? It just means that any 'agent' can see the full state vector at that timestep

@rejuvyesh (Member Author):

The details that were commented on seem good to me, but @rejuvyesh 's comment

Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? in a JointMDP, do the agents get to make decisions based on S or SV?

@Shushman , @mykelk correct me if I am wrong, but in MultiAgent MDP, all agents have access to the full global state information but the agent state could differ so as to allow say rotations or relative positions or agent identity?

According to DMU, in an MMDP, each agent is able to observe the full true state, i.e., the states of all other agents in the problem.
I don't think there needs to be a separate agent state besides the full global state? It just means that any 'agent' can see the full state vector at that timestep

Hmm. But then the issue for us is that we shouldn't require enumerate on the JointMDP state, because the JuliaPOMDP philosophy is not to restrict types unless required, and the full state need not be a vector (there could just be a global view, for example).

@zsunberg (Member) commented Jan 4, 2021

According to DMU, in an MMDP, each agent is able to observe the full true state, i.e., the states of all other agents in the problem. I don't think there needs to be a separate agent state besides the full global state? It just means that any 'agent' can see the full state vector at that timestep

Ah, OK, this makes sense to me. Whenever an agent needs to make a decision, it should be given access to the global state, and there is no reason the global state needs to be an AbstractVector.

Then is there a reason to have any concept of individual states?

Perhaps individual states would be useful for the common suboptimal solution of having agents feed back on their own states or a smaller view of the state? If the concept of individual states is needed, I think it would be reasonable to say that states in a JointMDP should support the iteration interface and/or have getindex. Vector and Tuple are types that support both of these. Or it may be better to have a separate agentview function for this.
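A tiny sketch of what the agentview idea could look like (all names here are hypothetical, not part of the PR):

agentview(m::JointMDP, s, i::Int) = s  # default: every agent sees the full global state

# A problem with meaningful local views could specialize, e.g.
# agentview(m::MyGridMDP, s, i::Int) = local_neighborhood(s, i)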

src/MAMCTS.jl (outdated)
step = 1

# TODO: doesn't this add unnecessary action search?
r = @gen(:r)(mdp, s, action(policy, s), sim.rng)
Collaborator:

Eh under the circumstances it is the least bad option, I think.

Member Author:

I mean, I could do a random action as well?

#@assert typeof(SV) <: AbstractVector
AV = actiontype(P)
@assert typeof(A) <: AbstractVector
@req discount(::P)
@req isterminal(::P, ::SV)
@subreq insert_node!(planner.tree, planner, s)
@subreq estimate_value(planner.solved_estimate, mdp, s, depth)
@req gen(::DDNOut{(:sp, :r)}, ::P, ::SV, ::A, ::typeof(planner.rng))
@req gen(::P, ::SV, ::A, ::typeof(planner.rng)) # XXX this is not exactly right - it could be satisfied with transition
Collaborator:

Not sure I see the issue here. Don't we require a generative model to be implemented?

Member Author:

MCTS.jl has the same thing 😛. @zsunberg might know the exact reason for the comment.

@Shushman (Collaborator) left a review comment:

Scalar global rewards (and the others) look good to me

@rejuvyesh (Member Author):

Renamed to FactoredValueMCTS.jl everywhere. Will figure out later if FCMCTS is even required.

rejuvyesh merged commit e6a2ad8 into master on Jan 5, 2021
rejuvyesh deleted the impl branch on May 6, 2021