
Starting the process of API discussion #1

Merged · rejuvyesh merged 17 commits into master from impl on Jan 5, 2021
Conversation

@rejuvyesh (Member) commented Oct 29, 2020

MAPOMDPs.jl should be the minimal API-style package for defining multi-agent POMDP models and their solvers. This package (MAMCTS.jl) will contain our factored-value solvers as well as a naive centralized solver using MCTS. We could potentially add decentralized MCTS here later as well.

If we look at the current API, we have a vector of states/observations for each agent and a vector of actions for each agent in gen (to be changed to @gen). We also return a vector of rewards, one per agent. So the current setup can support a POSG (partially observed stochastic game) if the rewards differ across agents.
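For illustration, a minimal sketch of that per-agent interface for a hypothetical two-agent toy problem (MyJointMDP and its dynamics are made up here, not part of this PR):

using POMDPs
using Random

struct MyJointMDP <: MDP{Vector{Int}, Vector{Int}} end  # hypothetical toy problem

# Generative model in the style described above: joint state and joint action in,
# next joint state and a per-agent reward vector out.
function POMDPs.gen(m::MyJointMDP, s::Vector{Int}, a::Vector{Int}, rng::AbstractRNG)
    sp = s .+ a                 # each agent moves by its own action
    r  = [-abs(x) for x in sp]  # one reward entry per agent
    return (sp=sp, r=r)
end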

@Shushman (Collaborator):

I'll let the others chime in on the API as @rejuvyesh and I probably cut some corners for convenience in the paper experiments. I'm putting a mini-checklist of things to do and/or think about:

  • Add docstrings to the various functions
  • Should we call the centralized one FCMCTS? It's very close to FVMCTS and might cause confusion. I do realize we are using JointMCTSTree in fv_mcts_vanilla. So we could either change FCMCTS to JointMCTS and JointMCTSTree to FVMCTSTree or just change FCMCTS to MMCTS

src/MAMCTS.jl (outdated; resolved)
@@ -0,0 +1,307 @@


@with_kw mutable struct FCMCTSSolver <: AbstractMCTSSolver

struct VarEl <: AbstractCoordinationStrategy
end

@with_kw struct MaxPlus <:AbstractCoordinationStrategy
Collaborator:

Critical docstring

@rejuvyesh (Member Author):

  • Should we call the centralized one FCMCTS? It's very close to FVMCTS and might cause confusion. I do realize we are using JointMCTSTree in fv_mcts_vanilla. So we could either change FCMCTS to JointMCTS and JointMCTSTree to FVMCTSTree or just change FCMCTS to MMCTS

Agreed. Let's change FCMCTS to JointMCTS and current JointMCTS* to FVMCTS*

@zsunberg (Member) commented Oct 31, 2020

Nice - this seems cool - are you looking for feedback on this PR or on MAPOMDPs?

@zsunberg (Member) commented Oct 31, 2020

A few notes on MAPOMDPs:

  • It should probably be called MultiAgentPOMDPs if we think we are ever going to register it.
  • I think it should be JointMDP{S, A<:AbstractVector} <: MDP{S, A} because
    • the shared state might contain something more than just the individual states of the agents
    • it is good to let the problem writer specify the concrete action vector type otherwise MDP solvers might have trouble with type stability
  • POMDPs.reward should return a scalar so that MDP/POMDP solvers can solve the problems. How does the problem writer specify the vector of rewards for each agent?
  • Using get_-style names is not typical Julia style. What is the reason you decided to use it?
  • Docstrings do not normally contain the function keyword.
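A minimal sketch of the declaration suggested in the second bullet (only JointMDP itself is proposed here; the example problem name is a placeholder):

using POMDPs

abstract type JointMDP{S, A<:AbstractVector} <: MDP{S, A} end

# A problem writer then fixes the concrete joint state type and action vector type, e.g.
# struct MyProblem <: JointMDP{MyGlobalState, Vector{Int}} end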

@Shushman (Collaborator) commented Nov 1, 2020

Thanks for the notes, @zsunberg! Feedback on both would be great, but MAPOMDPs (or MultiAgentPOMDPs, I guess) is more critical since it would be the problem template (versus this specific solver).

@rejuvyesh (Member Author):

These are all good points.

  • It should probably be called MultiAgentPOMDPs if we think we are ever going to register it.

That's reasonable. Do you think we should rename this to MultiAgentMCTS too, then?

  • POMDPs.reward should return a scalar so that MDP/POMDP solvers can solve the problems. How does the problem writer specify the vector of rewards for each agent?

Right now we were overloading this to return a vector of rewards. That's why we had to modify the rollout function for JointMDP. We'll have to think about this.

  • Using get is not typical julia style. What is the reason you decided to use it?

It was just easy. We'll rename these.

@zsunberg (Member) commented Nov 1, 2020

I think making POMDPs.reward return a scalar is pretty important. The point of making a Joint(PO)MDP a (PO)MDP is so that you can demonstrate that multi-agent approaches outperform monolithic POMDP solvers. If POMDPs.reward returns a vector, the monolithic POMDP solvers will not be able to solve it. I think you should introduce MultiAgentPOMDPs.agent_rewards and have

POMDPs.reward(m::JointMDP, s, a, sp) = sum(agent_rewards(m, s, a, sp))

by default.
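A rough sketch of how that split could look (agent_rewards does not exist yet, and MyJointMDP is the illustrative toy problem from above):

# Problem-side implementation of the proposed per-agent rewards:
function MultiAgentPOMDPs.agent_rewards(m::MyJointMDP, s, a, sp)
    return [-abs(x) for x in sp]  # one reward per agent
end

# Default supplied by the interface package, so monolithic (PO)MDP solvers only ever see a scalar:
POMDPs.reward(m::JointMDP, s, a, sp) = sum(agent_rewards(m, s, a, sp))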

@Shushman (Collaborator) commented Nov 4, 2020

@rejuvyesh from the FVMCTSSolver type (https://github.com/JuliaPOMDP/MAMCTS.jl/blob/impl/src/fvmcts/fv_mcts_vanilla.jl#L21), do we ever use the reuse_tree or max_time arguments anywhere? In the domains?

@rejuvyesh (Member Author):

@rejuvyesh from the FVMCTSSolver type (https://github.com/JuliaPOMDP/MAMCTS.jl/blob/impl/src/fvmcts/fv_mcts_vanilla.jl#L21), do we ever use the reuse_tree or max_time arguments anywhere? In the domains?

Not yet. But we should do something about max_time at least.


# MMDP reqs
@req get_agent_actions(::P, ::Int64)
@req get_agent_actions(::P, ::Int64, ::eltype(SV))
Collaborator:

Change the get_ style naming

@@ -315,7 +318,7 @@ function simulate(planner::FVMCTSPlanner, node::FVStateNode, depth::Int64)
ucb_action = coordinate_action(mdp, planner.tree, s, planner.solver.exploration_constant, node.id)

# Monte Carlo Transition
sp, r = gen(DDNOut(:sp, :r), mdp, s, ucb_action, rng)
sp, r = @gen(:sp, :r)(mdp, s, ucb_action, rng)
Collaborator:

Is this what you meant by the new POMDPs.jl API? No more DDN??

Member:

Yes, as of POMDPs 0.9 we decided to get rid of the flexible DDN.

@@ -15,26 +15,26 @@ end
mutable struct FCMCTSTree{S,A}
# To track if state node in tree already
# NOTE: We don't strictly need this at all if no tree reuse...
state_map::Dict{AbstractVector{S},Int64}
state_map::Dict{S,Int64}
@Shushman (Collaborator) commented Jan 3, 2021:

I see, so we're going to delegate the joint state / joint action aspect in FCMCTS to the domain?

Collaborator:

This does beg the question of why we need FCMCTS at all. Can't we just use vanilla MCTS as the baseline then?

Member Author:

I think we might be able to do away with FCMCTS; I need to check a few things with actions first.

@@ -212,7 +212,7 @@ function insert_node!(tree::FCMCTSTree, planner::FCMCTSPlanner, s)
end

# NOTE: Doing state-dep actions here the JointMDP way
state_dep_jtactions = vec(map(collect, Iterators.product((get_agent_actions(planner.mdp, i, si) for (i, si) in enumerate(s))...)))
state_dep_jtactions = vec(map(collect, Iterators.product((agent_actions(planner.mdp, i, si) for (i, si) in enumerate(s))...)))
Collaborator:

As far as I can tell, this is the only place we are deviating from vanilla MCTS on an unfactored MMDP. I suppose it is worth maintaining separately for this alone

@rejuvyesh (Member Author) commented Jan 4, 2021:

Right! But shouldn't we just define:

POMDPs.actions(mdp::JointMDP, s) =  vec(map(collect, Iterators.product((agent_actions(mdp, i, si) for (i, si) in enumerate(s))...)))

Collaborator:

Actually, yeah if we do this then we shouldn't need FCMCTS anymore at all!
What we can do for now is to remove FCMCTS from this repo so it does not hold up the rest of it. Later on, we can fix our own baseline scripts to use vanilla MCTS.
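If that fallback goes into the interface package, the baseline could look roughly like this (a sketch; mdp and s stand for a concrete JointMDP instance and state not defined here, and the solver parameters are only illustrative):

using POMDPs, MCTS

solver  = MCTSSolver(n_iterations=1_000, depth=20, exploration_constant=5.0)
planner = solve(solver, mdp)  # mdp is some concrete JointMDP defined by the problem writer
a = action(planner, s)        # joint action, i.e. a Vector of per-agent actions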

mdp = planner.mdp
P = typeof(mdp)
@assert P <: JointMDP # req does different thing?
SV = statetype(P)
@assert typeof(SV) <: AbstractVector # TODO: Is this correct?
#SV = statetype(P)
Collaborator:

Uncomment? Since we are using ::SV further down?

Member Author:

But ::SV need not be AbstractVector and can have other global info too?

Member:

It makes sense to uncomment SV since it is used further down

@req get_agent_actions(::P, ::Int64)
@req get_agent_actions(::P, ::Int64, ::eltype(SV))
@req agent_actions(::P, ::Int64)
@req agent_actions(::P, ::Int64, ::eltype(SV)) # TODO should this be eltype?
Collaborator:

I think this is fine, as we do use enumerate(s) above. We are implicitly requiring that the data structure be something we can apply eltype to.

Member Author:

Yeah, this is what I am unsure about regarding @zsunberg's comments. We are assuming that S for JointMDP is something we can enumerate into agent-specific things (which might just be copies of the same thing), but there could instead just be a single global state too.

Member:

I think this is fine. Worst case it tells the problem writer that they need to implement agent_actions(::P, ::Int64, ::Any)
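For concreteness, the requirement would be satisfied by problem-side methods along these lines (illustrative toy problem and action sets):

agent_actions(m::MyJointMDP, i::Int) = [-1, 0, 1]                         # all actions of agent i
agent_actions(m::MyJointMDP, i::Int, si) = si == 0 ? [0, 1] : [-1, 0, 1]  # state-dependent subset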

@@ -292,6 +292,7 @@ POMDPLinter.@POMDP_require insert_node!(tree::FCMCTSTree, planner::FCMCTSPlanner
S = eltype(SV)

# TODO: Review IQ and IN
# Should this be ::S or ::SV? We can have global state that's not a vector.
@Shushman (Collaborator) commented Jan 3, 2021:

Wait, can we? Because we were doing enumerate(s) before when collecting the actions. Is enumerate possible on anything other than a vector?

Member Author:

Yeah, this is something I am not sure about. Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

Member:

I think SV, but see my main review comment.


comp_ac_idx = LinearIndices(comp_tup)[local_action_idxs...]

# NOTE: NOW we can update stats. Could generalize incremental update more here
lock(tree.lock) do
tree.coordination_stats.n_component_stats[s][idx][comp_ac_idx] += 1
q_comp_value = sum(q[c] for c in comp)
q_comp_value = sum(q[c] for c in comp) # TODO!!!
Collaborator:

What's the issue here?

# NOTE: NOW we can update stats. Could generalize incremental update more here
lock(tree.lock) do
tree.coordination_stats.n_component_stats[s][idx][comp_ac_idx] += 1
q_comp_value = q # TODO!!!
Collaborator:

I see. The TODO here is warranted, haha. For scalar q, the equivalent of the q::AbstractVector case should be q_comp_value = q * length(comp)?

Collaborator:

Because if we had a global reward vector where all q values were the same, then the line q_comp_value = sum(q[c] for c in comp) would yield the same result right?

Member Author:

Yeah, that's the thing I wanted to confirm and seems alright.
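Putting the two cases side by side, a small dispatch-based sketch (not the PR's actual code) that recovers the behavior discussed here:

component_q(q::AbstractVector, comp) = sum(q[c] for c in comp)  # per-agent reward vector
component_q(q::Real, comp)           = q * length(comp)         # scalar global reward

# ...and the locked statistics update would then use: q_comp_value = component_q(q, comp)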

end

mp_stats = MaxPlusStatistics{eltype(init_state)}(adjmatgraph,
mp_stats = MaxPlusStatistics{S}(adjmatgraph,
Collaborator:

Good catch! Even in the previous version, this should have been S and not eltype

@Shushman (Collaborator) left a review comment:

Looks good! The only possible logical issue would have been the update_statistics where q::Float64, but I think my suggestion there should recover the same behavior.

@rejuvyesh (Member Author):

@Shushman please take another look at the scalar global reward handling when you get the time.

@rejuvyesh (Member Author):

@zsunberg Should we rename this to MultiAgentMCTS.jl as well?

@zsunberg (Member) left a review comment:

The details that were commented on seem good to me, but @rejuvyesh's comment

Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? In a JointMDP, do the agents get to make decisions based on S or SV?


@rejuvyesh (Member Author):

The details that were commented on seem good to me, but @rejuvyesh 's comment

Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? in a JointMDP, do the agents get to make decisions based on S or SV?

@Shushman, @mykelk, correct me if I am wrong, but in a multi-agent MDP, all agents have access to the full global state information, but the agent state could differ so as to allow, say, rotations or relative positions or agent identity?

@Shushman (Collaborator) commented Jan 4, 2021

The details that were commented on seem good to me, but @rejuvyesh 's comment

Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? in a JointMDP, do the agents get to make decisions based on S or SV?

@Shushman , @mykelk correct me if I am wrong, but in MultiAgent MDP, all agents have access to the full global state information but the agent state could differ so as to allow say rotations or relative positions or agent identity?

According to DMU, in an MMDP, each agent is able to observe the full true state, i.e., the states of all other agents in the problem.
I don't think there needs to be a separate agent state besides the full global state? It just means that any 'agent' can see the full state vector at that timestep

@rejuvyesh (Member Author):

The details that were commented on seem good to me, but @rejuvyesh 's comment

Technically there can be some "global" information that is not provided to agents but then it becomes partially observed?

makes an interesting point. What is the definition of an agent state? in a JointMDP, do the agents get to make decisions based on S or SV?

@Shushman , @mykelk correct me if I am wrong, but in MultiAgent MDP, all agents have access to the full global state information but the agent state could differ so as to allow say rotations or relative positions or agent identity?

According to DMU, in an MMDP, each agent is able to observe the full true state, i.e., the states of all other agents in the problem.
I don't think there needs to be a separate agent state besides the full global state? It just means that any 'agent' can see the full state vector at that timestep

Hmm. But then the issue for us is that we shouldn't require enumerate on the JointMDP state, because the JuliaPOMDP philosophy is not to restrict types unless required, and the full state need not be a vector (there could just be a global view, for example).

@zsunberg (Member) commented Jan 4, 2021

According to DMU, in an MMDP, each agent is able to observe the full true state, i.e., the states of all other agents in the problem. I don't think there needs to be a separate agent state besides the full global state? It just means that any 'agent' can see the full state vector at that timestep

Ah, OK, this makes sense to me. Whenever an agent needs to make a decision, it should be given access to the global state, and there is no reason the global state needs to be an AbstractVector.

Then is there a reason to have any concept of individual states?

Perhaps individual states would be useful for the common suboptimal solution of having agents feed back on their own states or a smaller view of the state? If the concept of individual states is needed, I think it would be reasonable to say that states in a JointMDP should support the iteration interface and/or have getindex. Vector and Tuple are types that support both of these. Or it may be better to have a separate agentview function for this.
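A tiny sketch of what the agentview idea could look like (all names here are hypothetical, not part of the PR):

agentview(m::JointMDP, s, i::Int) = s  # default: every agent sees the full global state

# A problem with meaningful local views could specialize, e.g.
# agentview(m::MyGridMDP, s, i::Int) = local_neighborhood(s, i)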

src/MAMCTS.jl (outdated)
step = 1

# TODO: doesn't this add unnecessary action search?
r = @gen(:r)(mdp, s, action(policy, s), sim.rng)
Collaborator:

Eh under the circumstances it is the least bad option, I think.

Member Author:

I mean, I could do a random action as well?

#@assert typeof(SV) <: AbstractVector
AV = actiontype(P)
@assert typeof(A) <: AbstractVector
@req discount(::P)
@req isterminal(::P, ::SV)
@subreq insert_node!(planner.tree, planner, s)
@subreq estimate_value(planner.solved_estimate, mdp, s, depth)
@req gen(::DDNOut{(:sp, :r)}, ::P, ::SV, ::A, ::typeof(planner.rng))
@req gen(::P, ::SV, ::A, ::typeof(planner.rng)) # XXX this is not exactly right - it could be satisfied with transition
Collaborator:

Not sure I see the issue here. Don't we require a generative model to be implemented?

Member Author:

MCTS.jl has the same thing 😛. @zsunberg might know the exact reason for the comment.

@Shushman (Collaborator) left a review comment:

Scalar global rewards (and the others) look good to me

@rejuvyesh (Member Author):

Renamed to FactoredValueMCTS.jl everywhere. Will figure out later if FCMCTS is even required.

rejuvyesh merged commit e6a2ad8 into master on Jan 5, 2021
rejuvyesh deleted the impl branch on May 6, 2021