Read in VT object file and migrate objects accordingly #431

lifflander · 2019-08-27T16:41:15Z

Currently, VT can only write an object map file. See ProcStats::outputStatsFile in src/vt/vrt/collection/balance/proc_stats.cc for the implementation of writing out the map. We need to read in the map (without communication edges) and then migrate objects based on that mapping. A new load balancer may need to be created to do this.


lifflander · 2019-08-27T16:41:25Z

@mperrinel @ppebay

mperrinel · 2019-08-29T16:04:10Z

I have produced some object map files using ProcStats::outputStatsFile and the examples/lb_iter program:
mpirun --use-hwthread-cpus -n 4 examples/lb_iter --vt_lb_stats --vt_lb_stats_dir=statsoutput --vt_lb_stats_file=statgreedy

I don't know if this is normal, but each run produces different stats.
I had to add the enable_LB option in CMake.

I think the Python LoadReader expects files named base-name.node.vom,
whereas the VT writer generates base-name.node.out.
I can change the extension in the writer from out to vom.

I have created a simple program that reads (using fscanf) the load part of these files, but not the communication part.
Where should I put it in VT? Is src/vt/vrt/collection/balance/proc_stats.cc a good place for it?

Should I create a new feature branch based on develop or on #427-lb-base-class?
If I add a new method to proc_stats.cc to read the stats, I don't know when to call it or how to use its result. I will work on this last point tomorrow.

Why might a new load balancer need to be created?

lifflander · 2019-08-29T16:37:36Z

@mperrinel I have just merged #427 into develop, so please branch directly off of develop now.

Yes, src/vt/vrt/collection/balance/proc_stats.cc is a good place for it.
Can you create a new static method static <some-return-type> inputStatsFile(std::string filename); that reads the file?

lifflander · 2019-08-29T16:39:28Z

Regarding the new load balancer:

After the file is read we need to put it in a data structure that the system can use. It could have the type: std::vector<std::unordered_map<ElementIDType,TimeType>> like ProcStats::proc_data_.
With this data in hand, we need to actually migrate the objects based on the file. That will require a load balancer that uses the data that was read to enact those changes.

Would it be useful to have a Skype voice meeting to discuss exactly how the new load balancer should act?

mperrinel · 2019-08-29T18:21:00Z

I think I can do the first point on my own. For the second point, I need more explanation! So yes to the Skype voice meeting.
Today I am available at midday your time; would 15 minutes be enough? I'm currently in another meeting, and I won't be free later.
I'm free tomorrow if today is too difficult.

mperrinel · 2019-08-30T14:14:51Z

I have added the new method to proc_stats.cc to read the stats.
It is only partial for now because it doesn't read the communication data and doesn't loop over the num_iters variable (which corresponds to proc_data_.size()).
A new proc_data_in_ member variable contains the load values.

lifflander · 2019-08-30T17:52:38Z

@mperrinel Does 12:30pm PDT work for you?

mperrinel · 2019-08-30T19:32:10Z

Yes

The input statistic method is now called during the runtime initialize method

mperrinel · 2019-09-01T17:01:15Z

Input VOM statistics can now be used through the new --vt_lb_stats_dir_in and --vt_lb_stats_file_in arguments.
To finish this feature:

Fill the proc_data_ data member directly instead of the new proc_data_in_ (which will be removed)
Use the num_iters variable to finish the reader
Block the regular filling of proc_data_ if the input VOM file is used
Create a new LB that performs the object migration using proc_data_

ppebay · 2019-09-01T17:25:13Z

👍🏻 +1


proc_data is now used instead of proc_data_in.

mperrinel · 2019-09-02T16:07:35Z

The proc_data_ data member is now used directly instead of proc_data_in_.
The regular filling of proc_data_ is blocked if the input VOM file is used.

To finish this feature:

Remove proc_data_in_
Add a new map to the proc_data_ variable for every distinct value in the first column (reader to be updated)
Create a new LB that performs the object migration using proc_data_
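
As a rough illustration of the reader described above (one map per distinct value in the first column), here is a minimal sketch. The line layout "phase,object_id,load" is an assumption, as are the type aliases and the function name readLoads; none of this is VT's actual reader. Lines that do not match the layout, including lines with extra fields, are skipped, and phase indices are assumed to be small non-negative integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-ins for VT's typedefs (assumption: the real definitions may differ).
using ElementIDType = std::uint64_t;
using TimeType = double;
using PhaseMap = std::unordered_map<ElementIDType, TimeType>;

// Hypothetical reader: parses "phase,object_id,load" lines (assumed layout)
// into one map per phase, indexed by the value in the first column.
std::vector<PhaseMap> readLoads(std::istream& in) {
  std::vector<PhaseMap> phases;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::size_t phase = 0;
    ElementIDType id = 0;
    TimeType load = 0.0;
    char c1 = 0, c2 = 0;
    bool const parsed =
      static_cast<bool>(fields >> phase >> c1 >> id >> c2 >> load) &&
      c1 == ',' && c2 == ',';
    char extra = 0;
    // Skip malformed lines and lines that carry more than three fields.
    if (!parsed || static_cast<bool>(fields >> extra)) {
      continue;
    }
    // Grow the vector so that every phase seen so far has its own map.
    if (phase >= phases.size()) {
      phases.resize(phase + 1);
    }
    phases[phase][id] = load;
  }
  return phases;
}
```

Taking a std::istream rather than a filename keeps the sketch testable with an in-memory string; a file-based variant would only need to open a std::ifstream and pass it in.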

lifflander · 2019-09-10T06:32:42Z

More detailed comments based on our discussion:

The system currently sets proc_data_ for each phase as the program runs. proc_data_ contains instrumentation which is valuable to the runtime even when following a user-specified map. It's the data structure that records how long each object actually took, which should not be modified based on a user mapping. It's a derived value and we need it to not change with the file so the statistics run properly.

We do need a new data structure for a user-specified map. It can go in ProcStats, but we could call it something else, maybe user_specified_map.

After the map is populated, the data from the map needs to be imported into the LB. Instead of messing with startLBHandler, we should just read it in runLB in the new load balancer. BaseLB::phase_ will tell you which phase you are on so you can index the first dimension of the user_specified_map.

Next, after reading, we should do reductions to determine whether any object has moved for a given phase. Each processor can do this by locally checking whether user_specified_map[i] differs from user_specified_map[i+1]. We can create a std::vector<bool> locally and then OR-reduce that vector across the whole machine into a new variable std::vector<bool> user_specified_map_changed;. Then, in LBManager::decideLBToRun, if the selected load balancer is the one driven by a user-specified file, we should index that vector to determine whether it needs to run.

The remainder is just calling migrateObjectTo(ProcStats::proc_perm_to_temp_[obj_id], this_node)
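
The local check described above could look roughly like the following sketch. It assumes that "different" means the set of object IDs held by this node differs between two consecutive phases (load values are ignored); the type aliases and the names sameObjects and localMapChanged are illustrative, not part of VT.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-ins for VT's typedefs (assumption: the real definitions may differ).
using ElementIDType = std::uint64_t;
using TimeType = double;
using PhaseMap = std::unordered_map<ElementIDType, TimeType>;

// True when both phases hold exactly the same set of object IDs.
// Load values are deliberately ignored: only placement matters here.
bool sameObjects(PhaseMap const& a, PhaseMap const& b) {
  if (a.size() != b.size()) {
    return false;
  }
  for (auto const& entry : a) {
    if (b.find(entry.first) == b.end()) {
      return false;
    }
  }
  return true;
}

// Entry i is true when this node's object set differs between phase i and
// phase i+1, so the result has one entry fewer than the number of phases.
std::vector<bool> localMapChanged(std::vector<PhaseMap> const& user_map) {
  std::vector<bool> changed;
  for (std::size_t i = 0; i + 1 < user_map.size(); ++i) {
    changed.push_back(!sameObjects(user_map[i], user_map[i + 1]));
  }
  return changed;
}
```

The resulting vector is only the local view; it still has to be OR-reduced across all nodes before it can be used to decide whether the load balancer runs for a phase.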

Use the new user_specified_map_changed_ data member instead

mperrinel · 2019-09-10T15:48:50Z

Thanks @lifflander for the details!

So I updated the proc_data_ stats to keep their old behaviour. In addition, the user_specified_map_ member variable stores the data coming from the input file.
In the new StatsMapLB class, I added a new variable of type std::vector<bool> that is filled by comparing user_specified_map_[i+1] with user_specified_map_[i]. The size of this vector is user_specified_map_.size() - 1. If there is at least one difference between the two phases, the corresponding vector entry is set to true. This way, I don't have a boolean value for every load of a phase but a single boolean value for all the loads of that phase.
I didn't find the LBManager::decideLBToRun method, but in the runLB method I do the migration for a phase only if the corresponding vector entry is true.
I need more information about std::vector reduction.
Let's talk about that at the meeting.

lifflander · 2019-09-11T16:23:40Z

@mperrinel Here's a snippet of how you might do a reduce:

namespace vt { namespace collective { namespace reduce { namespace operators {

template <typename T>
struct OrOp<std::vector<T>> {
  void operator()(std::vector<T>& v1, std::vector<T> const& v2) {
    vtAssert(v1.size() == v2.size(), "Sizes of vectors in reduce must be equal");
    for (size_t ii = 0; ii < v1.size(); ++ii)
      v1[ii] = v1[ii] or v2[ii];
  }
};

}}}} /* end namespace vt::collective::reduce::operators */


struct StatsMapLB {
  using ReduceMsgType = collective::ReduceVecMsg<bool>;
  void doneReduce(ReduceMsgType* msg) {

  }

  void doReduce() {
    auto cb = theCB()->makeBcast<StatsMapLB,ReduceMsgType,&LBManager::doneReduce>(proxy);
    auto msg = makeMessage<MsgType>(in_vector);
    proxy.reduce<OrOp<std::vector<int>>>(msg.get(),cb);
  }

private:
  objgroup::proxy::Proxy<RotateLB> proxy = {};
};

This might not compile, but that's an example of what you need to write.
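
The semantics of the element-wise OR in the snippet above can be checked without VT. The sketch below folds one std::vector<bool> per node into a single result in plain C++; it only simulates what the reduction computes and does not use any VT API. The names orInto and orReduce are made up for this illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Element-wise OR of `rhs` into `acc`; both must have the same length.
void orInto(std::vector<bool>& acc, std::vector<bool> const& rhs) {
  if (acc.size() != rhs.size()) {
    throw std::invalid_argument("vectors in reduce must have equal size");
  }
  for (std::size_t i = 0; i < acc.size(); ++i) {
    acc[i] = acc[i] || rhs[i];
  }
}

// Fold every node's local vector into one result, starting from all-false.
// This mimics the outcome of the reduction, not its communication pattern.
std::vector<bool> orReduce(std::vector<std::vector<bool>> const& per_node) {
  if (per_node.empty()) {
    return {};
  }
  std::vector<bool> result(per_node.front().size(), false);
  for (auto const& local : per_node) {
    orInto(result, local);
  }
  return result;
}
```

With four nodes and three phase transitions, an entry of the result is true as soon as any single node reports a change for that transition.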

mperrinel · 2019-09-12T12:27:23Z

Hi @lifflander,

I am trying to understand your details:

I guess the reduction combines all the local vectors (one per node) into a global vector. So for 4 nodes, a 5th, global vector will hold the reduction of the 4 local vectors. Because these are vectors of bool, the correct value for the global vector is the result of the or operator between the global vector (initialised with false values) and each local vector. That means that, for a given phase, if at least one node's vector contains a true value, the corresponding global value (for the same phase) will also be true.
I put the OrOp<std::vector> specialization into .../vt/collective/reduce/operators/functors/or_op.h
I put doReduce inside the StatsMapLB class (the LB where I already have the runLB method)
I don't know where to put the doneReduce method. In your example, you wrote it in StatsMapLB, but doReduce refers to it as if it came from LBManager. So I have put it in the invoke.h file, inside the InvokeLB struct. I'm really not sure about that.
The theCB()->makeBcast<StatsMapLB,ReduceMsgType,&LBManager::doneReduce> call doesn't compile. I think it's because the doneReduce method should be implemented in a struct that inherits from vt::Collection<T1, T2>, which is not the case for the InvokeLB struct. The corresponding makeBcast template is:

  template <typename ObjT, typename MsgT, ObjMemType<ObjT,MsgT> f>
  Callback<MsgT> makeBcast(objgroup::proxy::Proxy<ObjT> proxy);

As we can see, LBManager::doneReduce does not have the ObjMemType<ObjT,MsgT> form.

How should all of these new methods be combined?

In the runLB method, I can just call the doReduce method.
Then, in the doneReduce method, I can get the global vector from the ReduceMsgType that should contain it. With it, I can check, for the phase_ attribute of StatsMapLB, whether the global vector.at(phase_) returns true and, in that case, call the migration.

Thanks for your help. We can discuss this whenever you want!
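
To make the last step concrete, here is a hedged sketch of the decision that doneReduce would take once the reduced vector is available. It assumes a pull model, in which each node requests the objects that the user-specified map assigns to it for the next phase but that it does not hold in the current one. The names objectsToPull and migrationsForPhase are hypothetical, and the actual migration call is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-ins for VT's typedefs (assumption: the real definitions may differ).
using ElementIDType = std::uint64_t;
using TimeType = double;
using PhaseMap = std::unordered_map<ElementIDType, TimeType>;

// IDs listed for this node in `next` but not in `current`, in ascending order.
std::vector<ElementIDType> objectsToPull(
  PhaseMap const& current, PhaseMap const& next
) {
  std::vector<ElementIDType> ids;
  for (auto const& entry : next) {
    if (current.find(entry.first) == current.end()) {
      ids.push_back(entry.first);
    }
  }
  std::sort(ids.begin(), ids.end());
  return ids;
}

// Objects to migrate to this node after `phase`. The result is empty when the
// globally reduced flag is false for that phase or when `phase` is out of range.
std::vector<ElementIDType> migrationsForPhase(
  std::vector<bool> const& changed_global,
  std::vector<PhaseMap> const& user_map,
  std::size_t phase
) {
  if (phase >= changed_global.size() || phase + 1 >= user_map.size()) {
    return {};
  }
  if (!changed_global[phase]) {
    return {};
  }
  return objectsToPull(user_map[phase], user_map[phase + 1]);
}
```

Each returned ID would then be handed to the runtime's migration call for this node; an empty result means the load balancer has nothing to do for that phase.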


lifflander assigned mperrinel Aug 27, 2019

mperrinel added a commit that referenced this issue Aug 30, 2019

#431: lb: Add partial reader for VT object map files

cbb18cc

mperrinel added a commit that referenced this issue Sep 1, 2019

#431: lb: Input statistic file is now provided by argument

7f41c8c

The input statistic method is now called during the runtime initialize method

mperrinel added a commit that referenced this issue Sep 2, 2019

#431: lb: Block the regular filling of proc_data if input stat

63ea414

proc_data is now used instead of proc_data_in.

mperrinel added a commit that referenced this issue Sep 9, 2019

#431: lb: Fix bug in the lb stat reader

8d6282f

mperrinel added a commit that referenced this issue Sep 10, 2019

#431: lb: Gives the proc_data_ variable its old behavior

2a3eba9

Use the new user_specified_map_changed_ data member instead

mperrinel added a commit that referenced this issue Sep 10, 2019

#431: lb: Add new load balancer for input stats

dfcb3c0

mperrinel added a commit that referenced this issue Sep 12, 2019

#431: lb: first try on vector reduction

c7002b7

mperrinel added a commit that referenced this issue Sep 12, 2019

#431: lb: The reader create a new entry in the vector for every phase

e9a88ac

mperrinel added a commit that referenced this issue Sep 12, 2019

#431: lb: Add partial reader for VT object map files

b2fa0ec

mperrinel added a commit that referenced this issue Sep 12, 2019

#431: lb: Input statistic file is now provided by argument

f3b21aa

The input statistic method is now called during the runtime initialize method

mperrinel added a commit that referenced this issue Sep 12, 2019

#431: lb: Block the regular filling of proc_data if input stat

a85167b

proc_data is now used instead of proc_data_in.

mperrinel added a commit that referenced this issue Sep 12, 2019

#431: lb: Fix bug in the lb stat reader

063a867

mperrinel added a commit that referenced this issue Sep 12, 2019

#431: lb: Gives the proc_data_ variable its old behavior

8ab11d4

Use the new user_specified_map_changed_ data member instead

mperrinel added a commit that referenced this issue Sep 12, 2019

#431: lb: Add new load balancer for input stats

6b11670

lifflander pushed a commit that referenced this issue Jan 28, 2020

#431 Add new test. Minor clean up of interface for 'ProcStats'

cca532f

lifflander pushed a commit that referenced this issue Jan 28, 2020

#431 - Fix error in test.

f1d4fce

lifflander pushed a commit that referenced this issue Jan 28, 2020

#431 Add 'spin' loop to complete communications.

f152bdd

lifflander pushed a commit that referenced this issue Jan 28, 2020

#431 Fix bugs in the test.

3e06f82

lifflander added a commit that referenced this issue Jan 28, 2020

#431: lb: move init into another method

788600e

lifflander added a commit that referenced this issue Jan 28, 2020

#431: lb: small cleanup

737fc03

lifflander added a commit that referenced this issue Jan 28, 2020

#431: lb: some reformatting of stats code

ea9ed44

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Overload 'or' reduction operator.

a4c954c

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Add new flag for input LB statistics file.

f797578

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Add new group information.

56bc4bd

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Add initialization of static variables.

2662253

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Add new strcture for a 'dummy' load balancer when the map is spe…

e3cdde2

…cified by input file.

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Address 'Codacy' issue for reading size_t

8d63e98

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Fix bug to build the migration information.

49c150a

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Minor clean-ups.

dc67975

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Fix name to match coding guideline

286a794

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Propagate change in variable name

4170470

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Move static information into 'ProcStats'. Minor clean-ups per re…

945ed62

…viewer's request.

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Reduce the number of static variables. Move tools for reading in…

6bcb693

…put files in ProcStats. Clean up variable names.

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Remove macro IF.

5fe1089

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Move definition of filename for testing.

a901203

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Add new test. Minor clean up of interface for 'ProcStats'

cd6e45b

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 - Fix error in test.

1e30e9c

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Add 'spin' loop to complete communications.

086d8b8

lifflander pushed a commit that referenced this issue Feb 5, 2020

#431 Fix bugs in the test.

b2b4b54

lifflander added a commit that referenced this issue Feb 5, 2020

#431: lb: move init into another method

1a2fcff

lifflander added a commit that referenced this issue Feb 5, 2020

#431: lb: small cleanup

440e955

lifflander added a commit that referenced this issue Feb 5, 2020

#431: lb: some reformatting of stats code

d614bcc

lifflander closed this as completed May 21, 2020
