GPU-like Memory Hierarchy using Merlin #1161
Same error with a smaller configuration.
@mkhairy There are some port mismatches in the configuration. Some ports only work over a network and some will not work over a network; in your configuration, network ports are connected to non-network ports, and non-network ports are connected to Merlin. It looks like what you're trying to do in the GPU cache hierarchy is: GPU <-> L1 <-> network <-> L2 <-> memory.
Yes, that is right as you described. The L1 caches are non-coherent, so there is no directory on the L2 cache side.
Could you please share the right way to connect the components to Merlin to describe the above model?
This PDF attachment contains a diagram of the model. I also attached the original Python script.
Thanks! We've never tested "network <-> L2 <-> memory", so a small change to the cache constructor will be needed. I'll push it soon (today/tomorrow) and let you know, along with the right ports and memNIC parameters to set. Essentially, you'll have the L1 connect to the GPU via the "high_network_0" port and to the network via the "cache" port. Then the L2 will connect to the network via the "cache" port and to the memory via the "low_network_0" port. The memory will stay on the "direct_link" port. There will also be a few memNIC parameters to set, since you don't have a directory in that path.
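Roughly, the wiring would look like this in the Python config (untested sketch; only the port names above are definitive, while the miranda core standing in for the GPU, the router parameters, and the cache/memory parameters are placeholders to adapt):

```python
import sst

# Illustrative stand-in for the GPU core (any processor model with a cache port)
gpu_core = sst.Component("gpu_core", "miranda.BaseCPU")
gpu_core.addParams({"clock": "1GHz", "generator": "miranda.STREAMBenchGenerator"})

# Non-coherent, processor-side L1
l1 = sst.Component("gpu_l1", "memHierarchy.Cache")
l1.addParams({
    "cache_frequency": "1GHz", "cache_size": "32KiB", "associativity": 8,
    "access_latency_cycles": 4, "cache_line_size": 64,
    "L1": 1, "coherence_protocol": "NONE",
    "memNIC.network_bw": "80GiB/s",
})

# Directory-less, memory-side L2
l2 = sst.Component("gpu_l2", "memHierarchy.Cache")
l2.addParams({
    "cache_frequency": "1GHz", "cache_size": "1MiB", "associativity": 16,
    "access_latency_cycles": 20, "cache_line_size": 64,
    "coherence_protocol": "NONE",
    "memNIC.network_bw": "80GiB/s",
})

# Memory controller
mem = sst.Component("memory", "memHierarchy.MemController")
mem.addParams({"clock": "1GHz", "backend.mem_size": "512MiB"})

# Merlin crossbar (single-router topology)
xbar = sst.Component("xbar", "merlin.hr_router")
xbar.addParams({
    "id": 0, "num_ports": 2, "topology": "merlin.singlerouter",
    "link_bw": "80GiB/s", "xbar_bw": "80GiB/s", "flit_size": "8B",
    "input_buf_size": "1KiB", "output_buf_size": "1KiB",
})

# GPU <-> L1 on the L1's "high_network_0" port
sst.Link("core_l1").connect((gpu_core, "cache_link", "100ps"),
                            (l1, "high_network_0", "100ps"))

# L1 <-> network on the L1's "cache" port
sst.Link("l1_net").connect((l1, "cache", "100ps"), (xbar, "port0", "100ps"))

# Network <-> L2 on the L2's "cache" port
sst.Link("net_l2").connect((l2, "cache", "100ps"), (xbar, "port1", "100ps"))

# L2 <-> memory: L2's "low_network_0" port to the memory's "direct_link" port
sst.Link("l2_mem").connect((l2, "low_network_0", "50ps"),
                           (mem, "direct_link", "50ps"))
```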
Thanks a lot!
The fix is being merged right now (the branch is "mh_ports" if you want to grab it in the meantime).

For the GPU L1s, connect them to the GPU via "high_network_0" and to the network via "cache", then add the memNIC parameters mentioned above to the L1s.

For the GPU L2s, connect them to the network via "cache" and to the memories via "low_network_0". Since you have one L2 per memory, is the L2 for each memory the "home" for that memory's addresses? If so, you'll need to set each L2 memNIC's address ranges so that the L1s can route requests correctly. As long as each memory gets a contiguous block of addresses (so mem0 gets 0-X, mem1 gets X-Y, etc.), you shouldn't need to touch the cache "slice" parameters. Just add the region parameters to the L2s, as sketched below.

Hope this solves the problem! If it does, go ahead and close the issue. Also, if you want to interleave addresses across memories instead of using contiguous chunks, that is possible too, but it will probably require another change to the cache slice addressing policy.
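For the contiguous case, the region parameters would look something like this (sketch only; treat the exact memNIC-prefixed parameter names and the sizes as placeholders to adapt):

```python
# Hypothetical sketch: each L2 is the "home" for one memory's contiguous block.
mem_size = 512 * 1024 * 1024  # 512 MiB per memory (illustrative)

for i, l2 in enumerate(l2caches):   # l2caches: list of the GPU L2 components
    l2.addParams({
        "memNIC.addr_range_start": i * mem_size,          # mem0: 0 .. 512MiB-1
        "memNIC.addr_range_end": (i + 1) * mem_size - 1,  # mem1: 512MiB .. 1GiB-1
    })
```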
Thanks! It is working and the simulation runs successfully. However, when I try interleaving, I hit this error:

```
[tgrogers-pc02:32089] Signal: Floating point exception (8)
```
Hi, could you please help me with the error above? What is the right way to do interleaving?
@mkhairy You need to set the interleave_step size. In your case, this should be (numL2s * interleave_size).
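For example, with four L2s and a 64 B interleave granularity, the step works out to 4 * 64 B = 256 B (sketch; parameter names and values are placeholders to adapt):

```python
# Illustrative values: four L2s, 64 B handed to each L2 before rotating on.
num_l2s = 4
interleave_size = 64                          # bytes per L2 per rotation
interleave_step = num_l2s * interleave_size   # 4 * 64 B = 256 B
total_mem_size = 2 * 1024 ** 3                # 2 GiB total (illustrative)

for i, l2 in enumerate(l2caches):
    l2.addParams({
        "memNIC.addr_range_start": i * interleave_size,  # each L2 offset by one chunk
        "memNIC.addr_range_end": total_mem_size - 1,
        "memNIC.interleave_size": "%dB" % interleave_size,
        "memNIC.interleave_step": "%dB" % interleave_step,
    })
```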
@mkhairy Also, there will be an issue with cache sets: the set-mapping policy is currently hard-coded, so some sets may not get any valid addresses mapped to them. I'm fixing this and will let you know when it's ready.

Just to clarify, the interleaving across L2s matches the interleaving across memories, right? Make sure to set the "memNIC." address range/interleave parameters on the L2s and the "cpulink." parameters (same parameter names, different prefix) on the memories so that they match. I'll push the fix to make sure the cache array picks up the interleaving and maps addresses into sets correctly.
@hughes-c thanks for your help, it's working successfully. |
That's the memory controller's link manager that faces the L2 (towards the 'cpu'). The memory and the L2 should have the same parameters on their link managers, so set the matching parameters on the memory controllers, as sketched below.
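A rough sketch, reusing the values from the interleave sketch above (the cpulink-prefixed names mirror the memNIC ones per the earlier comment; treat the exact names as placeholders):

```python
# Mirror each L2's memNIC.* range/interleave params on the matching memory's
# cpulink.* prefix so both link managers agree on the address mapping.
for i, mc in enumerate(memctrls):   # memctrls: list of MemController components
    mc.addParams({
        "cpulink.addr_range_start": i * interleave_size,
        "cpulink.addr_range_end": total_mem_size - 1,
        "cpulink.interleave_size": "%dB" % interleave_size,
        "cpulink.interleave_step": "%dB" % interleave_step,
    })
```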
Hi @gvoskuilen, just to make it clear: I set the L2 cache interleaving parameters as described above, and I gave the memory controller interleaving parameters the same values. Now I connect the memory controller and the L2 using a link:

```python
link = sst.Link("l2g_mem_link_%d" % next_mem_id)
```

So you want me to update this link config to have the same interleaving parameters as the L2 and the MC. Is that correct?
@mkhairy I didn't realize you'd already set the memory parameters for address range/interleaving. As long as you either set those or set the same parameters prefixed with 'cpulink' on the memory controller, you'll get the correct behavior. The cache set address fix is in the 'mh_cache_interleave' branch and is currently being merged into the devel branch.
Hello,
I am trying to use Merlin to model a non-coherent cache hierarchy, similar to a modern NVIDIA GPU cache hierarchy:
http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
So I connect a processor-side non-coherent L1 cache to a Merlin crossbar router on one side, and directory-less memory-side L2 caches on the other side.
However, I came across the error listed below. It seems like Merlin has to be connected through a directory interface. Is that true? Does Merlin need a coherence protocol to work?
I attached a dumped Python file that tries to model the GPU-like memory hierarchy.
python file.zip
```
#0  0x00000000004caa9a in SST::Units::operator== (this=this@entry=0x20515f40, lhs=...) at unitAlgebra.cc:278
#1  0x00000000004cce82 in SST::Units::operator!= (lhs=..., this=0x20515f40) at unitAlgebra.h:90
#2  SST::UnitAlgebra::operator> (this=this@entry=0x20515f38, v=...) at unitAlgebra.cc:480
#3  0x00007fffe8f3ffb5 in SST::Merlin::LinkControl::init (this=0x20515ef0, phase=) at linkControl.cc:161
#4  0x00007fffe619d7b8 in SST::MemHierarchy::MemNIC::init (this=0x20514de0, phase=1) at memNIC.cc:224
#5  0x00007fffe6135f8c in SST::MemHierarchy::Cache::init (this=0x203c37f0, phase=1) at cacheEventProcessing.cc:456
#6  0x00000000004ba874 in SST::Simulation::initialize (this=0xb85c00) at simulation.cc:430
#7  0x000000000046382a in start_simulation (tid=tid@entry=0, info=..., barrier=...) at main.cc:321
#8  0x0000000000456de3 in main (argc=3, argv=0x7fffffffe278) at main.cc:679
```