Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods for 64-bit access #2

Conversation

j-marjanovic
Copy link
Contributor

This pull request adds two method (_wr64 and _rd64) to allow generating 64-bit transactions (typically) on AXI bus. This can significantly improve performance (in some my measurements of up to 200%) for certain workloads.

@patrislav1
Copy link
Member

patrislav1 commented Feb 8, 2023

Thank you for the suggestion. I made a different approach using templates, which also supports the RegAccessor class (needed for HECTARE-generated register headers).

  • Can you please check if my implementation gives comparable performance to yours?
  • Do you know if 64-bit register access on AXI bus must be 64-bit aligned or is 32-bit enough (i.e. are the alignment checks too strict)?
  • Do you know if the 64-bit access works the same on PCIe (assuming a 64 bit PCIe host CPU o/c)?

@j-marjanovic
Copy link
Contributor Author

It seems that the two approaches fundamentally do two different things. The changes in this MR provides a way to do 64-bit accesses (and 32-bit accesses when that is enough/necessary) on an 32-bit bus; sometimes it is desirable (for performance reasons) to write 2 registers in one transaction. The changes on the bus-width branch on the other hand requires that the user specifies the access size when defining a "driver" and provide a single-width access to the inherited class.

Alignment

Regarding your second point, aarch64 architecture requires that the accesses are aligned, from https://developer.arm.com/documentation/102376/0200/Alignment-and-endianness/Alignment?lang=en

An unaligned access to a Device region will trigger an exception (alignment fault).

This can also be confirmed with devmem:

$ sudo devmem 0xa0490000 64  
0x3B73FABD02010400
$ sudo devmem 0xa0490004 64
[1]    3225 bus error  sudo devmem 0xa0490004 64

@patrislav1
Copy link
Member

I agree that we want to use 32- or 64-bit case-by-case instead of having to use one of them for the whole interface. My previous bus-width approach was misguided.

But some extra code is still needed to make the RegAccessor 64-bit-aware. (We basically don't use _rd32() & friends anymore, but the higher level RegAccessor API with HECTARE-generated maps & structs). Also some extra logic to limit the accesses to 32-bit if talking to a Zynq-7000 over PCIe.

I also added a debug enable flag to disable the raw register logging by default (the ostream stuff always eats some cycles, even when the log level isn't set to trace). It also makes possible to enable raw register logs selectively for certain UIO devices (avoids flooding the shell).

You should be able to do merged register access with the RegAccessor API like this:

RegAccessor<uint32_t, 0x30> foo{this};
RegAccessor<uint32_t, 0x34> bar{this};

struct REG_PACKED_ALIGNED FooBarMerged {
  uint32_t foo;
  uint32_t bar;
};
RegAccessor<FooBarMerged, 0x30> foobarMerged{this};

foobarMerged.wr({
  .foo = 1234,
  .bar = 5678,
});

If you prefer raw memory access, _rd|_wr32() is still there and also has 64-bit counterparts now.
Does this work for you?

@patrislav1 patrislav1 deleted the branch MicroTCA-Tech-Lab:master January 10, 2024 15:17
@patrislav1 patrislav1 closed this Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants