Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROOT-5073: Explore changing the on-file byte format to little endian #136

Closed
wants to merge 18 commits into from

Conversation

zzxuanyuan
Copy link
Contributor

Hi, @pcanal @bbockelm

This branch implements little endian in TBuffer.

The code is not ready to be merged and I hope it would be more convenient to discuss it on github. There is still a design issue. This is the link: https://sft.its.cern.ch/jira/browse/ROOT-5073.

Let's take an example of writing to a TFile, we need to update header (TFile::WriteHeader), streamer info (TFile::WriteStreamerInfo) and free segments (TFile::WriteFree).

  1. TFile::WriteHeader creates a TKey but does not stream its buffer. When you read or write header, it is always stored as big endian.
  2. TFile::WriteFree works in the same way with TFile::WriteHeader.
  3. TFile::WriteStreamerInfo is quite different from above two cases. It creates a TKey but uses streaming function to change streamer info object TList to little endian.

The problem is that all of three information are read by TFile::ReadBuffer or TFile::ReadKeyBuffer without converting the endianesss. header and free segments can be processed without any problems. But streamer info is read in reversed endianess.

To address this issue, one way is adding a fBit in TFile class and change all meta data (header, free segments and streamer info) to little endian. Another way is modifying the read function for streamer info and convert its endianess before read it from buffer.

Zhe

1. Fix tag out of range issue, this is due to TKey calling ReadObjWithBuffer which does not set as Little Endian
2. Fix TTree ReadVersion error, this is because in TBufferFile::ReadVersion we changed the cnt format and it will cause the startoffset of reading version goes back to the cnt offset
1. In roottest mathcore, some tests call cl->GetMethod which finds the interface of creator of TBufferFile. Since I changed the interface and add two Bool_t arguments, so as should we change the cl->GetMethod here.
2. If we change TBufferFile to LittleEndian, there is a situation could be inappropriately characterized as Having ByteCount and here is the situation happens:

For example, if there is no byte count and the first two bytes stores version and the next four bytes store fBufferSize like following:
	UShort_t + UInt_t ( Version_t + fBufferSize)
But in TBufferFile.cxx, it takes a look at (v.cnt & kByteCountMask)(v.cnt is the first four bytes). If it is 0, ROOT determine there is no byte count. Otherwise, the next four bytes should be byte count. If TBufferFile is BigEndian:
	00 02 00 00 7d 00
and in this example, version = 2 and fBufferSize = 32000. So v.cnt is 00 00 02 00. However, if we change TBufferFile to LittleEndian, the memory layout becomes:
	02 00 00 7d 00 00
and in this case, v.cnt is 7d 00 00 02 and (v.cnt & kByteCountMask != 0) so ROOT thinks the next four bytes are byte count.
@zzxuanyuan
Copy link
Contributor Author

@pcanal @bbockelm
I think this patch is ready. I run through all unit tests. In addition, I switch TBuffer in MainEvent.cxx between little endian and big endian and dump all events in both cases. They look the same.

The lines of code in following link is to determine if there is byte count TBuffer I discussed with Brian this morning. I left some comments above it.
https://github.com/zzxuanyuan/root/blob/byteswap/io/io/src/TBufferFile.cxx#L3237

{
#ifdef R__BYTESWAP
if(buffBigEndian) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coding convention: there should be a space between if and parenthesis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the 'cost' of this added if statement (it is in the inner most part of the processing)?

@pcanal
Copy link
Member

pcanal commented May 5, 2016

It is unclear how the code when reading an arbitrary file decides whether the buffer is big endian or little endian. What is the plan there (and/or did I miss some code where it is already implemented)?

1. By convention, there must be a space between if and left parentheses.
2. TKeyXML and TKeySQL derive from TKey, therefore ReadObjWithBuffer() must add two more argument.
   Additionally, we only add argument type without variable name to avoid compiling error of "arguments are not used".
@zzxuanyuan
Copy link
Contributor Author

@pcanal I have not implemented it yet. I think I could add one more bit in TFile and indicate the endianness of the file. Does that sound like a plan? Otherwise, I might also evaluate the fBufBigEndian bit in TBuffer and determine each buffer is big endian or little endian.

@pcanal
Copy link
Member

pcanal commented May 25, 2016

I agree it will probably be sufficient to restrict a full file to be one or the other. We have to find a place in the header on the beginning of the file to store this information.

@etejedor
Copy link
Contributor

@zzxuanyuan @pcanal I understand this is still work in progress?

@phsft-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@zzxuanyuan
Copy link
Contributor Author

@etejedor @pcanal @bbockelm

I think this branch is lack of performance test among different alternatives as discussed in https://sft.its.cern.ch/jira/browse/ROOT-5073.

Do we still want to work on that?

@bbockelm
Copy link
Contributor

Hi,

I think we can close for now. Thanks!

Brian

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants