ROOT-5073: Explore changing the on-file byte format to little endian #136

zzxuanyuan · 2016-01-31T12:18:04Z

This branch implements little-endian serialization in TBuffer.

The code is not ready to be merged yet; I hope discussing it on GitHub will be more convenient. There is still an open design issue. The JIRA ticket is: https://sft.its.cern.ch/jira/browse/ROOT-5073.

Take writing to a TFile as an example: we need to update the header (TFile::WriteHeader), the streamer info (TFile::WriteStreamerInfo) and the free segments (TFile::WriteFree).

TFile::WriteHeader creates a TKey but does not stream its buffer, so the header is always written and read as big endian.
TFile::WriteFree works the same way as TFile::WriteHeader.
TFile::WriteStreamerInfo is quite different from these two cases: it creates a TKey but uses the streaming functions, so the streamer-info TList is written in little endian.

The problem is that all three pieces of metadata are read by TFile::ReadBuffer or TFile::ReadKeyBuffer without converting the endianness. The header and free segments are therefore processed correctly, but the streamer info is read with reversed endianness.

To address this issue, one option is to add a flag bit (fBit) to the TFile class and change all metadata (header, free segments and streamer info) to little endian. Another option is to modify the read function for the streamer info so that it converts the endianness before reading from the buffer.

Zhe

1. Fix the "tag out of range" issue. It was caused by TKey calling ReadObjWithBuffer, which does not set the buffer to little endian.
2. Fix the TTree ReadVersion error. In TBufferFile::ReadVersion we changed the cnt format, which caused the start offset for reading the version to move back to the cnt offset.



1. In roottest mathcore, some tests call cl->GetMethod, which looks up the interface of the TBufferFile constructor. Since I changed that interface by adding two Bool_t arguments, the cl->GetMethod calls there must be changed as well.
2. If we switch TBufferFile to little endian, a buffer without a byte count can be misidentified as having one. For example, suppose there is no byte count, the first two bytes store the version and the next four bytes store fBufferSize, i.e. UShort_t + UInt_t (Version_t + fBufferSize). TBufferFile.cxx reads the first four bytes into v.cnt and tests (v.cnt & kByteCountMask). If the result is 0, ROOT concludes there is no byte count; otherwise it treats those four bytes as a byte count. With version = 2 and fBufferSize = 32000 (0x7D00), a big-endian TBufferFile holds 00 02 00 00 7d 00, so v.cnt = 0x00020000 and the test correctly yields 0. If we change TBufferFile to little endian, the memory layout becomes 02 00 00 7d 00 00, so v.cnt = 0x7D000002, (v.cnt & kByteCountMask) != 0, and ROOT wrongly takes those four bytes to be a byte count.

zzxuanyuan · 2016-05-05T03:15:09Z

@pcanal @bbockelm
I think this patch is ready. I ran all the unit tests. In addition, I switched TBuffer in MainEvent.cxx between little endian and big endian and dumped all events in both cases; the dumps look the same.

The lines at the following link determine whether a TBuffer has a byte count, which I discussed with Brian this morning. I left some comments above them.
https://github.com/zzxuanyuan/root/blob/byteswap/io/io/src/TBufferFile.cxx#L3237

pcanal · 2016-05-05T15:42:47Z

core/base/inc/Bytes.h

 {
 #ifdef R__BYTESWAP
+if(buffBigEndian) {


Coding convention: there should be a space between if and the opening parenthesis.

What is the 'cost' of this added if statement (it is in the innermost part of the processing)?

pcanal · 2016-05-05T16:05:11Z

It is unclear how the code decides, when reading an arbitrary file, whether a buffer is big endian or little endian. What is the plan there (or did I miss some code where this is already implemented)?

1. By convention, there must be a space between if and the left parenthesis.
2. TKeyXML and TKeySQL derive from TKey, so their ReadObjWithBuffer() must also take the two additional arguments. We declare only the argument types, without parameter names, to avoid the "arguments are not used" compilation error.

zzxuanyuan · 2016-05-22T22:18:04Z

@pcanal I have not implemented it yet. I think I could add one more bit to TFile to indicate the endianness of the file. Does that sound like a plan? Alternatively, I could evaluate the fBufBigEndian bit in TBuffer to determine whether each buffer is big endian or little endian.

pcanal · 2016-05-25T10:03:41Z

I agree it will probably be sufficient to restrict a whole file to be one or the other. We have to find a place in the header at the beginning of the file to store this information.

etejedor · 2018-02-20T12:40:44Z

@zzxuanyuan @pcanal I understand this is still a work in progress?

phsft-bot · 2018-02-20T12:40:46Z

Can one of the admins verify this patch?

zzxuanyuan · 2018-02-20T23:26:03Z

@etejedor @pcanal @bbockelm

I think this branch still lacks a performance comparison of the different alternatives discussed in https://sft.its.cern.ch/jira/browse/ROOT-5073.

Do we still want to work on that?

bbockelm · 2018-02-26T02:38:27Z

Hi,

I think we can close for now. Thanks!

Brian

Axel-Naumann assigned pcanal Feb 24, 2016

zzxuanyuan force-pushed the byteswap branch from 24a1147 to dd79ca3 April 19, 2016 15:15

zzxuanyuan added 14 commits April 28, 2016 13:21

fa6654d Added flag buffBigEndian into frombuf() tobuf()

0704a6b Change tobuf()/frombuf() functions

347bdd9 Change the BigEndian representations

202db5f Only change TBuffer functions and add fBufBigEndian

8f9c143 Change {Is,Set}{Endianess} to {Is,Set}Buf{Endianess}

37bae27 Add debug info and set gROOT to little endian in MainEvent.cxx

7e3ca53 Added debug messages

4028c7e Added fVersion debug

ec110fd Add debug message in TFile::GetStreamerInfo

d5888ee Add some debug code and change back the version reading back to big endian

c664e27 Added IsBufBigEndian() to indicate the endianness in tobuf(buf, cnt | kByteCountMask, IsBufBigEndian());

8fd7774 Remove debug code

zzxuanyuan force-pushed the byteswap branch from 4f27eb6 to 8fd7774 April 28, 2016 21:59

pcanal reviewed May 5, 2016

zzxuanyuan added 4 commits May 22, 2016 13:53

dd6e812 Add explanation of new arguments in constructors of TKey and TBuffer

c02dd1b Use kFALSE and kTRUE for Bool_t type instead of 0 and 1

fa2f71f Add code explanation which determine if there is byte count in buffer

bbockelm closed this Feb 26, 2018

ethereal-space-cadet16 mentioned this pull request May 31, 2022: Accessing pyROOT #10676 (Closed)
