Can't load raw NIRx data due to encoding issue #7313

Closed
benoitvalery opened this issue Feb 13, 2020 · 17 comments · Fixed by #7314

Comments

@benoitvalery commented Feb 13, 2020

When I try to load NIRx data with the mne.io.read_raw_nirx function, I'm facing an encoding issue with the .hdr file. For now, I'm solving this manually by converting the .hdr to UTF-8 with a simple text editor like Geany.

MWE

I recorded a test dataset (in a test_github folder), which is composed of the following files:

./NIRS-2020-02-13_001.evt  --  inode/x-empty; charset=binary
./NIRS-2020-02-13_001.set  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.wl1  --  text/plain; charset=us-ascii
./Standard_probeInfo.mat  --  application/octet-stream; charset=binary
./NIRS-2020-02-13_001.dat  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.tpl  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001_config.txt  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.wl2  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.hdr  --  application/x-wine-extension-ini; charset=iso-8859-1
./NIRS-2020-02-13_001.inf  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.avg  --  application/octet-stream; charset=binary

This information was obtained with the following command (Linux):

for f in `find | egrep -v Eliminate`; do echo "$f" ' -- ' `file -bi "$f"` ; done

On the Python side, here is the code that I intend to use to load the data. I'm using the latest version of MNE (0.20.dev0).

#!/usr/bin/env python3
import os
import mne

path = os.sep.join([os.getcwd(), 'test_github'])
raw_intensity = mne.io.read_raw_nirx(path, verbose=True).load_data()

As mentioned here, I should obtain this kind of output:

Loading /home/circleci/mne_data/MNE-fNIRS-motor-data/Participant-1
Reading 0 ... 23238  =      0.000 ...  2974.464 secs...

But actually, the read_raw_nirx command raises the following traceback:

Loading /home/bvaler01/Documents/programmes/NBack/test_github
Traceback (most recent call last):
  File "processing_github.py", line 6, in <module>
    raw_intensity = mne.io.read_raw_nirx(path, verbose=True).load_data()
  File "/home/bvaler01/.local/lib/python3.7/site-packages/mne/io/nirx/nirx.py", line 39, in read_raw_nirx
    return RawNIRX(fname, preload, verbose)
  File "</home/bvaler01/.local/lib/python3.7/site-packages/mne/externals/decorator.py:decorator-gen-198>", line 2, in __init__
  File "/home/bvaler01/.local/lib/python3.7/site-packages/mne/utils/_logging.py", line 89, in wrapper
    return function(*args, **kwargs)
  File "/home/bvaler01/.local/lib/python3.7/site-packages/mne/io/nirx/nirx.py", line 113, in __init__
    hdr_str = f.read()
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 62: invalid continuation byte

The test dataset is available here (.zip). It is a ten-second recording from NIRStar 15.2. It should contain noise only (headless recording).
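
For reference, the failing byte can be reproduced in isolation. This is just my own illustration (the string is made up, not taken from the actual header): 0xe9 is 'é' in Latin-1, but it is not a valid byte sequence on its own in UTF-8.

# Hypothetical illustration of the decode failure above (not MNE code).
raw = b"Dur\xe9e"              # made-up header fragment containing the byte 0xe9
print(raw.decode("latin-1"))   # works, prints "Durée"
raw.decode("utf-8")            # raises UnicodeDecodeError: invalid continuation byte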

@cbrnr (Contributor) commented Feb 13, 2020

Do the .hdr files have a fixed encoding (i.e. are these files always encoded using ISO-8859-1)? If so, the open call in https://github.com/mne-tools/mne-python/blob/master/mne/io/nirx/nirx.py#L112 should use the encoding='latin1' argument.
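
In code, that suggestion amounts to something like the sketch below (illustrative only; hdr_fname stands for the .hdr path that RawNIRX resolves internally):

# Sketch of the proposed fix (not the actual MNE diff):
# read the .hdr file as Latin-1 instead of relying on the default UTF-8.
with open(hdr_fname, encoding='latin-1') as f:
    hdr_str = f.read()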

@benoitvalery (Author)

All I can say is that I could not find any way to change the encoding of the .hdr file in the NIRStar preferences, but this should be confirmed by other users. The NIRStar documentation does not say anything about it.

@cbrnr (Contributor) commented Feb 13, 2020

Yeah, that's not unexpected. On which platform do you record? Is it a Windows machine? Are other platforms (macOS, Linux) also supported by the recording software? I could imagine that they just use the (default) platform encoding, which is UTF-8 on macOS and Linux, and Latin-1 on Windows. If their encoding differs from platform to platform (or even depends on the actual locale), then it won't be easy to find a simple solution. One possible option is to make an educated guess with https://pypi.org/project/chardet/ to choose a suitable encoding when opening the file.
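
A rough sketch of that chardet-based fallback (illustrative only; hdr_fname is a placeholder path, and chardet would become an extra dependency):

import chardet

# Read the raw bytes, let chardet guess the encoding, then decode accordingly.
with open(hdr_fname, 'rb') as f:
    raw_bytes = f.read()
guess = chardet.detect(raw_bytes)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, ...}
hdr_str = raw_bytes.decode(guess['encoding'] or 'latin-1')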

@benoitvalery (Author)

I did the recording on a Windows platform. It seems that there is no alternative (macOS, Linux) NIRStar distribution. What seems strange to me is that only the .hdr file is encoded in Latin-1. What about the others? They were created on a Windows platform too.

@cbrnr (Contributor) commented Feb 13, 2020

The other text files are ASCII-encoded, and ASCII is a subset of Latin-1. This just means that they don't contain any special characters, because the first 128 code points are identical in ASCII and Latin-1.
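
A quick way to check this (my own illustration, using a made-up header line):

line = "Sources=8 Detectors=4"                          # plain ASCII, made up
assert line.encode("ascii") == line.encode("latin-1")   # identical bytes
# Characters above code point 127 (such as 'é') don't exist in ASCII and get a
# single byte (0xe9) in Latin-1, which is exactly what trips up a UTF-8 decoder.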

@benoitvalery (Author) commented Feb 13, 2020

So we have to determine which encoding is used by NIRStar 15.2. Other users' input would be great. If the encoding varies from one version to another (15.0 vs 15.2, or Windows 7 vs Windows 10?), then only a chardet solution would be reasonable. Am I wrong?

@cbrnr (Contributor) commented Feb 13, 2020

The safest solution would be to use chardet to infer the encoding. However, we have been simply assuming Latin-1 in other functions, so we might as well do the same here. It will fix your problem and will likely work for most other cases. If someone encounters more problems, we can always switch to the safer solution then.

@larsoner (Member)

Agreed, let's just go with latin-1, and if it turns out to be problematic we'll do something smarter.

@rob-luke (Member)

Thanks for fixing this so quickly @larsoner!

I checked files from multiple NIRx machines that I have access to, and they all have the same encoding as the test set I uploaded. I also can't see any settings in NIRStar that would change the encoding.

I don't know much about Windows, but could the locale change the encoding? I am based in Australia; where are you, @benoitvalery? It seems this is sorted now, but I'm just curious as to what might have caused this issue.

@cbrnr (Contributor) commented Feb 14, 2020

The locale almost certainly affects the encoding. If you don't specify an encoding, the system default will be used. This is CP-1252 (~ Latin-1) for many western languages, but something else on a lot of other systems. Therefore, if a Windows user in e.g. Russia records NIRx files, these will likely be encoded as CP-1251 (assuming that's the Windows default). We might get away with it if only characters with identical encodings are used; otherwise we would decode the wrong characters. If this really happens, we can add an encoding parameter to read_raw_nirx (defaulting to latin-1).
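
If it ever comes to that, the signature could look roughly like this (purely hypothetical sketch; read_raw_nirx does not currently take such a parameter):

# Hypothetical future signature, not the current MNE API:
def read_raw_nirx(fname, preload=False, encoding='latin-1', verbose=None):
    """Read a NIRx recording, decoding the .hdr file with the given encoding."""
    ...  # the encoding would be threaded through to the open() call on the .hdr file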

@benoitvalery (Author)

Hi, it seems that the parameter solution proposed by @cbrnr is the most generic one. @rob-luke, I'm based in France.

@larsoner (Member)

Sounds reasonable, but let's wait until we have suitable test files to work on this.

@benoitvalery (Author)

I tested another, older dataset (recorded 8 months ago) this morning, and all the files, including the .hdr file, were reported as ASCII by my Linux system, without any manipulation on my part.

./Motor-2019-07-19_001.dat  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001_config.txt  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001_probeInfo.mat  --  application/octet-stream; charset=binary
./Motor-2019-07-19_001.avg  --  application/octet-stream; charset=binary
./Motor-2019-07-19_001.tpl  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.evt  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.set  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.hdr  --  application/x-wine-extension-ini; charset=us-ascii
./Motor-2019-07-19_001.wl1  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.nirs  --  application/octet-stream; charset=binary
./Motor-2019-07-19_001.inf  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.wl2  --  text/plain; charset=us-ascii

How is this possible?

@cbrnr (Contributor) commented Feb 14, 2020

This is possible because that header apparently doesn't contain any special non-ASCII characters; in that case, Latin-1 is identical to ASCII.

@rob-luke (Member)

Sorry to reopen this, but @benoitvalery, I am trying to fix date reading for French files over in #7891.

However, I can't download the small file you linked above in #7313 (comment). Are you able to reupload this small file somewhere so I can grab it and ensure I don't break the support we added here? Thanks!

@benoitvalery (Author) commented Sep 21, 2020

Hi @rob-luke, sorry for the delay; here are the files you asked for!

@rob-luke (Member)

Fantastic! Thanks so much @benoitvalery

FYI: I don't think we have fully fixed the French date handling, so there is an issue open at #8219 if you have any more feedback.
