Can't load raw NIRx data due to encoding issue #7313

Closed
benoitvalery opened this issue Feb 13, 2020 · 17 comments · Fixed by #7314

Comments

@benoitvalery commented Feb 13, 2020

When I try to load NIRx data with the mne.io.read_raw_nirx function, I'm facing an encoding issue with the .hdr file. For now, I'm solving this manually by converting the .hdr to UTF-8 with a simple text editor like Geany.

MWE

I recorded a test dataset (in a test_github folder), which is composed of the following files:

./NIRS-2020-02-13_001.evt  --  inode/x-empty; charset=binary
./NIRS-2020-02-13_001.set  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.wl1  --  text/plain; charset=us-ascii
./Standard_probeInfo.mat  --  application/octet-stream; charset=binary
./NIRS-2020-02-13_001.dat  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.tpl  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001_config.txt  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.wl2  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.hdr  --  application/x-wine-extension-ini; charset=iso-8859-1
./NIRS-2020-02-13_001.inf  --  text/plain; charset=us-ascii
./NIRS-2020-02-13_001.avg  --  application/octet-stream; charset=binary

This information was obtained with the following command (Linux):

for f in `find | egrep -v Eliminate`; do echo "$f" ' -- ' `file -bi "$f"` ; done

On the Python side, here is the code that I intend to use to load the data. I'm using the latest version of MNE (0.20.dev0).

#!/usr/bin/env python3
import os
import mne

path = os.sep.join([os.getcwd(), 'test_github'])
raw_intensity = mne.io.read_raw_nirx(path, verbose=True).load_data()

As mentioned here, I should obtain this kind of output:

Loading /home/circleci/mne_data/MNE-fNIRS-motor-data/Participant-1
Reading 0 ... 23238  =      0.000 ...  2974.464 secs...

But actually, the read_raw_nirx command raises the following traceback:

Loading /home/bvaler01/Documents/programmes/NBack/test_github
Traceback (most recent call last):
  File "processing_github.py", line 6, in <module>
    raw_intensity = mne.io.read_raw_nirx(path, verbose=True).load_data()
  File "/home/bvaler01/.local/lib/python3.7/site-packages/mne/io/nirx/nirx.py", line 39, in read_raw_nirx
    return RawNIRX(fname, preload, verbose)
  File "</home/bvaler01/.local/lib/python3.7/site-packages/mne/externals/decorator.py:decorator-gen-198>", line 2, in __init__
  File "/home/bvaler01/.local/lib/python3.7/site-packages/mne/utils/_logging.py", line 89, in wrapper
    return function(*args, **kwargs)
  File "/home/bvaler01/.local/lib/python3.7/site-packages/mne/io/nirx/nirx.py", line 113, in __init__
    hdr_str = f.read()
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 62: invalid continuation byte

The test dataset is available here (.zip). It is a ten-second recording from NIRStar 15.2. It should contain noise only (headless recording).
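
For reference, the failing byte can be reproduced in isolation. This is just my own illustration (the string is made up, not taken from the actual header): 0xe9 is 'é' in Latin-1, but it is not a valid byte sequence on its own in UTF-8.

# Hypothetical illustration of the decode failure above (not MNE code).
raw = b"Dur\xe9e"              # made-up header fragment containing the byte 0xe9
print(raw.decode("latin-1"))   # works, prints "Durée"
raw.decode("utf-8")            # raises UnicodeDecodeError: invalid continuation byte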

@cbrnr (Contributor) commented Feb 13, 2020

Do the .hdr files have a fixed encoding (i.e. are these files always encoded using ISO-8859-1)? If so, the open call in https://github.com/mne-tools/mne-python/blob/master/mne/io/nirx/nirx.py#L112 should use the encoding='latin1' argument.
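
In code, that suggestion amounts to something like the sketch below (illustrative only; hdr_fname stands for the .hdr path that RawNIRX resolves internally):

# Sketch of the proposed fix (not the actual MNE diff):
# read the .hdr file as Latin-1 instead of relying on the default UTF-8.
with open(hdr_fname, encoding='latin-1') as f:
    hdr_str = f.read()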

@benoitvalery (Author)

All I can say is that I could not find any way to change the encoding of the .hdr file in the NIRStar preferences, but this should be confirmed by other users. The NIRStar documentation does not say anything about it.

@cbrnr (Contributor) commented Feb 13, 2020

Yeah, that's not unexpected. On which platform do you record? Is it a Windows machine? Are other platforms (macOS, Linux) also supported by the recording software? I could imagine that they just use the (default) platform encoding, which is UTF-8 on macOS and Linux, and Latin-1 on Windows. If their encoding differs from platform to platform (or even depends on the actual locale), then it won't be easy to find a simple solution. One possible option is to make an educated guess with https://pypi.org/project/chardet/ to choose a suitable encoding when opening the file.
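
A rough sketch of that chardet-based fallback (illustrative only; hdr_fname is a placeholder path, and chardet would become an extra dependency):

import chardet

# Read the raw bytes, let chardet guess the encoding, then decode accordingly.
with open(hdr_fname, 'rb') as f:
    raw_bytes = f.read()
guess = chardet.detect(raw_bytes)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, ...}
hdr_str = raw_bytes.decode(guess['encoding'] or 'latin-1')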

@benoitvalery (Author)

I did the recording on a Windows platform. It seems that there is no alternative (macOS, Linux) NIRStar distribution. What seems strange to me is that only the .hdr file is encoded in Latin-1. What about the others? They were created on a Windows platform too.

@cbrnr (Contributor) commented Feb 13, 2020

The other text files are ASCII-encoded, and ASCII is a subset of Latin-1. This just means that they don't contain any special characters, because the first 128 code points are identical in ASCII and Latin-1.
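
A quick way to check this (my own illustration, using a made-up header line):

line = "Sources=8 Detectors=4"                          # plain ASCII, made up
assert line.encode("ascii") == line.encode("latin-1")   # identical bytes
# Characters above code point 127 (such as 'é') don't exist in ASCII and get a
# single byte (0xe9) in Latin-1, which is exactly what trips up a UTF-8 decoder.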

@benoitvalery (Author) commented Feb 13, 2020

So we have to determine which encoding is used by NIRStar 15.2. Other users' input would be great. If the encoding varies from one version to another (15.0 vs 15.2, or Windows 7 vs Windows 10?), then only a chardet solution would be reasonable. Am I wrong?

@cbrnr (Contributor) commented Feb 13, 2020

The safest solution would be to use chardet to infer the encoding. However, we have been simply assuming Latin-1 in other functions, so we might as well do the same here. It will fix your problem and will likely work for most other cases. If someone encounters more problems, we can always switch to the safer solution then.

@larsoner (Member)

Agreed, let's just go with latin-1, and if it turns out to be problematic we'll do something smarter.

@rob-luke (Member)

Thanks for fixing this so quickly @larsoner!

I checked files from multiple NIRx machines that I have access to, and they all have the same encoding as the test set I uploaded. I also can't see any settings in NIRStar that would change the encoding.

I don't know much about Windows, but could the locale change the encoding? I am based in Australia; where are you, @benoitvalery? It seems this is sorted now, but I'm just curious as to what might have caused this issue.

@cbrnr (Contributor) commented Feb 14, 2020

The locale almost certainly affects the encoding. If you don't specify an encoding, the system default will be used. This is CP-1252 (~ Latin-1) for many western languages, but something else on a lot of other systems. Therefore, if a Windows user in e.g. Russia records NIRx files, these will likely be encoded as CP-1251 (assuming that's the Windows default). We might get away with it if only characters with identical encodings are used; otherwise we would decode the wrong characters. If this really happens, we can add an encoding parameter to read_raw_nirx (defaulting to latin-1).
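
If it ever comes to that, the signature could look roughly like this (purely hypothetical sketch; read_raw_nirx does not currently take such a parameter):

# Hypothetical future signature, not the current MNE API:
def read_raw_nirx(fname, preload=False, encoding='latin-1', verbose=None):
    """Read a NIRx recording, decoding the .hdr file with the given encoding."""
    ...  # the encoding would be threaded through to the open() call on the .hdr file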

@benoitvalery (Author)

Hi, it seems that the parameter solution proposed by @cbrnr is the most generic one. @rob-luke, I'm based in France.

@larsoner (Member)

Sounds reasonable, but let's wait until we have suitable test files to work on this.

@benoitvalery (Author)

I tested another, older dataset (recorded 8 months ago) this morning, and all the files, including the .hdr file, were reported as ASCII by my Linux system, without any manipulation on my part.

./Motor-2019-07-19_001.dat  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001_config.txt  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001_probeInfo.mat  --  application/octet-stream; charset=binary
./Motor-2019-07-19_001.avg  --  application/octet-stream; charset=binary
./Motor-2019-07-19_001.tpl  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.evt  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.set  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.hdr  --  application/x-wine-extension-ini; charset=us-ascii
./Motor-2019-07-19_001.wl1  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.nirs  --  application/octet-stream; charset=binary
./Motor-2019-07-19_001.inf  --  text/plain; charset=us-ascii
./Motor-2019-07-19_001.wl2  --  text/plain; charset=us-ascii

How is this possible?

@cbrnr (Contributor) commented Feb 14, 2020

This is possible because that header apparently doesn't contain any special non-ASCII characters; in that case, Latin-1 is identical to ASCII.

@rob-luke (Member)

Sorry to reopen this, but @benoitvalery, I am trying to fix date reading for French files over in #7891.

However, I can't download the small file you linked above in #7313 (comment). Are you able to reupload this small file somewhere so I can grab it and ensure I don't break the support we added here? Thanks!

@benoitvalery (Author) commented Sep 21, 2020

Hi @rob-luke, sorry for the delay; here are the files you asked for!

@rob-luke (Member)

Fantastic! Thanks so much @benoitvalery

FYI: I don't think we have fully fixed the French date handling, so there is an issue open at #8219 if you have any more feedback.
