Reading and Parsing .fcs files from Biorad S3e Cell #241

Open
rwbaer opened this issue Feb 23, 2025 · 6 comments

@rwbaer

rwbaer commented Feb 23, 2025

Describe the bug
I have been unable to open .fcs files from a Bio-Rad S3e cell sorter. The warning messages printed do not make it clear what the problem is.

I can read the files with ProSort (the Bio-Rad instrument software) and with FCS Express 6. I can also read these same files in R with flowCore and associated packages, but that was not the case when I started. In case the problem is related, it might be worth looking at that problem report: https://support.bioconductor.org/p/72379/ .
As I recall, the flowCore problem was related to a signed vs. unsigned integer at some location in the file, which we identified with a hex editor.
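For readers unfamiliar with that class of bug, the same four bytes decode to very different numbers depending on whether they are read as signed or unsigned. This is only an illustration, not a reconstruction of the flowCore issue. It assumes little-endian order, which is what $BYTEORD 1,2,3,4 in this file's metadata (posted later in the thread) indicates; the TIME and SORT channels there also declare $PnR 4294967295, a range that only fits an unsigned 32-bit integer.

```python
import struct

raw = b"\xff\xff\xff\xff"  # four bytes with every bit set

(as_unsigned,) = struct.unpack("<I", raw)  # little-endian unsigned 32-bit integer
(as_signed,) = struct.unpack("<i", raw)    # little-endian signed 32-bit integer

print(as_unsigned, as_signed)  # 4294967295 -1
```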

As noted at the bottom, I have attached a zipped .fcs file that fails to open.

Here is the tutorial code I tried in a Jupyter notebook:
Code To Reproduce
Code to reproduce the behavior:

import flowkit as fk
# Load fcs files
fcs_file_path = '../Samples/M1_WM278_S1.fcs'
samples = fk.load_samples("../Samples", filename_as_id=False)  # load every FCS file in the folder
sample = fk.Sample(fcs_file_path)
# print(sample.id)
# df = sample.as_dataframe()
# print(df.head())

Here is the result in the VS Code console:

c:\Users\rbaer\anaconda3\envs\vscode\Lib\site-packages\flowkit\_models\sample.py:330: RuntimeWarning: divide by zero encountered in divide
  self._raw_events = raw_events / channel_gain
c:\Users\rbaer\anaconda3\envs\vscode\Lib\site-packages\flowkit\_models\sample.py:330: RuntimeWarning: invalid value encountered in divide
  self._raw_events = raw_events / channel_gain
(this same pair of warnings is printed 13 times in total)

Expected behavior
I expected the files to be read in and to form what the R world calls a "flowSet", or perhaps some equivalent Python object.

Desktop:

  • OS: Windows 11 Pro, OS Build 27788.1000
  • Python version: 3.12.7
  • FlowKit version: flowkit 1.2.3, flowio 1.3.0, flowutils 1.1.0

Additional context
I am including an example zipped Bio-Rad S3e cell sorter .fcs file so you can try it yourself and examine the file structure. Perhaps it is just my novice Python skills.

M0_WM278_S1.zip

@rwbaer
Author

rwbaer commented Mar 6, 2025

I'm still new to .fcs file formats, but the metadata does seem to be read. I see references to byte offsets in the documentation, but I can't quite understand the implications. The PnG tags are all zero; I read somewhere that PnG is the amplifier gain, but I don't know whether a zero value means something special in combination with other tags. The fact that these same files read into other cytometry programs suggests that FlowKit is handling something differently. I can't figure out what yet.

sample.get_metadata() produces the following, if it is helpful:

{'beginanalysis': '0000000000000000',
 'endanalysis': '0000000000000000',
 'begindata': '0000000000004161',
 'enddata': '0000000013376120',
 'beginstext': '0000000013376121',
 'endstext': '0000000013499571',
 'nextdata': '0000000000000000',
 'datatype': 'I',
 'mode': 'L',
 'par': '21',
 'byteord': '1,2,3,4',
 'tot': '159190',
 'sys': 'Microsoft Windows NT 6.2.9200.0',
 'cyt': 'S3',
 'cytsn': '776BR2036',
 'inst': 'Bio-Rad Laboratories',
 'p1n': 'TIME_MSW',
 'p1s': 'TIME-',
 'p1b': '32',
 'p1r': '4294967295',
 'p1e': '0,0',
 'p1g': '0',
 'p1calibration': '45000.00,',
 'p2n': 'TIME_LSW',
 'p2s': 'TIME-',
 'p2b': '32',
 'p2r': '4294967295',
 'p2e': '0,0',
 'p2g': '0',
 'p2calibration': '45000.00,',
 'p3n': 'FSC-HEIGHT',
 'p3s': 'FSC-HEIGHT',
 'p3b': '32',
 'p3r': '1048576',
 'p3e': '0,0',
 'p3g': '0',
 'p4n': 'FSC-AREA',
 'p4s': 'FSC-AREA',
 'p4b': '32',
 'p4r': '1048576',
 'p4e': '0,0',
 'p4g': '0',
 'p5n': 'FSC-WIDTH',
 'p5s': 'FSC-WIDTH',
 'p5b': '32',
 'p5r': '1048576',
 'p5e': '0,0',
 'p5g': '0',
 'p6n': 'SSC-HEIGHT',
 'p6s': 'SSC-HEIGHT',
 'p6b': '32',
 'p6r': '1048576',
 'p6e': '0,0',
 'p6g': '0',
 'p7n': 'SSC-AREA',
 'p7s': 'SSC-AREA',
 'p7b': '32',
 'p7r': '1048576',
 'p7e': '0,0',
 'p7g': '0',
 'p8n': 'SSC-WIDTH',
 'p8s': 'SSC-WIDTH',
 'p8b': '32',
 'p8r': '1048576',
 'p8e': '0,0',
 'p8g': '0',
 'p9n': 'FL1-HEIGHT',
 'p9s': 'Violet-HEIGHT',
 'p9b': '32',
 'p9r': '1048576',
 'p9e': '0,0',
 'p9g': '0',
 'p10n': 'FL1-AREA',
 'p10s': 'Violet-AREA',
 'p10b': '32',
 'p10r': '1048576',
 'p10e': '0,0',
 'p10g': '0',
 'p11n': 'FL1-WIDTH',
 'p11s': 'Violet-WIDTH',
 'p11b': '32',
 'p11r': '1048576',
 'p11e': '0,0',
 'p11g': '0',
 'p12n': 'FL2-HEIGHT',
 'p12s': 'FL2-HEIGHT',
 'p12b': '32',
 'p12r': '1048576',
 'p12e': '0,0',
 'p12g': '0',
 'p13n': 'FL2-AREA',
 'p13s': 'FL2-AREA',
 'p13b': '32',
 'p13r': '1048576',
 'p13e': '0,0',
 'p13g': '0',
 'p14n': 'FL2-WIDTH',
 'p14s': 'FL2-WIDTH',
 'p14b': '32',
 'p14r': '1048576',
 'p14e': '0,0',
 'p14g': '0',
 'p15n': 'FL3-HEIGHT',
 'p15s': 'Orange-HEIGHT',
 'p15b': '32',
 'p15r': '1048576',
 'p15e': '0,0',
 'p15g': '0',
 'p16n': 'FL3-AREA',
 'p16s': 'Orange-AREA',
 'p16b': '32',
 'p16r': '1048576',
 'p16e': '0,0',
 'p16g': '0',
 'p17n': 'FL3-WIDTH',
 'p17s': 'Orange-WIDTH',
 'p17b': '32',
 'p17r': '1048576',
 'p17e': '0,0',
 'p17g': '0',
 'p18n': 'FL4-HEIGHT',
 'p18s': 'SytoxAAD-HEIGHT',
 'p18b': '32',
 'p18r': '1048576',
 'p18e': '0,0',
 'p18g': '0',
 'p19n': 'FL4-AREA',
 'p19s': 'SytoxAAD-AREA',
 'p19b': '32',
 'p19r': '1048576',
 'p19e': '0,0',
 'p19g': '0',
 'p20n': 'FL4-WIDTH',
 'p20s': 'SytoxAAD-WIDTH',
 'p20b': '32',
 'p20r': '1048576',
 'p20e': '0,0',
 'p20g': '0',
 'p21n': 'SORT',
 'p21s': 'SORT',
 'p21b': '32',
 'p21r': '4294967295',
 'p21e': '0,0',
 'p21g': '0',
 'etim': '15:41:15.37',
 'btim': '15:38:45.78',
 'date': '23-Aug-2024',
 'spillover': '4,FL1-AREA,FL2-AREA,FL3-AREA,FL4-AREA,1,0,0.0954534634610807,0,0,1,0,0,0.00167785234899329,0,1,0,0,0,0,1'}
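The keywords above are enough to sanity-check the DATA segment by hand. $DATATYPE I with $PnB 32 means each value is a 32-bit unsigned integer, and $BYTEORD 1,2,3,4 means little-endian byte order. The segment spans bytes 4161 through 13376120 inclusive, i.e. 13376120 - 4161 + 1 = 13,371,960 bytes, which is exactly $TOT × $PAR × 4 = 159190 × 21 × 4, so the offsets are self-consistent. The helpers below are a hypothetical sketch, not FlowIO's reader, and only handle the 32-bit integer list-mode case present in this file.

```python
import numpy as np


def read_data_segment(path, begin, end):
    """Return the raw bytes of the DATA segment; FCS byte offsets are inclusive."""
    with open(path, "rb") as f:
        f.seek(begin)
        return f.read(end - begin + 1)


def decode_uint32_events(data, n_params, byteord="1,2,3,4"):
    """Decode a list-mode ($MODE L), $DATATYPE I, $PnB 32 DATA segment."""
    byte_orders = {"1,2,3,4": "<u4", "4,3,2,1": ">u4"}  # little / big endian
    if byteord not in byte_orders:
        raise ValueError(f"unsupported $BYTEORD: {byteord}")
    values = np.frombuffer(data, dtype=byte_orders[byteord])
    if values.size % n_params:
        raise ValueError("DATA length is not a whole number of events")
    # One row per event, one column per channel ($P1 ... $Pn order).
    return values.reshape(-1, n_params)
```

For the attached file this would be called as decode_uint32_events(read_data_segment(path, 4161, 13376120), 21), which by the arithmetic above should yield a 159190 × 21 array.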

@whitews
Owner

whitews commented Mar 6, 2025

Hi Rob,

Thanks for submitting this issue. The problem is definitely related to those 0 gain values. Re-reading the FCS specification sections on the PnG keyword, I see that only non-zero values should be scaled. So a zero value is equivalent to a gain value of 1.0 (i.e. no gain).
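A minimal sketch of that rule, assuming events are held in an (n_events, n_channels) array with one $PnG value per channel. apply_gain is a hypothetical helper for illustration, not the code FlowKit or FlowIO actually use; it swaps every 0 gain for 1.0 before dividing, which avoids the "divide by zero" and "invalid value" warnings in the original report.

```python
import numpy as np


def apply_gain(raw_events, gains):
    """Divide each channel by its $PnG gain, treating a gain of 0 as 1.0 (no gain)."""
    raw_events = np.asarray(raw_events, dtype=np.float64)
    gains = np.asarray(gains, dtype=np.float64)
    safe_gains = np.where(gains == 0, 1.0, gains)  # 0 means "no gain", not "divide by 0"
    return raw_events / safe_gains  # broadcasts one gain across each column
```

With the dictionary from sample.get_metadata(), the gains could be gathered as [float(meta[f"p{i}g"]) for i in range(1, int(meta["par"]) + 1)].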

I'm currently working on an update which will move pre-processing event data to the underlying FlowIO library. I'll include a fix for this in the next release. Do you mind if I include your FCS file in the repo as part of the test data?

Thanks again,
Scott

@rwbaer
Author

rwbaer commented Mar 6, 2025

You are welcome to use the file as part of the test data. Thanks for looking into the issue.

@rwbaer rwbaer changed the title ".fcs file from Biorad S3e Cell sorter fails to open" to "Parsing .fcs files from Biorad S3e Cell" Mar 8, 2025
@rwbaer
Author

rwbaer commented Mar 8, 2025

Bio-Rad S3e cell sorter .fcs files
I changed the title because I found some additional parsing issues with Bio-Rad S3e cell sorter files that are probably worth tracking in the same place. If you wish, I can move these to one or more separate issues.

  • There are two time channels in these files, named TIME_MSW and TIME_LSW (P1N and P2N). The corresponding P1S and P2S are both tagged "TIME-" rather than "TIME". Typically TIME_MSW contains all zeros and is of little interest. TIME_LSW does contain some sort of time code, but I do not think it follows the same convention as the BD FACSCalibur. There seems to be no $TIMESTEP keyword to help us interpret it. 'sample.time_index' currently finds neither of these channels and so returns 'None'.
  • P21N and P21S are 'SORT' channels. FlowKit currently includes this SORT channel in sample.fluoro_indices, which it probably should not.
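The names TIME_MSW and TIME_LSW suggest, though the file does not document it, that they are the most and least significant 32-bit words of a single 64-bit time counter. Under that assumption the two channels could be recombined as sketched below. combine_time_words is a hypothetical helper; it needs the original integer values (e.g. from source='orig'), because dividing by a 0 gain turns them into inf/NaN, and the units of the result remain unknown without a $TIMESTEP value.

```python
import numpy as np


def combine_time_words(msw, lsw):
    """Join two 32-bit words into one unsigned 64-bit counter: (msw << 32) | lsw.

    Assumes msw/lsw are the most/least significant halves of the same counter
    and hold finite, non-negative values (float input is cast to uint64).
    """
    msw = np.asarray(msw, dtype=np.uint64)
    lsw = np.asarray(lsw, dtype=np.uint64)
    return (msw << np.uint64(32)) | lsw
```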

@rwbaer
Author

rwbaer commented Mar 8, 2025

Work-around:
For others who may run into this issue until this glitch is fixed, a work-around seems to be to use:
sample = fk.Sample(myfile, cache_original_events=True) # cache data before applying gain correction

This means you can then use the "source=" argument to access and work with your raw data, as in:
df_events = sample.as_dataframe(source='orig') # specify data just as it was read in
df_events.head()

** ISSUE WITH WORK-AROUND **
This work-around does not currently work for multiple-file reads, because
fk.load_samples() does not currently support the cache_original_events=True argument.
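Until load_samples() accepts that argument, one possible stopgap is to build the list of samples yourself. This is a sketch assuming all files sit in one folder with a lowercase .fcs extension (the glob is case-sensitive on Linux and macOS file systems that are case-sensitive). load_samples_with_orig_events is a hypothetical helper; the only FlowKit call it makes is the fk.Sample(path, cache_original_events=True) form shown above.

```python
from pathlib import Path


def load_samples_with_orig_events(folder, loader=None):
    """Load every .fcs file in `folder`, keeping the original (pre-gain) events.

    `loader` defaults to flowkit.Sample with cache_original_events=True; any
    callable taking a file path string can be substituted (e.g. for testing).
    """
    if loader is None:
        import flowkit as fk  # imported lazily so the helper can be tested without FlowKit

        def _default_loader(path):
            return fk.Sample(path, cache_original_events=True)

        loader = _default_loader
    # Sort for a stable, reproducible sample order.
    return [loader(str(path)) for path in sorted(Path(folder).glob("*.fcs"))]
```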

@rwbaer rwbaer changed the title "Parsing .fcs files from Biorad S3e Cell" to "Reading and Parsing .fcs files from Biorad S3e Cell" Mar 8, 2025
@whitews
Owner

whitews commented Mar 10, 2025

Hi Rob,

The presence of multiple time channels does pose a problem, especially in this case where neither has the conventional label string of "Time" (regardless of case, e.g. "TIME", "time", etc.). The FCS specification doesn't explicitly state that channels such as those in this example are invalid, but it does imply that there should be a single time channel, identified by the PnN label "TIME". From section 3.2.20, in the $PnN description:

The value "TIME" shall be used in order to indicate TIME measurement with steps indicated in
$TIMESTEP.

The Sample.time_index attribute isn't used much internally in FlowKit: it is used to scale the time channel by the provided $TIMESTEP and in the utility function generate_transforms. Other than that, it's mainly a convenience attribute for identifying the time channel so it can be referenced in analysis (e.g. gating). So, I'm not sure how big of an issue this is. The user could manually set the time_index if it is needed downstream.
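If the time channel does need to be identified by hand, a minimal sketch is to scan the $PnN labels for anything beginning with "time" (case-insensitive) and then pick one. find_time_channels is a hypothetical helper; its rule is deliberately looser than FlowKit's match on the label "Time" (regardless of case), so that TIME_MSW and TIME_LSW are both reported.

```python
def find_time_channels(pnn_labels):
    """Return indices of channels whose $PnN label starts with "time" (any case)."""
    return [
        i
        for i, label in enumerate(pnn_labels)
        if label.strip().lower().startswith("time")
    ]
```

For the S3e labels this returns the indices of TIME_MSW and TIME_LSW (0 and 1); as suggested above, the chosen index could then be assigned to time_index manually.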

The "SORT" channel being included in fluoro_indices is potentially a bigger issue, since that attribute is relied upon in various places for applying compensation. A current workaround would be to include the channel label in the null_channels kwarg list when creating a Sample instance. We could add a filter to exclude such channels from the list of fluorescent channels, though I'm reluctant to add logic for manufacturer-specific conventions that are not supported by the FCS spec. Adding a blanket rule for this scenario may inadvertently create a problem in another case, and once that occurs, the temptation to add cytometer-specific conditionals starts down a road I'd rather not go. Do you know if other cytometers, especially from other manufacturers, use the "SORT" channel label?
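The null_channels workaround needs the PnN labels to exclude. A sketch that pulls them from the dictionary returned by sample.get_metadata() (lowercase keys such as 'par' and 'p21n', as in the metadata dump earlier in this thread); pnn_labels_from_metadata, null_channels_from_metadata, and the default exclusion list are hypothetical and tailored to this S3e file.

```python
def pnn_labels_from_metadata(metadata):
    """List the $PnN labels, in channel order, from a lowercase-keyed metadata dict."""
    n_params = int(metadata["par"])
    return [metadata[f"p{i}n"] for i in range(1, n_params + 1)]


def null_channels_from_metadata(metadata, exclude=("SORT",)):
    """Return the PnN labels to pass as null_channels (case-insensitive match)."""
    wanted = {label.upper() for label in exclude}
    return [label for label in pnn_labels_from_metadata(metadata) if label.upper() in wanted]
```

The result could then be passed when re-creating the sample, e.g. fk.Sample(fcs_file_path, null_channels=null_channels_from_metadata(meta)); for files from a single instrument model, simply passing null_channels=['SORT'] may be enough.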

Thanks,
Scott
