- Added new sparse matrix backend
ScipySparseMat
, which requires that scipy is installed if this backend is in use. This backend will generally yield improvements in both runtime and memory consumption, especially with larger sparse tables. The default sparse matrix backend is stillCSMat
(this means that scipy is an optional dependency of the biom-format project).
- Sparse backends
SparseDict
andSparseMat
have been removed in favor ofCSMat
. Cython is no longer a dependency. - The BIOM Format project license is now Modified BSD (see COPYING.txt for more details) and is no longer GPL. To change the license, we obtained written permission (by email) from all past and present developers on the biom-format project. The core developers, including @gregcaporaso, @wasade, @jrrideout, and @rob-knight were included on these emails. For code that was derived from the QIIME and PyCogent projects, which are under the GPL license, written permission was obtained (by email) from the developers of the original code (tracing through the commit history, as necessary). @gregcaporaso, @wasade, @jrrideout, and @rob-knight were included on these emails.
- Removed the top-level
python-code
directory, moving all contents up one level. If you are installing the biom-format project by manually settingPYTHONPATH
to<dir prefix>/biom-format/python-code
, you will need to change the path to<dir prefix>/biom-format
instead. Please see the installation instructions for more details. - Reorganized sparse backend code into a new subpackage,
biom.backends
. This change should not affect client code.
Table.collapseObservationsByMetadata
andTable.collapseSamplesByMetadata
now have an additional argument,include_collapsed_metadata
, which allows the user to either include or exclude collapsed metadata in the collapsed table.Table.collapseObservationsByMetadata
andTable.collapseSamplesByMetadata
now have an additional argument,one_to_many_mode
, which allows the user to specify a collapsing strategy for one-to-many metadata relationships (currently supports adding and dividing counts).Table.binObservationsByMetadata
,Table.binSamplesByMetadata
,Table.collapseObservationsByMetadata
, andTable.collapseSamplesByMetadata
now have an additional argument,constructor
, which allows the user to choose the return type of the binned/collapsed table(s).Table.delimitedSelf
now has an additional argument,observation_column_name
, which allows the user to specify the name of the first column in the output table (e.g. 'OTU ID', 'Taxon', etc.).- Added new
Table.transpose
method.
- pyqi 0.2.0 is now a required dependency. This changes the look-and-feel of the biom-format command-line interfaces and introduces a new executable,
biom
, which can be used to see a list of all available biom-format command-line commands. Thebiom
command is now used to run biom-format commands, instead of having a Python script (i.e., .py file) for each biom-format command. The old scripts (e.g., add_metadata.py, convert_biom.py, etc.) are still included but are deprecated. Users are pointed to the newbiom
command to run instead. Bash tab completion is now supported for all command and option names (see the biom-format documentation for instructions on how to enable this). - The following scripts have had their names and options changed:
add_metadata.py
is nowbiom add-metadata
. Changed option names:--input_fp
is now--input-fp
--output_fp
is now--output-fp
--sample_mapping_fp
is now--sample-metadata-fp
--observation_mapping_fp
is now--observation-metadata-fp
--sc_separated
is now--sc-separated
--int_fields
is now--int-fields
--float_fields
is now--float-fields
--sample_header
is now--sample-header
--observation_header
is now--observation-header
- New option
--sc-pipe-separated
biom_validator.py
is nowbiom validate-table
. Changed option names:-v
/--verbose
is now--detailed-report
--biom_fp
is now--input-fp
convert_biom.py
is nowbiom convert
. Changed option names:--input_fp
is now--input-fp
--output_fp
is now--output-fp
--biom_type
is now--matrix-type
--biom_to_classic_table
is now--biom-to-classic-table
--sparse_biom_to_dense_biom
is now--sparse-biom-to-dense-biom
--dense_biom_to_sparse_biom
is now--dense-biom-to-sparse-biom
--sample_mapping_fp
is now--sample-metadata-fp
--observation_mapping_fp
is now--observation-metadata-fp
--header_key
is now--header-key
--output_metadata_id
is now--output-metadata-id
--process_obs_metadata
is now--process-obs-metadata
--biom_table_type
is now--table-type
print_biom_python_config.py
is nowbiom show-install-info
.print_biom_table_summary.py
is nowbiom summarize-table
. Changed option names:--input_fp
is now--input-fp
--output_fp
is now--output-fp
. This is now a required option (output is no longer printed to stdout).--num_observations
is now--qualitative
--suppress_md5
is now--suppress-md5
subset_biom.py
is nowbiom subset-table
. Changed option names:--biom_fp
is now--input-fp
--output_fp
is now--output-fp
--ids_fp
is now--ids
biom.parse.parse_mapping
has been replaced bybiom.parse.MetadataMap
.biom.parse.MetadataMap.fromFile
can be directly substituted in place ofbiom.parse.parse_mapping
.
- Fixed performance issue with formatting BIOM tables for writing to a file.
- Fixed issue with
Table.addSampleMetadata
andTable.addObservationMetadata
when adding metadata to a subset of the samples/observations in a table that previously was without any sample/observation metadata. - Fixed issue with
Table.addSampleMetadata
andTable.addObservationMetadata
when updating a table's existing metadata, including the case where there are sample/observation IDs that are in the metadata file but not in the table.
-
Table.collapseObservationsByMetadata
andTable.collapseSamplesByMetadata
now support one-to-many relationships on the metadata field to collapse on. -
added new script called
print_biom_table_summary.py
(and accompanying tutorial) that prints summary statistics of the input BIOM table as a whole and on a per-sample basis
SparseMat
now uses cython for loops more efficiently
- fixed serious performance issue with
Table.transformSamples/Observations
when usingCSMat
as the sparse backend
- added documentation for how to switch sparse backends via BIOM config file
- performance issue on table creation with
CSMat
where anO(N)
lookup was being performed
- new default sparse matrix backend
CSMat
(COO/CSR/CSC) more efficient thanSparseDict
andSparseMat
(pure python + numpy) - support for biom config file, which allows specification of sparse backend to use. Currently supports
CSMat
(default),SparseMat
, andSparseDict
. Default can be found undersupport_files/biom_config
, and can be copied to$HOME/.biom_config
or located by setting$BIOM_CONFIG_FP
- new script called
add_metadata.py
with accompanying tutorial that allows users to add arbitrary sample and/or observation metadata to biom files - new script called
subset_biom.py
that efficiently pulls out a subset of a biom table (either by samples or observations). Useful for very large tables where memory may be an issue
- parser is more efficient for sparse tables and formatter is more efficient for both table types (less memory consumption)
biom.Table
objects are now immutable (except that metadata can still be added viaaddSampleMetadata
/addObservationMetadata
).__setitem__
andsetValueByIds
have been removed andSampleIds
,ObservationIds
,SampleMetadata
, andObservationMetadata
members are now tuples as a resultbiom.Table
object has a new method calledgetTableDensity()
- performance testing framework has been added for
Table
objects
convert_biom.py
now converts dense tables to sparse tables (previously it didn't do anything)- many misc. fixes to script help/documentation and docstrings (fixing typos, editing for clarity, etc.)
- new default sparse matrix backend
SparseMat
(requires Cython) more efficient over existingSparseDict
backend
- format now accepts unicode but does not accept str due to JSON parsing from Python
- specification for metadata is now either
null
or an object - PySparse has been gutted, sparse matrix support is now through
Table.SparseDict
- more table types!
Table.getBioFormatJsonString()
and similar methods now require ageneratedby
string