Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question regarding the usage of seqan3::fm_index_cursor using seqan3::bitpacked_sequence. #3264

Closed
Dhruv-mak opened this issue Jul 15, 2024 · 4 comments · Fixed by #3268
Closed
Labels
question a user question how to do certain things

Comments

@Dhruv-mak
Copy link

Platform

  • SeqAn version: 3.3.0
  • Operating system: Linux arch 6.9.8-arch1-1 SMP PREEMPT_DYNAMIC Fri, 05 Jul 2024 22:11:24 +0000 x86_64 GNU/Linux
  • Compiler: g++ (GCC) 14.1.1 20240522

Question

I am relatively new to C++ and I'm currently learning to use the SeqAn3 library. I've been following the documentation for the fm_index, but I encountered an issue that I cannot resolve.

Here's a code Snippet:

#include <seqan3/argument_parser/all.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/io/sequence_file/input.hpp>
#include <seqan3/search/fm_index/fm_index.hpp>
#include <string>
#include <vector>
#include <seqan3/alphabet/container/bitpacked_sequence.hpp>
#include <fstream>

int main(int argc, char const** argv) {
    using namespace seqan3::literals;

    seqan3::bitpacked_sequence<seqan3::dna4> seq{"ACGGTCAGGTTC"_dna4};
    seqan3::fm_index index{seq};

    // I don't know why the normal slicing is not working???
    // auto search(seq.begin(), seq.begin() + 4);
    auto search = seqan3::views::slice(seq, 0, 4);

    auto cursor = index.cursor();
    cursor.extend_right(search);
}

Steps to Reproduce

  • Run the given code with seqan3 version 3.3.0

Expected Behavior

  • The FM-index cursor should be extended to the right using the sliced sequence.

Observed Behavior

  • The code does not compile or execute as expected.
@Dhruv-mak Dhruv-mak added the question a user question how to do certain things label Jul 15, 2024
@rrahn
Copy link
Contributor

rrahn commented Jul 16, 2024

Hi @Dhruv-mak,

thanks for reporting this. We will look into it.

@SGSSGene
Copy link
Contributor

The error message this code produces is:

In file included from ../seqan3/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp:16,
                 from ../seqan3/include/seqan3/alphabet/nucleotide/dna5.hpp:17,
                 from main.cpp:3:
../seqan3/include/seqan3/alphabet/detail/convert.hpp: In instantiation of ‘constexpr const std::array<seqan3::dna4, 4> seqan3::detail::convert_through_char_representation<seqan3::bitpacked_sequence<seqan3::dna4>::reference_proxy_type, seqan3::dna4>’:
../seqan3/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp:86:21:   required from ‘constexpr seqan3::nucleotide_base<derived_type, size>::nucleotide_base(const other_nucl_type&) [with other_nucl_type = seqan3::bitpacked_sequence<seqan3::dna4>::reference_proxy_type; derived_type = seqan3::dna4; auto size = 4]’
../seqan3/include/seqan3/alphabet/nucleotide/dna4.hpp:78:19:   required from ‘bool seqan3::fm_index_cursor<index_t>::extend_right(seq_t&&) [with seq_t = std::ranges::subrange<seqan3::detail::random_access_iterator<seqan3::bitpacked_sequence<seqan3::dna4> >, seqan3::detail::random_access_iterator<seqan3::bitpacked_sequence<seqan3::dna4> >, std::ranges::subrange_kind::sized>&; index_t = seqan3::fm_index<seqan3::dna4, seqan3::single, sdsl::csa_wt<sdsl::wt_pc<sdsl::balanced_shape, sdsl::int_vector<1>, sdsl::rank_support_v<1, 1>, sdsl::select_support_scan<>, sdsl::select_support_scan<0>, sdsl::byte_tree<> >, 16, 10000000, sdsl::sa_order_sa_sampling<>, sdsl::isa_sampling<>, sdsl::plain_byte_alphabet> >]’
main.cpp:23:24:   required from here
../seqan3/include/seqan3/alphabet/detail/convert.hpp:43:50: error: use of deleted function ‘seqan3::bitpacked_sequence<alphabet_type>::reference_proxy_type::reference_proxy_type() [with alphabet_type = seqan3::dna4]’
   43 |             assign_char_to(to_char(assign_rank_to(i, in_t{})), ret[i]);
      |                                    ~~~~~~~~~~~~~~^~~~~~~~~~~
In file included from main.cpp:9:
../seqan3/include/seqan3/alphabet/container/bitpacked_sequence.hpp:105:9: note: declared here
  105 |         reference_proxy_type() = delete;
      |         ^~~~~~~~~~~~~~~~~~~~

@Dhruv-mak:
the seqan3::fm_index does not benefit from using seqan3::bitpacked_sequence<seqan3::dna4>. If you have no requirement to use it I suggest using std::vector<seqan3::dna4> instead.

@rrahn
Copy link
Contributor

rrahn commented Jul 16, 2024

I can confirm the following error:

In file included from ../seqan3-src/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp:13,
                 from ../seqan3-src/include/seqan3/alphabet/nucleotide/dna15.hpp:14,
                 from ../seqan3-src/test/unit/alphabet/container/bitpacked_sequence_test.cpp:12:
../seqan3-src/include/seqan3/alphabet/detail/convert.hpp: In instantiation of 'constexpr const std::array<seqan3::dna4, 4> seqan3::detail::convert_through_char_representation<seqan3::bitpacked_sequence<seqan3::dna4>::reference_proxy_type, seqan3::dna4>':
../seqan3-src/include/seqan3/alphabet/nucleotide/nucleotide_base.hpp:83:21:   required from 'constexpr seqan3::nucleotide_base<derived_type, size>::nucleotide_base(const other_nucl_type&) [with other_nucl_type = seqan3::bitpacked_sequence<seqan3::dna4>::reference_proxy_type; derived_type = seqan3::dna4; auto size = 4]'
../seqan3-src/include/seqan3/alphabet/nucleotide/dna4.hpp:75:19:   required from here
../seqan3-src/include/seqan3/alphabet/detail/convert.hpp:40:50: error: use of deleted function 'seqan3::bitpacked_sequence<alphabet_type>::reference_proxy_type::reference_proxy_type() [with alphabet_type = seqan3::dna4]'
   40 |             assign_char_to(to_char(assign_rank_to(i, in_t{})), ret[i]);
      |                                    ~~~~~~~~~~~~~~^~~~~~~~~~~
In file included from ../seqan3-src/test/unit/alphabet/container/bitpacked_sequence_test.cpp:10:
../seqan3-src/include/seqan3/alphabet/container/bitpacked_sequence.hpp:102:9: note: declared here
  102 |         reference_proxy_type() = delete;

produced by the minimal example:

  using namespace seqan3::literals;

  seqan3::bitpacked_sequence<seqan3::dna4> source{"ACGGTCAGGTTC"_dna4};
  auto it = source.begin();
  seqan3::dna4 val = static_cast<seqan3::dna4>(*it);

The explicit cast of the proxy type returned by bitpacked sequence to a normal alphabet will call internally the default constructor of this proxy type causing the compile error, since the proxy type is not default constructible. So this needs some further digging into the alphabet mechanics which may take a while. For the time being, I suggest not using the bitpacked_sequence. Note internally the FM index uses the SDSL and this stores the given sequence in its own memory representation. So there is literally no benefit for using the bitpacked sequence if only the FM index is needed.

@rrahn
Copy link
Contributor

rrahn commented Jul 16, 2024

Interestingly, this compiles:

using namespace seqan3::literals;

seqan3::bitpacked_sequence<seqan3::dna4> source{"ACGGTCAGGTTC"_dna4};
auto it = source.begin();
seqan3::dna4 val = *it;

So implicit conversion seems to work but via a different code path. Maybe some general issue with the alphabet_proxy. I'll investigate.

rrahn added a commit to rrahn/seqan3 that referenced this issue Jul 16, 2024
… `seqan3::bitpacked_sequence`. seqan#3264

Further constraints the conversion constructors of the nucleotide and aminoacid base alphabet types to only consider conversion if all requirements are fulfilled. This is now represented by the specific concept `convertable_to_through_char_representation`.
Through this proxy types will not be converted through their char representation but will be implicitly converted to their underling base alphabet type, e.g. dna4, and then copied/moved using the copy/move constructor of the underlying base alphabet type.
@rrahn rrahn mentioned this issue Jul 16, 2024
eseiler pushed a commit that referenced this issue Jul 16, 2024
* Fixes #3267 by adding proper constraints to the char conversion table.

* Fixes Question regarding the usage of `seqan3::fm_index_cursor` using `seqan3::bitpacked_sequence`. #3264

Further constraints the conversion constructors of the nucleotide and aminoacid base alphabet types to only consider conversion if all requirements are fulfilled. This is now represented by the specific concept `convertable_to_through_char_representation`.
Through this proxy types will not be converted through their char representation but will be implicitly converted to their underling base alphabet type, e.g. dna4, and then copied/moved using the copy/move constructor of the underlying base alphabet type.

* [MISC] automatic linting

* Adds missing copyright text in test file

* Removes superfluous vector include in test file.

---------

Co-authored-by: seqan-actions[bot] <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question a user question how to do certain things
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants