Implementing XLS consumption/production (originally: "Any excel2003 (.xls) plan?") #227

Closed
pebble2015 opened this issue Sep 24, 2017 · 6 comments

Comments

@pebble2015
Contributor

Your project has helped me a lot in my work, and I really appreciate it.
I know people don't use the Excel 2003 format much these days,
but sometimes I have to deal with these files.
I just want to know whether there is any plan for .xls file support.
If .xls support is planned, I'd like to contribute to the codebase.
Thanks in advance.

@tfussell
Owner

Glad to hear it. I've considered adding XLS support. It helps that XLSX is basically an XML-encoded version of the internal XLS file that Excel uses in memory, so most of the structure is in place. Additionally, I've already implemented the CFBF (aka compound document) parser and serializer needed for XLS files, and the system was designed to allow for additional parsers alongside the existing xlnt::xlsx_consumer and xlnt::xlsx_producer. I don't think it would be a huge job, but it would involve some tedious coding. I can help with architecture and prototyping if you want to give it a try.

@pebble2015
Contributor Author

Thanks for your reply. But how can I get started?

@tfussell
Owner

For new features, I like to use test-driven development. The first step would be to create a basic XLS file and add it to the tests/data directory. Then create a new test in tests/workbook/serialization_test_suite.hpp which loads that XLS file. The test should fail because the format is not XLSX.

The next step is to detect whether the file is XLS or XLSX. You could check the file extension, but I think it's more reliable to check the file magic (the first bytes of the file). XLSX is a ZIP file, so the first four bytes are always 0x504b0304. XLS is a compound document, so it starts with 0xd0cf11e0. This check would go in void workbook::load(std::istream &stream). Be careful with endianness: CFBF files are little-endian, so multi-byte fields need swapping on big-endian machines, and the magic is safest compared byte by byte rather than as a single integer. I recommend using this tool to explore an XLS file to get an idea of what you're working with.
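
For what it's worth, the magic check could look roughly like the sketch below. It is only an illustration: file_format, matches, and detect_format are hypothetical names that do not exist in xlnt, and the code assumes a seekable stream so it can rewind after peeking. Comparing byte by byte sidesteps the endianness question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical names for illustration; none of these exist in xlnt.
enum class file_format { xlsx, xls, unknown };

using magic_bytes = std::array<unsigned char, 4>;

// Compare the bytes read from the stream against a known signature.
bool matches(const std::array<char, 4> &head, const magic_bytes &magic)
{
    for (std::size_t i = 0; i < magic.size(); ++i)
    {
        if (static_cast<unsigned char>(head[i]) != magic[i]) return false;
    }
    return true;
}

// Read the first four bytes, then rewind so the real parser sees the whole file.
// Assumption: the stream is seekable (a file stream or string stream).
file_format detect_format(std::istream &stream)
{
    const magic_bytes zip_magic = {{0x50, 0x4B, 0x03, 0x04}};  // "PK\x03\x04"
    const magic_bytes cfbf_magic = {{0xD0, 0xCF, 0x11, 0xE0}}; // compound document

    const auto start = stream.tellg();
    std::array<char, 4> head{};
    stream.read(head.data(), static_cast<std::streamsize>(head.size()));
    const bool complete = stream.gcount() == static_cast<std::streamsize>(head.size());

    stream.clear();      // a short read sets eofbit/failbit
    stream.seekg(start); // restore the original read position

    if (!complete) return file_format::unknown;
    if (matches(head, zip_magic)) return file_format::xlsx;
    if (matches(head, cfbf_magic)) return file_format::xls;
    return file_format::unknown;
}

// Usage on in-memory data: the first eight bytes of a compound document.
void detect_format_example()
{
    std::istringstream xls(std::string("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8));
    assert(detect_format(xls) == file_format::xls);
}
```

Returning unknown instead of throwing leaves it to the caller to decide how to report an unsupported file.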

After that, I would copy the xlnt::xlsx_consumer and xlnt::xlsx_producer headers and source files in source/detail/serialization to create xlnt::xls_consumer and xlnt::xls_producer. Back in void workbook::load(std::istream &stream), use the xlsx_consumer::read method for XLSX files and the xls_consumer::read method for XLS files.
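
The dispatch in load could be shaped something like this. It is a standalone sketch, not the actual xlnt code: the two structs are stand-ins for the real xlsx_consumer and the proposed xls_consumer, their read methods just return a label so the chosen path is visible, and peek_magic and load_workbook are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-ins for the real consumers; the actual classes parse into a workbook.
struct xlsx_consumer_stub
{
    std::string read(std::istream &) { return "xlsx"; }
};

struct xls_consumer_stub
{
    std::string read(std::istream &) { return "xls"; }
};

// Return up to four leading bytes and rewind (assumes a seekable stream).
std::string peek_magic(std::istream &stream)
{
    char head[4] = {};
    const auto start = stream.tellg();
    stream.read(head, 4);
    const auto got = stream.gcount();
    stream.clear();
    stream.seekg(start);
    return std::string(head, static_cast<std::size_t>(got));
}

// Hypothetical counterpart of workbook::load: pick a consumer by file magic.
std::string load_workbook(std::istream &stream)
{
    const std::string magic = peek_magic(stream);

    if (magic == std::string("PK\x03\x04", 4))
    {
        return xlsx_consumer_stub().read(stream);
    }
    if (magic == std::string("\xD0\xCF\x11\xE0", 4))
    {
        return xls_consumer_stub().read(stream);
    }
    throw std::runtime_error("unrecognised file format");
}

// Usage: a stream that starts with the ZIP signature goes to the XLSX path.
void load_workbook_example()
{
    std::istringstream zip(std::string("PK\x03\x04", 4));
    assert(load_workbook(zip) == "xlsx");
}
```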

Instead of reading internal data files from a ZIP archive as xlnt::xlsx_consumer does, xlnt::xls_consumer would read those files using xlnt::detail::compound_document. Each file is binary instead of XML, so the final step would be reading each file stream byte by byte according to the specification. I've already written some helpers in source/detail/binary.hpp for dealing with binary structures.
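
To give a feel for the byte-level work, here is a small sketch that decodes a record header from a binary stream. It assumes the BIFF layout in which each record begins with a 2-byte type followed by a 2-byte payload size, both little-endian. read_u16_le, record_header, and read_record_header are hypothetical names written for this example; they are not the helpers in source/detail/binary.hpp.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read a 16-bit little-endian integer regardless of host byte order.
std::uint16_t read_u16_le(std::istream &stream)
{
    unsigned char bytes[2] = {};
    stream.read(reinterpret_cast<char *>(bytes), 2);
    if (stream.gcount() != 2)
    {
        throw std::runtime_error("unexpected end of stream");
    }
    // Low byte comes first in the file.
    return static_cast<std::uint16_t>(bytes[0] | (bytes[1] << 8));
}

// Hypothetical struct: assumed header that precedes each record's payload.
struct record_header
{
    std::uint16_t type; // record identifier
    std::uint16_t size; // payload length in bytes
};

record_header read_record_header(std::istream &stream)
{
    record_header header;
    header.type = read_u16_le(stream);
    header.size = read_u16_le(stream);
    return header;
}

// Usage: the bytes 09 08 10 00 decode to type 0x0809 with a 16-byte payload.
void read_record_header_example()
{
    std::istringstream in(std::string("\x09\x08\x10\x00", 4));
    const record_header header = read_record_header(in);
    assert(header.type == 0x0809);
    assert(header.size == 16);
}
```

Assembling the value from individual bytes, rather than casting the buffer to an integer, keeps the result independent of the machine's own byte order.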

You may wish to abstract some of the common consumption and production logic into a parent class of xls_consumer and xlsx_consumer, something like common_consumer, but I'm not sure how much there will be in common between the two. Good luck!

@tfussell changed the title from Any excel2003(.xls) plan? to Implementing XLS consumption/production (originally: "Any excel2003 (.xls) plan?") on Sep 25, 2017
@pebble2015
Contributor Author

Thanks! I'll have a try.

@tfussell
Owner

tfussell commented May 2, 2019

Closing due to inactivity.

@tfussell closed this as completed on May 2, 2019
@ClearSeve

Is this format still supported?
