Lists not explicitly styled as lists get turned into paragraphs #142

Open
pcraig3 opened this issue Jan 2, 2025 · 0 comments

pcraig3 commented Jan 2, 2025

Due to how Word works and how mammoth works, sometimes we define a particular class of element as a paragraph when it might also be a list.

Here's a good example:

| Word document styling | mammoth document HTML conversion |
| --- | --- |
| lots of lists | almost everything is a paragraph now |
| Image | Image |

What is happening mechanically is that I am defining style maps that map particular classes to HTML elements. However, some of these classes are very general (paragraph, normaltextrun, Default) and can apply equally to a paragraph or a list, depending on how the list is created.
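
To make the mechanism concrete, here is a minimal sketch of a style map built for python-mammoth. The three style names come from the paragraph above; treating them as Word paragraph style names (`p[style-name='...']`) and mapping each to `p:fresh` is an assumption for illustration, not the project's actual configuration. The helper names `build_style_map` and `convert` are hypothetical.

```python
# Assumed rules: each general style is mapped to a plain paragraph.
STYLE_RULES = {
    "paragraph": "p:fresh",
    "normaltextrun": "p:fresh",
    "Default": "p:fresh",
}


def build_style_map(rules):
    """Render {style name: HTML path} as one mammoth style map line per rule."""
    return "\n".join(
        f"p[style-name='{name}'] => {target}" for name, target in rules.items()
    )


def convert(docx_path, rules=STYLE_RULES):
    """Convert a .docx file to HTML using the rules above."""
    import mammoth  # third-party: pip install mammoth

    with open(docx_path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=build_style_map(rules))
    # result.value is the HTML, result.messages holds conversion warnings.
    return result.value, result.messages
```

With rules like these, any paragraph carrying one of the general styles is emitted as `<p>`, whether or not Word displays it as a list item, which matches the behaviour in the screenshots.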

I am not sure if there is a good way to

  • preserve the intent of the original Word doc, while
  • being explicit about how to treat content that comes in

Something I've done in the past has been to ignore styles that are too general (like Default), but it feels risky to me because then we don't really know what it will come in as. However, if we can't do anything else about this, then that's probably the best thing to do on balance.
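
The "ignore styles that are too general" fallback could be sketched as a filter applied to the rules before the style map is built. The set `TOO_GENERAL` and the function name are hypothetical, and the rule dictionary is assumed to map style names to HTML paths.

```python
# Assumption: which styles count as "too general" is a per-project choice.
TOO_GENERAL = {"Default"}


def drop_general_styles(rules, too_general=TOO_GENERAL):
    """Return a copy of the rules without the styles considered too general."""
    return {
        name: target for name, target in rules.items() if name not in too_general
    }
```

Styles dropped this way fall through to mammoth's default handling. mammoth reports styles it has no mapping for in the `messages` of the conversion result, so logging those messages is one way to reduce the "we don't really know what it will come in as" risk.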

I've opened up an issue in the mammoth repo to see if there is a better answer to this: mwilliamson/python-mammoth#151

Will update this issue once that one gets an answer.

pcraig3 self-assigned this Jan 3, 2025
pcraig3 added a commit that referenced this issue Jan 13, 2025
The idea is that we can actually identify list item elements
without a named style map, and instead convert them directly to lists.

The TL;DR here is that more lists will convert automatically.

I am using an undocumented but stable stylemap syntax for this.

More can be learned here:

- #142
- mwilliamson/python-mammoth#151
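
For illustration, mammoth's style map syntax includes list matchers of the form `p:unordered-list(N)` and `p:ordered-list(N)`, which match paragraphs by list membership and nesting level rather than by style name. Whether this is the exact syntax the commit uses is an assumption; the sketch below only shows how such rules could be generated. `list_rules` and `max_level` are hypothetical names.

```python
def list_rules(max_level=2):
    """Build style map lines that turn list paragraphs into nested <ul>/<ol> items."""
    rules = []
    for kind, tag in (("unordered-list", "ul"), ("ordered-list", "ol")):
        for level in range(1, max_level + 1):
            # Each extra nesting level sits inside an <li> of a parent list.
            parents = "ul|ol > li > " * (level - 1)
            rules.append(f"p:{kind}({level}) => {parents}{tag} > li:fresh")
    return "\n".join(rules)
```

The resulting string can be passed to `mammoth.convert_to_html(docx_file, style_map=...)`, alone or joined with style-name rules. How list rules interact with overlapping style-name rules for the same paragraph should be checked against a real document.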