div mis-converted #107
Looking at the code, this is probably very hard to do because for this the second div has to lookbehind and see that the previous |
Workaround (this relies on the fix for #92 to be applied):
Maybe it helps someone. For the sake of completeness, the following is my current complete example of how I clean up HTML from RSS feeds to post it, as Markdown, to Fediverse (called as
|
We're hitting this too. It is a difficult fix in the current architecture. |
Thanks for reporting this! Indeed, it is quite hard to fix. In particular, we would have to decide whether divs always behave as paragraphs. If they do, we could just handle divs the same way as p and it would be fixed. But I think this would break other things. I'm open to suggestions, tho. |
On Sun, 24 Nov 2024, AlexVonB wrote:
> Especially we would have to decide if divs always behave as paragraphs.
Uhm… no need to decide it, there’s already a spec for that ;-)
Basically, a div forces the browser to begin at a new line,
thinking in terms like what text browsers such as lynx would use.
If the current position is already at a new line (e.g. because there
was a </p> before it), no need to do anything; otherwise, forcing a
linebreak is needed.
So they specifically _don’t_ behave like paragraphs, which also have
inter-paragraph spacing.
Hence, the example and expected text in the submission above.
bye,
//mirabilos
--
"Using Lynx is like wearing a really good pair of shades: cuts out
the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL."
-- Henry Nelson, March 1999
|
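The rule described in the comment above (a div only forces a line break when the output is not already at the start of a line, while a p additionally gets inter-paragraph spacing) can be sketched roughly as follows. This is a minimal, standard-library-only illustration that produces plain text, not Markdown; it is not how markdownify works internally, and the names `ensure_line_start`, `ensure_blank_line`, `LineBreakSketch` and `to_text` are hypothetical.

```python
from html.parser import HTMLParser


def ensure_line_start(out: str) -> str:
    """Add one newline only if the output is not already at a line start."""
    if out == "" or out.endswith("\n"):
        return out
    return out + "\n"


def ensure_blank_line(out: str) -> str:
    """Paragraph-style separation: end the line, then leave one empty line."""
    if out == "":
        return out
    out = ensure_line_start(out)
    if not out.endswith("\n\n"):
        out += "\n"
    return out


class LineBreakSketch(HTMLParser):
    """Hypothetical text renderer that only knows about div and p."""

    def __init__(self):
        super().__init__()
        self.out = ""

    def _boundary(self, tag):
        # Opening and closing tags are treated alike: both are block boundaries.
        if tag == "div":
            self.out = ensure_line_start(self.out)
        elif tag == "p":
            self.out = ensure_blank_line(self.out)

    def handle_starttag(self, tag, attrs):
        self._boundary(tag)

    def handle_endtag(self, tag):
        self._boundary(tag)

    def handle_data(self, data):
        self.out += data


def to_text(html: str) -> str:
    parser = LineBreakSketch()
    parser.feed(html)
    parser.close()
    return parser.out


print(repr(to_text("foo<div>bar</div>baz")))      # div forces a single line break
print(repr(to_text("<p>foo</p><div>bar</div>")))  # div adds nothing after </p>
```

Because each boundary looks only at what has already been emitted, nested or adjacent divs collapse to a single line break instead of stacking up blank lines.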
We previously used a source-modifying workaround like @mirabilos posted. However, we recently switched to a subclass-based workaround:

    class CustomMarkdownConverter(markdownify.MarkdownConverter):
        """
        Create a custom MarkdownConverter that fixes some issues.
        """
        def convert_div(self, el, text, convert_as_inline):
            if convert_as_inline:
                return " " + text.strip() + " "
            else:
                return "\n\n" + text + "\n\n"

Our application prefers treating |
Expected:
'foo \nbarbazmeow'