Added initial support for Markdown - issue-8 #9

Merged (7 commits) on Sep 25, 2021

Contributor

@RC-Lee RC-Lee commented Sep 23, 2021

Fixes #8

The original tool had no support for Markdown files. I've added initial support for them using regular expressions.

The re module is now imported to support this.

The tool will now also read files that end with ".md".

In TextFile.process_file(), the code now distinguishes between ".txt" and ".md" files.
The processing of ".txt" files has not changed.
For ".md" files, it handles the following Markdown syntax features:

  • Italics - generated inside an <i>...</i> tag
  • Bold - generated inside a <b>...</b> tag
  • Heading 1 - the content of a Heading 1 in any ".md" file is treated as the title and is generated inside an <h1>...</h1> tag
  • Heading 2 - generated inside an <h2>...</h2> tag
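As a rough illustration of the heading handling listed above, here is a minimal sketch. The names H1_PATTERN, H2_PATTERN, and convert_headings are hypothetical and are not taken from the PR's code; the regexes are one plausible choice, assuming headings sit on their own line and start with "#" or "##" followed by whitespace.

```python
import re

# Hypothetical patterns for illustration; the PR's actual regexes are not shown here.
H1_PATTERN = re.compile(r"^#\s+(.+?)\s*$")
H2_PATTERN = re.compile(r"^##\s+(.+?)\s*$")


def convert_headings(lines):
    """Return (title, html_lines) for a list of Markdown lines.

    The first Heading 1 found is used as the page title.
    """
    title = None
    html_lines = []
    for line in lines:
        h2 = H2_PATTERN.match(line)
        h1 = H1_PATTERN.match(line)
        if h2:
            html_lines.append(f"<h2>{h2.group(1)}</h2>")
        elif h1:
            if title is None:
                title = h1.group(1)
            html_lines.append(f"<h1>{h1.group(1)}</h1>")
        else:
            # Not a heading: pass the line through unchanged.
            html_lines.append(line)
    return title, html_lines
```

Because each pattern requires whitespace right after its run of "#" characters, a "##" line cannot be mistaken for a Heading 1.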

I've created a separate regex pattern for each syntax feature, which makes them easier to implement and edit.

To handle bold and italics, I've used Python's re.findall() function to find every match of the regex within a single paragraph. I then replace each matching instance of the Markdown syntax with the corresponding HTML tag.
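The find-then-replace approach described above could look roughly like the following sketch. BOLD_PATTERN, ITALIC_PATTERN, and convert_emphasis are hypothetical names, and the patterns are assumptions for illustration rather than the ones used in this PR.

```python
import re

# Hypothetical patterns; each captures the text between the Markdown markers.
BOLD_PATTERN = re.compile(r"\*\*([^*\n]+)\*\*")
# Lookarounds keep a single * from matching inside a ** pair.
ITALIC_PATTERN = re.compile(r"(?<!\*)\*([^*\n]+)\*(?!\*)")


def convert_emphasis(paragraph):
    """Replace **bold** and *italic* spans in one paragraph with HTML tags."""
    # Handle bold first so its ** markers are gone before italics are searched.
    for content in BOLD_PATTERN.findall(paragraph):
        paragraph = paragraph.replace(f"**{content}**", f"<b>{content}</b>")
    for content in ITALIC_PATTERN.findall(paragraph):
        paragraph = paragraph.replace(f"*{content}*", f"<i>{content}</i>")
    return paragraph
```

A single re.sub() call per pattern, with a replacement such as r"<b>\1</b>", would do the find and the replace in one step; the loop form is shown only because it mirrors the description.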

I've noticed that the HTML italic and bold tags cause the generated file to place each inline tag and its content on separate, indented lines.
For example,

*I* am afraid, Watson, that I shall have to go,” said Holmes, as we
sat *down* together to our breakfast **one** morning.

in a .md file is generated in HTML as

  <p>
   <i>
    I
   </i>
   am afraid, Watson, that I shall have to go,” said Holmes, as we
sat
   <i>
    down
   </i>
   together to our breakfast
   <b>
    one
   </b>
   morning.
  </p>

The web page renders correctly, but I'm still trying to figure out whether this is caused by BeautifulSoup's parsing or by something else.

Please review and provide feedback at your earliest convenience. Thanks!

@abatomunkuev
Owner

Hello! I have reviewed your code.

I have found one problem with italic text. I think the regular expression for italic text doesn't work. I have created a test Markdown .md file with the following content:

# Testing Heading & Title 

## Testing Heading 2 
## Testing Heading 3

**testing bold text** 
*testing italic text*

***testing both bold and italic text***

The result that I got from the tool:
[Screenshot: Screen Shot 2021-09-24 at 7 37 30 PM]
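A test file like the one above exercises the case where bold and italic markers overlap. The sketch below shows one way a regex-based converter can deal with that, by applying the longest marker first. It is an assumption for illustration, not the fix that was applied in this PR; RULES and convert_inline are hypothetical names.

```python
import re

# Ordered rules: the longest marker is substituted first so that "***"
# is not partially consumed by the "**" or "*" patterns.
RULES = [
    (re.compile(r"\*\*\*([^*\n]+)\*\*\*"), r"<b><i>\1</i></b>"),
    (re.compile(r"\*\*([^*\n]+)\*\*"), r"<b>\1</b>"),
    (re.compile(r"\*([^*\n]+)\*"), r"<i>\1</i>"),
]


def convert_inline(text):
    """Apply each emphasis rule in order and return the converted text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```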

In addition, it would be great if you could add some information about the Markdown feature to the README.md file.

@RC-Lee
Contributor Author

RC-Lee commented Sep 25, 2021

I've fixed the issue with italics.
I've also added Markdown support for h3 headers and links.
I've also added Markdown information to the README.md file.
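For readers following along, h3 and link support can be sketched in the same regex style. This is a hedged illustration only: H3_PATTERN, LINK_PATTERN, convert_h3, and convert_links are hypothetical names, and the link pattern assumes the simple inline form [text](url) with no title and no nested brackets.

```python
import re

# Hypothetical patterns for illustration; not taken from the merged code.
H3_PATTERN = re.compile(r"^###\s+(.+?)\s*$")
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def convert_h3(line):
    """Wrap a '### heading' line in <h3> tags; return other lines unchanged."""
    match = H3_PATTERN.match(line)
    return f"<h3>{match.group(1)}</h3>" if match else line


def convert_links(text):
    """Turn [text](url) into an HTML anchor."""
    return LINK_PATTERN.sub(r'<a href="\2">\1</a>', text)
```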

@abatomunkuev abatomunkuev merged commit c3bf225 into abatomunkuev:master Sep 25, 2021
@abatomunkuev
Owner

Thank you! You did a great job!

Your changes work. Looking forward to working with you again in the future!

Development

Successfully merging this pull request may close these issues.

Add Markdown Support Feature.
2 participants