Reddit Tags

helena zhou edited this page Apr 19, 2024 · 6 revisions

1. Feature Name

Reddit Tags

2. Literature Source (Serial Number, link)

N/A

3. Description of how the feature is computed (In Layman’s terms)

This feature applies to online text chats. It counts occurrences of Reddit-specific HTML tags and other text features typical of online writing:

  • All-caps
  • Links
  • User references, indicated by format u/username
  • Bold
  • Bullet points
  • Numbering
  • Line breaks
  • Quotes
  • Block quote responses, indicating that the message quotes someone else, marked by ">" or its HTML-escaped form "&gt;"
  • Ellipses
  • Parentheses
  • Emojis

These elements carry meaning specific to online contexts because they shape the tone and emphasis of a message. We capture them with regex patterns and simple text processing, which keeps the computation scalable and efficient.
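
The counting described above can be sketched roughly as follows. This is a minimal illustration, not the feature's actual implementation: the function name count_reddit_tags, the output keys, and every regex pattern are assumptions chosen for the example, and the emoji pattern in particular covers only a few common Unicode ranges.

```python
import re

# Hypothetical patterns for illustration only; the real feature may use different ones.
PATTERNS = {
    "num_links": re.compile(r"https?://\S+"),
    "num_user_references": re.compile(r"(?<![\w/])u/[\w-]+"),
    "num_bold": re.compile(r"\*\*[^*\n]+\*\*"),
    "num_bullet_points": re.compile(r"^[ \t]*[*-][ \t]+", re.MULTILINE),
    "num_numbering": re.compile(r"^[ \t]*\d+[.)][ \t]+", re.MULTILINE),
    "num_quotes": re.compile(r'"[^"\n]*"'),
    # A block quote line starts with ">" or its HTML-escaped form "&gt;".
    "num_block_quotes": re.compile(r"^[ \t]*(?:>|&gt;)", re.MULTILINE),
    "num_ellipses": re.compile(r"\.{3,}|…"),
    "num_parentheses": re.compile(r"\([^()]*\)"),
    # Rough approximation: a few common emoji ranges, not the full Unicode set.
    "num_emojis": re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]"),
}


def count_reddit_tags(text):
    """Return a dict mapping each (assumed) feature name to its count in text."""
    counts = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    # All-caps: alphabetic words of two or more letters written entirely in upper case.
    words = re.findall(r"\b[A-Za-z]+\b", text)
    counts["num_all_caps"] = sum(1 for w in words if len(w) > 1 and w.isupper())
    # Line breaks: number of newline characters in the message.
    counts["num_line_breaks"] = text.count("\n")
    return counts


message = "HELLO u/alice, see https://example.com ...\n> quoted line\n* **bold** item\n1. first (aside) 😀"
for name, value in sorted(count_reddit_tags(message).items()):
    print(name, value)
```

Keeping one compiled pattern per feature means each message is scanned a fixed number of times, which is one way to keep the computation cheap on large chat datasets.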

4. Algorithms used (KNN, Logistic Regression etc.)

N/A

5. ML Inputs/Features

N/A

6. Statistical concepts used

N/A

7. Pages of the literature to be referred to for details

N/A

8. Any tweaks/changes/adaptations made from the original source

N/A