- Filter HTTP headers and HTML from content on webpages so that it is consistent with the app implementation and the ARCH implementation
- Update PlainTextExtractor to use .all() since HTML is removed from content
- Add domain to all()
- Update CSV exports on app so that they are RFC 4180 compliant
- Apply GitHub workflows to main branch
- Consistent formatting on DataFrameLoader.scala
- Update tests as needed
- Resolves #538
- Filter HTTP headers and HTML from content on webpages so that it is consistent with the app implementation and the ARCH implementation
- Change all content to raw_content.
- Update PlainTextExtractor to use .all() since HTML is removed from content
- Add domain to all()
- Update CSV exports on app so that they are RFC 4180 compliant
- Apply GitHub workflows to main branch
- Consistent formatting on DataFrameLoader.scala
- Update tests as needed
- Update Apache Spark version in README.
- Resolves #538
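
For readers unfamiliar with what RFC 4180 compliance asks of a CSV export, here is a minimal, standalone sketch of the quoting rules. It is not the code from this change: the `Rfc4180` object and its `escapeField` and `toRecord` helpers are hypothetical names used only for illustration, and the sketch assumes every field is already a string.

```scala
object Rfc4180 {
  // Hypothetical helper: quote a field only when it contains a comma,
  // a double quote, CR, or LF. Embedded double quotes are doubled.
  def escapeField(field: String): String = {
    val needsQuoting =
      field.exists(c => c == ',' || c == '"' || c == '\r' || c == '\n')
    if (needsQuoting) "\"" + field.replace("\"", "\"\"") + "\""
    else field
  }

  // Hypothetical helper: join escaped fields with commas and end the
  // record with CRLF, the line break RFC 4180 specifies.
  def toRecord(fields: Seq[String]): String =
    fields.map(escapeField).mkString(",") + "\r\n"
}
```

The key point is that a literal double quote inside a field is escaped by doubling it rather than with a backslash, which is where many CSV writers diverge from the RFC by default.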
In ARCH we remove headers and HTML on .webpages(). We should be consistent with none. If folks need the content with headers and HTML, they can grab it from .all().
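
To make the distinction concrete, the following is a rough, self-contained sketch of what filtering HTTP headers and HTML out of a raw record body could look like. It is an illustration under stated assumptions, not the toolkit's implementation: `ContentFilter`, `removeHttpHeader`, and `removeHtml` are hypothetical names, the input is assumed to be a full HTTP response held as a string, and the regex-based tag stripping is deliberately naive (a real HTML parser would handle scripts, comments, and entities properly).

```scala
object ContentFilter {
  // Hypothetical helper: if the input looks like an HTTP response, drop
  // everything up to and including the first blank line, which separates
  // the headers from the body. Other input is returned unchanged.
  def removeHttpHeader(raw: String): String = {
    if (!raw.startsWith("HTTP/")) raw
    else {
      val crlf = raw.indexOf("\r\n\r\n")
      if (crlf >= 0) raw.substring(crlf + 4)
      else {
        val lf = raw.indexOf("\n\n")
        if (lf >= 0) raw.substring(lf + 2) else raw
      }
    }
  }

  // Hypothetical helper: replace each tag with a space, then collapse
  // runs of whitespace. Naive on purpose; it does not decode entities
  // or skip <script> and <style> contents.
  def removeHtml(body: String): String =
    body.replaceAll("(?s)<[^>]*>", " ").replaceAll("\\s+", " ").trim
}
```

Under this sketch, the filtered text would play the role of the cleaned content on .webpages(), while the untouched input would correspond to what remains available from .all().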