Merge pull request #155 from thematters/feat/strip-html
feat(utils): revise stripHtml to support line break replacement
robertu7 authored May 17, 2024
2 parents b56b739 + e2a7444 commit b566f6d
Showing 5 changed files with 74 additions and 87 deletions.
4 changes: 2 additions & 2 deletions package-lock.json

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@matters/ipns-site-generator",
"version": "0.1.5",
"version": "0.1.6",
"description": "IPNS site generator for matters.town",
"author": "https://github.com/thematters <[email protected]>",
"homepage": "https://github.com/thematters/ipns-site-generator",
86 changes: 23 additions & 63 deletions src/__tests__/__snapshots__/utils.test.js.snap
@@ -1,42 +1,21 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`utils "makeSummary" can produce summary text from HTML 1`] = `
"
Reprehenderit proident sit consectetur id consequat officia.
"Reprehenderit proident sit consectetur id consequat officia.
Duis ea voluptate cupidatat ad
Elit consequat labore tempor Lore..."
Elit consequat labore tempor Lorem voluptate …"
`;

exports[`utils "stripHtml" can generate clean text from HTML 1`] = `
"
Reprehenderit proident sit consectetur id consequat officia.
Duis ea voluptate cupidatat ad
Elit consequat labore tempor Lorem voluptate occaecat nostrud laborum minim. Cillum veniam ea cupidatat nulla commodo sunt amet magna amet sit culpa nulla deserunt reprehenderit duis. Et nostrud sunt ad cupidatat laboris. Reprehenderit dolor dolore elit voluptate ex. Fugiat in in officia non eiusmod irure et. Velit ut aliquip ipsum exercitation exercitation nisi voluptate enim amet exercitation. Et consectetur ex nisi anim id consequat eiusmod veniam ipsum ullamco nulla deserunt nostrud.
Nulla in fugiat labore ad.
Cupidatat amet fugiat culpa id
Et ut dolore dolore ex. Deserunt adipisicing id dolor eiusmod minim ea. Pariatur veniam velit ad culpa nisi sit. Non nostrud irure nulla pariatur ipsum irure fugiat anim id Lorem duis. Ullamco incididunt ex ullamco elit. Amet voluptate minim laborum anim duis aliquip officia enim Lorem mollit aliquip laboris. Mollit pariatur sunt pariatur occaecat deserunt esse . Est eu ut elit id nisi duis id magna commodo ex et id sint laboris .
Enim aliqua est proident commodo dolor incididunt eiusmod. Anim anim eu pariatur aliqua qui. Sit non commodo enim ut aute officia eu. Adipisicing proident eu velit id proident.
Voluptate officia adipisicing voluptate amet dolore ad tempor aliquip reprehenderit Lorem. Ad dolor id minim occaecat ea non nulla. Ullamco exercitation consectetur duis tempor incididunt qui id. Sunt voluptate qui ex do Lorem consectetur laborum mollit culpa sunt anim occaecat esse. Velit deserunt eiusmod deserunt. Anim ullamco ad minim velit nulla aliquip culpa consequat laboris quis ad Lorem pariatur. Occaecat sunt irure reprehenderit.
query {
"Reprehenderit proident sit consectetur id consequat officia.
Duis ea voluptate cupidatat ad
Elit consequat labore tempor Lorem voluptate occaecat nostrud laborum minim. Cillum veniam ea cupidatat nulla commodo sunt amet magna amet sit culpa nulla deserunt reprehenderit duis. Et nostrud sunt ad cupidatat laboris. Reprehenderit dolor dolore elit voluptate ex. Fugiat in in officia non eiusmod irure et. Velit ut aliquip ipsum exercitation exercitation nisi voluptate enim amet exercitation. Et consectetur ex nisi anim id consequat eiusmod veniam ipsum ullamco nulla deserunt nostrud.
Nulla in fugiat labore ad.
Cupidatat amet fugiat culpa id
Et ut dolore dolore ex. Deserunt adipisicing id dolor eiusmod minim ea. Pariatur veniam velit ad culpa nisi sit. Non nostrud irure nulla pariatur ipsum irure fugiat anim id Lorem duis. Ullamco incididunt ex ullamco elit. Amet voluptate minim laborum anim duis aliquip officia enim Lorem mollit aliquip laboris. Mollit pariatur sunt pariatur occaecat deserunt esse. Est eu ut elit id nisi duis id magna commodo ex et id sint laboris.
Enim aliqua est proident commodo dolor incididunt eiusmod. Anim anim eu pariatur aliqua qui. Sit non commodo enim ut aute officia eu. Adipisicing proident eu velit id proident.
Voluptate officia adipisicing voluptate amet dolore ad tempor aliquip reprehenderit Lorem. Ad dolor id minim occaecat ea non nulla. Ullamco exercitation consectetur duis tempor incididunt qui id. Sunt voluptate qui ex do Lorem consectetur laborum mollit culpa sunt anim occaecat esse. Velit deserunt eiusmod deserunt. Anim ullamco ad minim velit nulla aliquip culpa consequat laboris quis ad Lorem pariatur. Occaecat sunt irure reprehenderit.
query {
article(
input: { mediaHash: \\"zdpuAxP6uSfum74VS3pYmzBR9xvPbrBcX3J8BPpB3xdRGjVsX\\" }
) {
@@ -45,35 +24,16 @@ exports[`utils "stripHtml" can generate clean text from HTML 1`] = `
summary
}
}
Officia amet minim proident labore
Proident fugiat amet
Duis eiusmod mollit ipsum exercitation voluptate sit ullamco.
Labore aute ea irure
Adipisicing nisi deserunt velit proident nostrud et ipsum amet mollit.
Esse nostrud deserunt Lorem pariatur incididunt.
Non minim esse qui mollit consequat.
Exercitation dolor fugiat esse officia cupidatat anim.
Esse eu anim irure voluptate non laborum laborum dolore dolore.
Laboris et excepteur est adipisicing magna qui do sit eiusmod.
Qui aute voluptate
Labore dolor laboris anim. Laborum ut eiusmod et et minim duis aliquip deserunt laboris.
"
Officia amet minim proident labore
Proident fugiat amet
Duis eiusmod mollit ipsum exercitation voluptate sit ullamco.
Labore aute ea irure
Adipisicing nisi deserunt velit proident nostrud et ipsum amet mollit.
Esse nostrud deserunt Lorem pariatur incididunt.
Non minim esse qui mollit consequat.
Exercitation dolor fugiat esse officia cupidatat anim.
Esse eu anim irure voluptate non laborum laborum dolore dolore.
Laboris et excepteur est adipisicing magna qui do sit eiusmod.
Qui aute voluptate
Labore dolor laboris anim. Laborum ut eiusmod et et minim duis aliquip deserunt laboris."
`;
34 changes: 21 additions & 13 deletions src/__tests__/makeHomepage.test.ts
@@ -1,5 +1,9 @@
import fetch from 'isomorphic-fetch'
import { makeHomepage, makeHomepageBundles, makeActivityPubBundles } from '../makeHomepage'
import {
makeHomepage,
makeHomepageBundles,
makeActivityPubBundles,
} from '../makeHomepage'
import { MOCK_HOMEPAGE } from '../render/mock'

jest.mock('isomorphic-fetch')
@@ -24,18 +28,22 @@ describe('makeHomepage', () => {
arrayBuffer: () => Promise.resolve(new ArrayBuffer(1)),
})

const bundles = await makeHomepageBundles(
MOCK_HOMEPAGE('matters.news')
)
const bundles = await makeHomepageBundles(MOCK_HOMEPAGE('matters.news'))

let html = ''
let xml = ''
let json = ''
let xml = ''
let json = ''
for (const { path, content } of bundles) {
switch (path) {
case 'index.html': html = content; break;
case 'rss.xml': xml = content; break;
case 'feed.json': json = content; break;
case 'index.html':
html = content
break
case 'rss.xml':
xml = content
break
case 'feed.json':
json = content
break
}
}
expect(html).toMatchSnapshot()
@@ -48,14 +56,14 @@ describe('makeHomepage', () => {
arrayBuffer: () => Promise.resolve(new ArrayBuffer(1)),
})

const bundles = await makeActivityPubBundles(
MOCK_HOMEPAGE('matters.news')
)
const bundles = await makeActivityPubBundles(MOCK_HOMEPAGE('matters.news'))

let webfinger: string = ''
for (const { path, content } of bundles) {
switch (path) {
case '.well-known/webfinger': webfinger = content; break;
case '.well-known/webfinger':
webfinger = content
break
}
}
expect(webfinger).toMatchSnapshot()
35 changes: 27 additions & 8 deletions src/utils/index.ts
@@ -16,14 +16,33 @@ export const cleanHTML = (html: string) => {
}

/**
* Strip html tags from html string to get text.
* Strip HTML tags from HTML string to get plain text.
* @param html - html string
* @param replacement - string to replace tags
* @param tagReplacement - string to replace tags
* @param lineReplacement - string to replace line-break tags (p, blockquote, br)
*
* @see {@link https://github.com/thematters/ipns-site-generator/blob/main/src/utils/index.ts}
*/
export const stripHtml = (html: string, replacement = ' ') =>
(String(html) || '')
.replace(/(<\/p><p>|&nbsp;)/g, ' ') // replace line break and space first
.replace(/(<([^>]+)>)/gi, replacement)
export const stripHtml = (
html: string,
tagReplacement = '',
lineReplacement = '\n'
) => {
html = String(html) || ''

html = html.replace(/\&nbsp\;/g, ' ')

// Replace block-level elements with newlines
html = html.replace(/<(\/?p|\/?blockquote|br\/?)>/gi, lineReplacement)

// Remove remaining HTML tags
let plainText = html.replace(/<\/?[^>]+(>|$)/g, tagReplacement)

// Normalize multiple newlines and trim the result
plainText = plainText.replace(/\n\s*\n/g, '\n').trim()

return plainText
}

/**
* Return beginning of text in html as summary, split on sentence break within buffer range.
@@ -33,7 +52,7 @@ export const stripHtml = (html: string, replacement = ' ') =>
*/
export const makeSummary = (html: string, length = 140, buffer = 20) => {
// split on sentence breaks
const sections = stripHtml(html, '')
const sections = stripHtml(html, '', ' ')
.replace(/([?!。?!]|(\.\s))\s*/g, '$1|')
.split('|')

@@ -44,7 +63,7 @@ export const makeSummary = (html: string, length = 140, buffer = 20) => {

const addition =
el.length + summary.length > length + buffer
? `${el.substring(0, length - summary.length)}...`
? `${el.substring(0, length - summary.length)}`
: el

summary = summary.concat(addition)
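
The revised `stripHtml` from the hunk above can be exercised in isolation. The following is a minimal sketch: the function body mirrors the diff (using a local variable rather than reassigning the parameter), while `sample` and the logged calls are illustrative inputs chosen here, not taken from the repository's tests.

```typescript
// Sketch of the revised stripHtml, copied from the diff for illustration.
const stripHtml = (
  html: string,
  tagReplacement = '',
  lineReplacement = '\n'
): string => {
  let text = String(html) || ''

  // Turn non-breaking space entities into regular spaces
  text = text.replace(/&nbsp;/g, ' ')

  // Replace bare <p>, <blockquote> and <br> tags with the line replacement
  text = text.replace(/<(\/?p|\/?blockquote|br\/?)>/gi, lineReplacement)

  // Remove (or replace) any remaining tags
  const plainText = text.replace(/<\/?[^>]+(>|$)/g, tagReplacement)

  // Collapse runs of blank lines and trim the result
  return plainText.replace(/\n\s*\n/g, '\n').trim()
}

// Hypothetical input, chosen only to show the two replacement parameters.
const sample =
  '<p>Hello&nbsp;<strong>world</strong>.</p><p>Second<br>line</p>'

// Default: block-level tags become newlines, inline tags are dropped.
console.log(JSON.stringify(stripHtml(sample)))
// "Hello world.\nSecond\nline"

// As called by makeSummary in this commit: line breaks become spaces.
console.log(JSON.stringify(stripHtml(sample, '', ' ')))
// "Hello world.  Second line"
```

Note that the block-level pattern, as written, only matches bare tags such as `<br>` or `<br/>`. A `<br />` with a space before the slash falls through to the generic tag removal and is replaced by `tagReplacement` rather than by a line break. Likewise, adjacent paragraphs produce two consecutive `lineReplacement` strings, which are only collapsed when the replacement is a newline.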