Commit

added support for robots.txt exclusion in scraping
vtempest committed Oct 1, 2024
1 parent 0dfeac1 commit 72510a7
Showing 28 changed files with 4,104 additions and 178 deletions.
1 change: 0 additions & 1 deletion CNAME

This file was deleted.

2 changes: 1 addition & 1 deletion docs/assets/search.js

Large diffs are not rendered by default.

19 changes: 12 additions & 7 deletions docs/classes/torch.html

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/functions/convertMarkdownToHtml.html
@@ -1,4 +1,4 @@
<!DOCTYPE html><html class="default" lang="en"><head><meta charset="utf-8"/><meta http-equiv="x-ua-compatible" content="IE=edge"/><title>convertMarkdownToHtml | ai-research-agent</title><meta name="description" content="Documentation for ai-research-agent"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="stylesheet" href="../assets/style.css"/><link rel="stylesheet" href="../assets/highlight.css"/><script defer src="../assets/main.js"></script><script async src="../assets/icons.js" id="tsd-icons-script"></script><script async src="../assets/search.js" id="tsd-search-script"></script><script async src="../assets/navigation.js" id="tsd-nav-script"></script></head><body><script>console.log(`Loaded ${location.href}`)</script><script>document.documentElement.dataset.theme = localStorage.getItem("tsd-theme") || "os";document.body.style.display="none";setTimeout(() => app?app.showPage():document.body.style.removeProperty("display"),500)</script><header class="tsd-page-toolbar"><div class="tsd-toolbar-contents container"><div class="table-cell" id="tsd-search" data-base=".."><div class="field"><label for="tsd-search-field" class="tsd-widget tsd-toolbar-icon search no-caption"><svg width="16" height="16" viewBox="0 0 16 16" fill="none"><use href="../assets/icons.svg#icon-search"></use></svg></label><input type="text" id="tsd-search-field" aria-label="Search"/></div><div class="field"><div id="tsd-toolbar-links"><a href="https://github.com/vtempest/ai-research-agent">Source Code</a><a href="https://qwksearch.com">Live Demo</a><a href="https://discord.gg/SJdBqBz3tV">Discord Chat</a></div></div><ul class="results"><li class="state loading">Preparing search index...</li><li class="state failure">The search index is not available</li></ul><a href="../index.html" class="title">ai-research-agent</a></div><div class="table-cell" id="tsd-widgets"><a href="#" class="tsd-widget tsd-toolbar-icon menu no-caption" data-toggle="menu" aria-label="Menu"><svg 
width="16" height="16" viewBox="0 0 16 16" fill="none"><use href="../assets/icons.svg#icon-menu"></use></svg></a></div></div></header><div class="container container-main"><div class="col-content"><div class="tsd-page-title"><ul class="tsd-breadcrumb"><li><a href="../modules.html">ai-research-agent</a></li><li><a href="convertMarkdownToHtml.html">convertMarkdownToHtml</a></li></ul><h1>Function convertMarkdownToHtml</h1></div><section class="tsd-panel"><ul class="tsd-signatures"><li class="tsd-signature tsd-anchor-link"><a id="convertMarkdownToHtml" class="tsd-anchor"></a><span class="tsd-kind-call-signature">convert<wbr/>Markdown<wbr/>To<wbr/>Html</span><span class="tsd-signature-symbol">(</span><span class="tsd-kind-parameter">markdown</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><a href="#convertMarkdownToHtml" aria-label="Permalink" class="tsd-anchor-icon"><svg viewBox="0 0 24 24"><use href="../assets/icons.svg#icon-anchor"></use></svg></a></li><li class="tsd-description"><div class="tsd-comment tsd-typography"><p>Converts Markdown text to HTML. It handles the following Markdown elements:</p>
<!DOCTYPE html><html class="default" lang="en"><head><meta charset="utf-8"/><meta http-equiv="x-ua-compatible" content="IE=edge"/><title>convertMarkdownToHTML | ai-research-agent</title><meta name="description" content="Documentation for ai-research-agent"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="stylesheet" href="../assets/style.css"/><link rel="stylesheet" href="../assets/highlight.css"/><script defer src="../assets/main.js"></script><script async src="../assets/icons.js" id="tsd-icons-script"></script><script async src="../assets/search.js" id="tsd-search-script"></script><script async src="../assets/navigation.js" id="tsd-nav-script"></script></head><body><script>console.log(`Loaded ${location.href}`)</script><script>document.documentElement.dataset.theme = localStorage.getItem("tsd-theme") || "os";document.body.style.display="none";setTimeout(() => app?app.showPage():document.body.style.removeProperty("display"),500)</script><header class="tsd-page-toolbar"><div class="tsd-toolbar-contents container"><div class="table-cell" id="tsd-search" data-base=".."><div class="field"><label for="tsd-search-field" class="tsd-widget tsd-toolbar-icon search no-caption"><svg width="16" height="16" viewBox="0 0 16 16" fill="none"><use href="../assets/icons.svg#icon-search"></use></svg></label><input type="text" id="tsd-search-field" aria-label="Search"/></div><div class="field"><div id="tsd-toolbar-links"><a href="https://github.com/vtempest/ai-research-agent">Source Code</a><a href="https://qwksearch.com">Live Demo</a><a href="https://discord.gg/SJdBqBz3tV">Discord Chat</a></div></div><ul class="results"><li class="state loading">Preparing search index...</li><li class="state failure">The search index is not available</li></ul><a href="../index.html" class="title">ai-research-agent</a></div><div class="table-cell" id="tsd-widgets"><a href="#" class="tsd-widget tsd-toolbar-icon menu no-caption" data-toggle="menu" aria-label="Menu"><svg 
width="16" height="16" viewBox="0 0 16 16" fill="none"><use href="../assets/icons.svg#icon-menu"></use></svg></a></div></div></header><div class="container container-main"><div class="col-content"><div class="tsd-page-title"><ul class="tsd-breadcrumb"><li><a href="../modules.html">ai-research-agent</a></li><li><a href="convertMarkdownToHTML.html">convertMarkdownToHTML</a></li></ul><h1>Function convertMarkdownToHTML</h1></div><section class="tsd-panel"><ul class="tsd-signatures"><li class="tsd-signature tsd-anchor-link"><a id="convertMarkdownToHTML" class="tsd-anchor"></a><span class="tsd-kind-call-signature">convert<wbr/>Markdown<wbr/>To<wbr/>Html</span><span class="tsd-signature-symbol">(</span><span class="tsd-kind-parameter">markdown</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><a href="#convertMarkdownToHTML" aria-label="Permalink" class="tsd-anchor-icon"><svg viewBox="0 0 24 24"><use href="../assets/icons.svg#icon-anchor"></use></svg></a></li><li class="tsd-description"><div class="tsd-comment tsd-typography"><p>Converts Markdown text to HTML. It handles the following Markdown elements:</p>
<ul>
<li>Headers (h1 to h6)</li>
<li>Bold text</li>
@@ -11,7 +11,7 @@
</ul>
</div><div class="tsd-parameters"><h4 class="tsd-parameters-title">Parameters</h4><ul class="tsd-parameter-list"><li><span><span class="tsd-kind-parameter">markdown</span>: <span class="tsd-signature-type">string</span></span><div class="tsd-comment tsd-typography"><p>The Markdown-formatted text to be converted.</p>
</div><div class="tsd-comment tsd-typography"></div></li></ul></div><h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4><p>The resulting HTML string.</p>
<div class="tsd-comment tsd-typography"><h4 class="tsd-anchor-link"><a id="Example" class="tsd-anchor"></a>Example<a href="#Example" aria-label="Permalink" class="tsd-anchor-icon"><svg viewBox="0 0 24 24"><use href="../assets/icons.svg#icon-anchor"></use></svg></a></h4><pre><code class="ts"><span class="hl-0">const</span><span class="hl-1"> </span><span class="hl-2">markdown</span><span class="hl-1"> = </span><span class="hl-5">&quot;# Header</span><span class="hl-8">\n\n</span><span class="hl-5">This is **bold** and *italic* text.</span><span class="hl-8">\n\n</span><span class="hl-5">* List item 1</span><span class="hl-8">\n</span><span class="hl-5">* List item 2&quot;</span><span class="hl-1">;</span><br/><span class="hl-0">const</span><span class="hl-1"> </span><span class="hl-2">html</span><span class="hl-1"> = </span><span class="hl-4">convertMarkdownToHtml</span><span class="hl-1">(</span><span class="hl-6">markdown</span><span class="hl-1">);</span><br/><span class="hl-6">console</span><span class="hl-1">.</span><span class="hl-4">log</span><span class="hl-1">(</span><span class="hl-6">html</span><span class="hl-1">);</span><br/><span class="hl-7">// Output:</span><br/><span class="hl-7">// &lt;h1&gt;Header&lt;/h1&gt;</span><br/><span class="hl-7">// &lt;p&gt;This is &lt;strong&gt;bold&lt;/strong&gt; and &lt;em&gt;italic&lt;/em&gt; text.&lt;/p&gt;</span><br/><span class="hl-7">// &lt;ul&gt;&lt;li&gt;List item 1&lt;/li&gt;&lt;li&gt;List item 2&lt;/li&gt;&lt;/ul&gt;</span>
<div class="tsd-comment tsd-typography"><h4 class="tsd-anchor-link"><a id="Example" class="tsd-anchor"></a>Example<a href="#Example" aria-label="Permalink" class="tsd-anchor-icon"><svg viewBox="0 0 24 24"><use href="../assets/icons.svg#icon-anchor"></use></svg></a></h4><pre><code class="ts"><span class="hl-0">const</span><span class="hl-1"> </span><span class="hl-2">markdown</span><span class="hl-1"> = </span><span class="hl-5">&quot;# Header</span><span class="hl-8">\n\n</span><span class="hl-5">This is **bold** and *italic* text.</span><span class="hl-8">\n\n</span><span class="hl-5">* List item 1</span><span class="hl-8">\n</span><span class="hl-5">* List item 2&quot;</span><span class="hl-1">;</span><br/><span class="hl-0">const</span><span class="hl-1"> </span><span class="hl-2">html</span><span class="hl-1"> = </span><span class="hl-4">convertMarkdownToHTML</span><span class="hl-1">(</span><span class="hl-6">markdown</span><span class="hl-1">);</span><br/><span class="hl-6">console</span><span class="hl-1">.</span><span class="hl-4">log</span><span class="hl-1">(</span><span class="hl-6">html</span><span class="hl-1">);</span><br/><span class="hl-7">// Output:</span><br/><span class="hl-7">// &lt;h1&gt;Header&lt;/h1&gt;</span><br/><span class="hl-7">// &lt;p&gt;This is &lt;strong&gt;bold&lt;/strong&gt; and &lt;em&gt;italic&lt;/em&gt; text.&lt;/p&gt;</span><br/><span class="hl-7">// &lt;ul&gt;&lt;li&gt;List item 1&lt;/li&gt;&lt;li&gt;List item 2&lt;/li&gt;&lt;/ul&gt;</span>
</code><button type="button">Copy</button></pre>

</div><aside class="tsd-sources"><ul><li>Defined in <a href="https://github.com/vtempest/ai-research-agent/tree/master/src/extractor/html-to-content/html-utils.js#L197">extractor/html-to-content/html-utils.js:197</a></li></ul></aside></li></ul></section></div><div class="col-sidebar"><div class="page-menu"><div class="tsd-navigation settings"><details class="tsd-accordion"><summary class="tsd-accordion-summary"><h3><svg width="20" height="20" viewBox="0 0 24 24" fill="none"><use href="../assets/icons.svg#icon-chevronDown"></use></svg>Settings</h3></summary><div class="tsd-accordion-details"><div class="tsd-filter-visibility"><span class="settings-label">Member Visibility</span><ul id="tsd-filter-options"><li class="tsd-filter-item"><label class="tsd-filter-input"><input type="checkbox" id="tsd-filter-protected" name="protected"/><svg width="32" height="32" viewBox="0 0 32 32" aria-hidden="true"><rect class="tsd-checkbox-background" width="30" height="30" x="1" y="1" rx="6" fill="none"></rect><path class="tsd-checkbox-checkmark" d="M8.35422 16.8214L13.2143 21.75L24.6458 10.25" stroke="none" stroke-width="3.5" stroke-linejoin="round" fill="none"></path></svg><span>Protected</span></label></li><li class="tsd-filter-item"><label class="tsd-filter-input"><input type="checkbox" id="tsd-filter-inherited" name="inherited" checked/><svg width="32" height="32" viewBox="0 0 32 32" aria-hidden="true"><rect class="tsd-checkbox-background" width="30" height="30" x="1" y="1" rx="6" fill="none"></rect><path class="tsd-checkbox-checkmark" d="M8.35422 16.8214L13.2143 21.75L24.6458 10.25" stroke="none" stroke-width="3.5" stroke-linejoin="round" fill="none"></path></svg><span>Inherited</span></label></li><li class="tsd-filter-item"><label class="tsd-filter-input"><input type="checkbox" id="tsd-filter-external" name="external"/><svg width="32" height="32" viewBox="0 0 32 32" aria-hidden="true"><rect class="tsd-checkbox-background" width="30" height="30" x="1" y="1" rx="6" 
fill="none"></rect><path class="tsd-checkbox-checkmark" d="M8.35422 16.8214L13.2143 21.75L24.6458 10.25" stroke="none" stroke-width="3.5" stroke-linejoin="round" fill="none"></path></svg><span>External</span></label></li></ul></div><div class="tsd-theme-toggle"><label class="settings-label" for="tsd-theme">Theme</label><select id="tsd-theme"><option value="os">OS</option><option value="light">Light</option><option value="dark">Dark</option></select></div></div></details></div></div><div class="site-menu"><nav id="tsd-sidebar-links" class="tsd-navigation"><a href="https://github.com/vtempest/ai-research-agent">Source Code</a><a href="https://qwksearch.com">Live Demo</a><a href="https://discord.gg/SJdBqBz3tV">Discord Chat</a><a href="https://github.com/vtempest/ai-research-agent" class="tsd-nav-link">Source Code</a><a href="https://qwksearch.com" class="tsd-nav-link">Live Demo</a><a href="https://discord.gg/SJdBqBz3tV" class="tsd-nav-link">Discord Chat</a></nav><nav class="tsd-navigation"><a href="../modules.html"><svg class="tsd-kind-icon" viewBox="0 0 24 24"><use href="../assets/icons.svg#icon-1"></use></svg><span>ai-research-agent</span></a><ul class="tsd-small-nested-navigation" id="tsd-nav-container" data-base=".."><li>Loading...</li></ul></nav></div></div></div><footer></footer><div class="overlay"></div><script async src="https://www.googletagmanager.com/gtag/js?id=G-E5TZ32BZDF"></script><script>window.dataLayer = window.dataLayer || [];