New puppeteer examples #1355
Walkthrough
The pull request introduces comprehensive documentation updates for integrating Puppeteer within Trigger.dev projects. It adds new sections detailing the configuration of Puppeteer in the …
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (6)
docs/snippets/web-scraping-warning.mdx (1)
1-3: Approved: Clear and important warning message.
The warning message is well-crafted and effectively communicates the company's policy on web scraping. It aligns with the PR objectives and provides crucial information for users.
To enhance readability, consider breaking the content into bullet points:
<Warning>
- **WEB SCRAPING WARNING:** Direct scraping of third-party websites without explicit permission using Trigger.dev Cloud is strictly prohibited and will result in immediate account suspension. If web scraping is necessary for your project, you MUST use a proxy service to comply with our terms of service.
+ **WEB SCRAPING WARNING:**
+ - Direct scraping of third-party websites without explicit permission using Trigger.dev Cloud is strictly prohibited.
+ - Violation will result in immediate account suspension.
+ - If web scraping is necessary for your project, you MUST use a proxy service to comply with our terms of service.
</Warning>
This format makes the key points more scannable and easier to digest.
docs/examples/puppeteer.mdx (4)
20-35: Add import statement for `puppeteer()` function
The code snippet uses the `puppeteer()` function in the build configuration, but it's not imported. To improve clarity and prevent potential errors, consider adding the import statement at the beginning of the file.
Add the following import statement at the beginning of the `trigger.config.ts` file:
 import { defineConfig } from "@trigger.dev/sdk/v3";
+import { puppeteer } from "@trigger.dev/sdk/v3";
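For reference, a minimal `trigger.config.ts` with the extension registered might look like the sketch below. It assumes the `puppeteer()` build extension is exported from `@trigger.dev/build/extensions/puppeteer` (the package where Trigger.dev v3 ships its build extensions) rather than from `@trigger.dev/sdk/v3`, so please verify the path against the current build-extension docs. `<project ref>` is a placeholder.

```ts
// Sketch of trigger.config.ts, assuming Trigger.dev v3 and its build-extensions package.
import { defineConfig } from "@trigger.dev/sdk/v3";
// Assumption: this import path; confirm it in the build-extension docs before copying.
import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

export default defineConfig({
  project: "<project ref>", // placeholder: your project reference
  build: {
    // Assumed to add a Chrome install to the deployed image
    extensions: [puppeteer()],
  },
});
```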
37-43: Add a note about system-dependent paths
The `PUPPETEER_EXECUTABLE_PATH` environment variable is correctly set, but the provided path is specific to a Linux environment. To improve clarity for users on different operating systems, consider adding a note about system-dependent paths.
Add the following note after the environment variable example:
Note: The path to the Chrome executable may vary depending on your operating system and installation method. Adjust the path accordingly for your specific environment.
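If a task has to run on more than one kind of machine, one option is a small helper that prefers `PUPPETEER_EXECUTABLE_PATH` and otherwise falls back to a per-platform default. `resolveChromePath` is a hypothetical name, and the macOS and Windows paths below are typical Google Chrome install locations, not guaranteed ones:

```ts
// Sketch: choose a Chrome executable path for Puppeteer.
// Assumption: the defaults are common install locations and may differ on your machine.
const DEFAULT_CHROME_PATHS = new Map<string, string>([
  // Path used in the Linux example above
  ["linux", "/usr/bin/google-chrome-stable"],
  // Typical Google Chrome install on macOS
  ["darwin", "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"],
  // Typical per-machine Google Chrome install on Windows
  ["win32", "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"],
]);

function resolveChromePath(
  env: Record<string, string | undefined>,
  platform: string,
): string | undefined {
  // A non-blank PUPPETEER_EXECUTABLE_PATH always wins
  const fromEnv = env.PUPPETEER_EXECUTABLE_PATH?.trim();
  if (fromEnv) return fromEnv;
  // Otherwise guess per platform; undefined means "let Puppeteer pick its default"
  return DEFAULT_CHROME_PATHS.get(platform);
}
```

The result can be passed as `executablePath` to `puppeteer.launch()`; returning `undefined` leaves Puppeteer to use its own default browser.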
202-213: LGTM: Comprehensive proxying information
The proxying section provides crucial information about the necessity of using proxies for web scraping and offers a helpful list of recommended services.
Consider adding a brief explanation of why each proxy service is recommended or what specific features they offer. This could help users make a more informed decision when choosing a proxy service.
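On the Puppeteer side, routing a locally launched browser through one of these services usually comes down to Chromium's `--proxy-server` flag plus `page.authenticate()` for credentials. The sketch below assumes the provider hands you a single proxy URL with basic-auth credentials embedded; `buildProxyLaunchConfig` and the hostnames are hypothetical:

```ts
// Sketch: split a proxy URL such as http://user:pass@proxy.example.com:8080
// into Chromium launch args and credentials for page.authenticate().
// Assumption: an HTTP(S) proxy with username/password authentication.
interface ProxyLaunchConfig {
  args: string[]; // for puppeteer.launch({ args })
  credentials?: { username: string; password: string }; // for page.authenticate()
}

function buildProxyLaunchConfig(proxyUrl: string): ProxyLaunchConfig {
  const url = new URL(proxyUrl);
  // --proxy-server takes scheme://host:port, never the credentials
  const args = [`--proxy-server=${url.protocol}//${url.host}`];
  if (!url.username) return { args };
  // URL keeps user info percent-encoded, so decode it before handing it to Puppeteer
  return {
    args,
    credentials: {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    },
  };
}
```

Pass `args` to `puppeteer.launch({ args })` and, when present, `credentials` to `page.authenticate()` before navigating.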
🧰 Tools
LanguageTool
[uncategorized] ~204-~204: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u'll risk getting our IP address blocked and we will ban you from our service.** He...(COMMA_COMPOUND_SENTENCE)
[grammar] ~206-~206: There may be a verb agreement error, if referring to a singular entity (a list).
Context: ... will ban you from our service.** Here are a list of proxy services we recommend: ...(THERE_IS_ARE)
206-206: Fix verb agreement
There's a minor grammatical issue in the introduction to the list of proxy services.
Change the sentence to:
- Here are a list of proxy services we recommend:
+ Here is a list of proxy services we recommend:
This correction ensures proper verb agreement with the singular noun "list".
docs/config/config-file.mdx (1)
477-503: LGTM! Clear and comprehensive Puppeteer configuration guide.
The new Puppeteer configuration section is well-structured and provides clear instructions. It's great that you've included:
- A warning about web scraping practices.
- A code example for the `trigger.config.ts` file.
- Instructions for setting up the necessary environment variable.
- A note about using `puppeteer` instead of `puppeteer-core`.
Consider adding a brief explanation of why the `PUPPETEER_EXECUTABLE_PATH` environment variable is needed. This could help users understand the purpose of this configuration step.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
- docs/config/config-file.mdx (3 hunks)
- docs/examples/intro.mdx (1 hunks)
- docs/examples/puppeteer.mdx (1 hunks)
- docs/mint.json (1 hunks)
- docs/snippets/web-scraping-warning.mdx (1 hunks)
🧰 Additional context used
LanguageTool
docs/examples/puppeteer.mdx
[typographical] ~49-~49: It appears that a comma is missing.
Context: ...## Basic example ### Overview In this example we use Puppeteer to log out the title o...(DURING_THAT_TIME_COMMA)
[typographical] ~81-~81: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer to generate a PDF from...(DURING_THAT_TIME_COMMA)
[typographical] ~144-~144: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer with a BrowserBase pro...(DURING_THAT_TIME_COMMA)
[uncategorized] ~204-~204: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u'll risk getting our IP address blocked and we will ban you from our service.** He...(COMMA_COMPOUND_SENTENCE)
[grammar] ~206-~206: There may be a verb agreement error, if referring to a singular entity (a list).
Context: ... will ban you from our service.** Here are a list of proxy services we recommend: ...(THERE_IS_ARE)
🔇 Additional comments not posted (7)
docs/examples/intro.mdx (1)
14-14: New Puppeteer example added correctly.
The new entry for Puppeteer has been added to the table with proper formatting and positioning. The description is concise and accurately represents the functionality of Puppeteer. The link follows the same pattern as other entries, maintaining consistency.
However, to ensure completeness:
Let's verify if the linked page exists:
This will help confirm that the linked page has been created and is in the correct location.
✅ Verification successful
Puppeteer example link verified successfully.
The linked page `docs/examples/puppeteer.mdx` exists, ensuring that the Puppeteer entry in the table points to a valid resource.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the Puppeteer example page exists
fd -t f "puppeteer.mdx" docs/examples
Length of output: 65
docs/examples/puppeteer.mdx (2)
1-19: LGTM: Well-structured introduction and overview
The file header and overview section provide clear and concise information about the document's content. The inclusion of a web scraping warning is a responsible practice.
1-213: Overall: Excellent documentation with minor improvements suggested
This document provides comprehensive and well-structured guidance on using Puppeteer with Trigger.dev. The examples cover a range of use cases and include important information about build configurations, environment setup, and proxying.
The suggested improvements primarily focus on:
- Enhancing code robustness and error handling.
- Improving compatibility across different environments.
- Refining content extraction logic.
- Minor grammatical and clarity enhancements.
These changes will further elevate the quality of this already valuable resource for developers integrating Puppeteer with Trigger.dev.
docs/mint.json (1)
285-285: LGTM: New Puppeteer example correctly added
The addition of "examples/puppeteer" to the list of examples in the navigation section is correct and aligns with the PR objectives. It's appropriately placed among other technology-specific examples.
A few points to note:
- The new entry is correctly formatted and consistent with other entries in the list.
- Its position in the list seems logical, grouped with other technology-specific examples.
- This change supports the PR's goal of introducing new Puppeteer examples to the documentation.
To ensure the new example page exists, please run the following command:
If the command doesn't return a result, please ensure that the corresponding Puppeteer example page has been created in the `docs/examples` directory.
✅ Verification successful
Verified: Puppeteer example page exists
The Puppeteer example page `docs/examples/puppeteer.mdx` has been successfully added and exists in the documentation examples directory.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify the existence of the Puppeteer example page
fd -t f "puppeteer.mdx" docs/examples
Length of output: 65
docs/config/config-file.mdx (3)
7-7: New import for `ScrapingWarning` component.
The import of the `ScrapingWarning` component is correctly added to support the new Puppeteer section.
514-514: Minor update to ffmpeg section comment.
The comment in the ffmpeg section has been updated to be more generic, which is a good improvement for maintainability.
Line range hint 1-646: Overall, excellent updates to the configuration documentation.
The addition of the Puppeteer section and the minor updates to other parts of the document improve the overall quality and completeness of the configuration guide. The changes are well-integrated and provide clear, actionable information for users.
docs/examples/puppeteer.mdx
## Basic example

### Overview

In this example we use Puppeteer to log out the title of a web page, in this case Google.

### Task code

```ts trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto("https://google.com");

    const content = await page.title();
    logger.info("Content", { content });

    await browser.close();
  },
});
```
🛠️ Refactor suggestion
Consider adding launch options for better compatibility
The basic example demonstrates Puppeteer usage correctly. However, to ensure better compatibility across different environments, consider adding launch options to the `puppeteer.launch()` call.
Modify the `puppeteer.launch()` call as follows:
- const browser = await puppeteer.launch();
+ const browser = await puppeteer.launch({
+ headless: "new",
+ args: ['--no-sandbox', '--disable-setuid-sandbox']
+ });
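This change makes the headless mode explicit and disables Chrome's sandbox, which is commonly needed when Chrome runs as root inside a container. Note that `--no-sandbox` reduces process isolation, and that `headless: "new"` is only meaningful on Puppeteer versions that still accept it; on recent releases `headless: true` already selects the new headless mode, so check the value against your installed version.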
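One way to apply this suggestion without disabling the sandbox everywhere is to make the flags opt-in. A sketch, assuming Puppeteer v22 or later (where `headless: true` selects the new headless mode) and a hypothetical `PUPPETEER_DISABLE_SANDBOX` variable that you set only in container deployments:

```ts
// Sketch: Puppeteer launch options with the sandbox flags enabled only on request.
// Assumption: PUPPETEER_DISABLE_SANDBOX is a hypothetical env var, not a Puppeteer setting.
interface LaunchOptionsSketch {
  headless: boolean;
  args: string[];
}

function buildLaunchOptions(env: Record<string, string | undefined>): LaunchOptionsSketch {
  const args: string[] = [];
  // Chrome will not start as root with its sandbox on, which is common in containers;
  // these flags trade isolation for compatibility, so keep them opt-in.
  if (env.PUPPETEER_DISABLE_SANDBOX === "true") {
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  // Assumes Puppeteer v22+, where true means the new headless mode
  return { headless: true, args };
}
```

With this, `puppeteer.launch(buildLaunchOptions(process.env))` behaves the same locally and in the container, apart from the sandbox flags.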
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
## Basic example

### Overview

In this example we use Puppeteer to log out the title of a web page, in this case Google.

### Task code

```ts trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch({
      headless: "new",
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    const page = await browser.newPage();

    await page.goto("https://google.com");

    const content = await page.title();
    logger.info("Content", { content });

    await browser.close();
  },
});
```
docs/examples/puppeteer.mdx
## Scrape content from a web page

### Overview

In this example we use Puppeteer with a BrowserBase proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out.

<ScrapingWarning />

### Task code

```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    const page = await browser.newPage();

    // Set up BrowserBase proxy authentication
    await page.authenticate({
      username: "api",
      password: process.env.BROWSERBASE_API_KEY || "",
    });

    try {
      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent ?? "0";
        const numberText = text.replace(/[^0-9]/g, "");
        return parseInt(numberText);
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});
```
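Two details in this task may be worth tightening. `browser.newPage()` and `page.authenticate()` run before the `try`, so if either throws, the `finally` block never calls `browser.close()`; moving them inside the `try` avoids that. Also, a missing `BROWSERBASE_API_KEY` is interpolated into the endpoint as the literal string `undefined`. A sketch of an endpoint builder that fails fast instead, assuming the endpoint format used above; `browserbaseEndpoint` is a hypothetical helper:

```ts
// Sketch: build the BrowserBase WebSocket endpoint, failing fast when the key is missing.
// Assumption: the endpoint format matches the task above (wss://connect.browserbase.com?apiKey=...).
function browserbaseEndpoint(apiKey: string | undefined): string {
  if (!apiKey) {
    // A clear error beats connecting with "apiKey=undefined"
    throw new Error("BROWSERBASE_API_KEY is not set");
  }
  const url = new URL("wss://connect.browserbase.com");
  // searchParams percent-encodes any characters that are not URL-safe
  url.searchParams.set("apiKey", apiKey);
  return url.toString();
}
```

Usage: `puppeteer.connect({ browserWSEndpoint: browserbaseEndpoint(process.env.BROWSERBASE_API_KEY) })`. `URL` serializes the empty path as `/`, so the result reads `wss://connect.browserbase.com/?apiKey=...`, which requests the same resource as the original form.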
🛠️ Refactor suggestion
Improve star count extraction logic
The web scraping example is well-structured with good error handling. However, the star count extraction logic could be more robust.
Consider modifying the star count extraction logic to handle potential formatting variations:
const starCount = await page.evaluate(() => {
const starElement = document.querySelector(".github-star-count");
- const text = starElement?.textContent ?? "0";
- const numberText = text.replace(/[^0-9]/g, "");
- return parseInt(numberText);
+ const text = starElement?.textContent?.trim() ?? "0";
+ const match = text.match(/^([\d,]+)/);
+ return match ? parseInt(match[1].replace(/,/g, '')) : 0;
});
This change improves the extraction logic by:
- Trimming whitespace from the text content.
- Using a regex to match the first group of digits (including commas).
- Removing commas before parsing the integer.
- Returning 0 if no match is found.
These modifications make the extraction more resilient to different formatting styles of the star count.
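Neither version handles abbreviated counts: the original turns "12.3k" into 123, and the suggested regex stops at the dot and returns 12. If the page can show such abbreviations (an assumption about the landing page), a parser like the hypothetical `parseStarCount` below covers both comma-grouped and k/M forms:

```ts
// Sketch: parse star counts such as "9,876", " 12.3k " or "1.2M".
// Assumption: the text starts with a plain or comma-grouped number, optionally followed by k/M.
function parseStarCount(text: string | null | undefined): number {
  // The suffix only counts when it is not the start of a word (so "1,234 Members" stays 1234)
  const match = (text ?? "")
    .trim()
    .match(/^(\d[\d,]*(?:\.\d+)?)\s*(?:([kKmM])(?![a-zA-Z]))?/);
  if (!match) return 0; // nothing number-like at the start
  const value = Number(match[1].replace(/,/g, ""));
  const suffix = match[2]?.toLowerCase();
  const multiplier = suffix === "k" ? 1000 : suffix === "m" ? 1000000 : 1;
  // Round so floating-point noise (12.3 * 1000) does not leak into the count
  return Math.round(value * multiplier);
}
```

Because `page.evaluate` serializes its callback and runs it in the page, a helper defined in the task file is not in scope there; one option is to return the raw `textContent` from `evaluate` and call `parseStarCount` in the task itself.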
📝 Committable suggestion
## Scrape content from a web page

### Overview

In this example we use Puppeteer with a BrowserBase proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out.

<ScrapingWarning />

### Task code

```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    const page = await browser.newPage();

    // Set up BrowserBase proxy authentication
    await page.authenticate({
      username: "api",
      password: process.env.BROWSERBASE_API_KEY || "",
    });

    try {
      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent?.trim() ?? "0";
        const match = text.match(/^([\d,]+)/);
        return match ? parseInt(match[1].replace(/,/g, '')) : 0;
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});
```
docs/examples/puppeteer.mdx
## Generate a PDF from a web page

### Overview

In this example we use Puppeteer to generate a PDF from a web page and upload it to Cloudflare R2.

### Task code

```ts trigger/puppeteer-generate-pdf.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

// Initialize S3 client
const s3Client = new S3Client({
  region: "auto",
  endpoint: process.env.S3_ENDPOINT,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
  },
});

export const puppeteerWebpageToPDF = task({
  id: "puppeteer-webpage-to-pdf",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto("https://google.com");
    const url = response?.url() ?? "No URL found";

    // Generate PDF from the web page
    const generatePdf = await page.pdf();

    logger.info("PDF generated from URL", { url });

    await browser.close();

    // Upload to R2
    const s3Key = `pdfs/test.pdf`;
    const uploadParams = {
      Bucket: process.env.S3_BUCKET,
      Key: s3Key,
      Body: generatePdf,
      ContentType: "application/pdf",
    };

    logger.log("Uploading to R2 with params", uploadParams);

    // Upload the PDF to R2 and return the URL.
    await s3Client.send(new PutObjectCommand(uploadParams));
    const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
    logger.log("PDF uploaded to R2", { url: s3Url });
    return { pdfUrl: s3Url };
  },
});
```
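Separately from error handling, two details of this task stand out: every run writes to the fixed key `pdfs/test.pdf`, so each PDF overwrites the previous one, and the returned `s3.amazonaws.com` URL does not point at the object, which is uploaded to R2 through `S3_ENDPOINT`. A sketch of both fixes; `pdfKey`, `publicObjectUrl` and the `R2_PUBLIC_URL` variable (the bucket's public r2.dev or custom-domain base URL) are hypothetical, and the bucket must be publicly readable for such a URL to work:

```ts
// Sketch: unique object keys plus a public URL built from configuration.
// Assumptions: runId is any unique id for the run; R2_PUBLIC_URL (hypothetical)
// holds the bucket's public base URL.
function pdfKey(runId: string, date: Date = new Date()): string {
  // Keep keys URL-safe so the stored key and the public URL path match exactly
  const safeId = runId.replace(/[^A-Za-z0-9_-]/g, "-");
  // A date prefix groups objects by day; the run id stops runs overwriting each other
  const day = date.toISOString().slice(0, 10);
  return `pdfs/${day}/${safeId}.pdf`;
}

function publicObjectUrl(baseUrl: string | undefined, key: string): string {
  // Fail fast instead of returning a URL with an empty host
  if (!baseUrl) throw new Error("R2_PUBLIC_URL is not set");
  // A trailing slash makes new URL() append the key rather than replace the last path segment
  const base = baseUrl.endsWith("/") ? baseUrl : `${baseUrl}/`;
  return new URL(key, base).toString();
}
```

Inside the task this could be `pdfKey(ctx.run.id)` and `publicObjectUrl(process.env.R2_PUBLIC_URL, key)`, assuming the run context is available in your SDK version; for private buckets a presigned URL is the usual alternative.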
🛠️ Refactor suggestion
Enhance error handling and resource management
The PDF generation example is well-structured but could benefit from improved error handling and resource management.
Consider the following improvements:
- Wrap the main logic in a try-catch block to handle potential errors.
- Ensure the browser is closed even if an error occurs.
- Use environment variables more safely with fallback values.
Here's a suggested refactor:
export const puppeteerWebpageToPDF = task({
id: "puppeteer-webpage-to-pdf",
run: async () => {
+ let browser;
+ try {
- const browser = await puppeteer.launch();
+ browser = await puppeteer.launch({
+ headless: "new",
+ args: ['--no-sandbox', '--disable-setuid-sandbox']
+ });
const page = await browser.newPage();
const response = await page.goto("https://google.com");
const url = response?.url() ?? "No URL found";
// Generate PDF from the web page
const generatePdf = await page.pdf();
logger.info("PDF generated from URL", { url });
- await browser.close();
// Upload to R2
const s3Key = `pdfs/test.pdf`;
const uploadParams = {
- Bucket: process.env.S3_BUCKET,
+ Bucket: process.env.S3_BUCKET ?? '',
Key: s3Key,
Body: generatePdf,
ContentType: "application/pdf",
};
logger.log("Uploading to R2 with params", uploadParams);
// Upload the PDF to R2 and return the URL.
await s3Client.send(new PutObjectCommand(uploadParams));
- const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
+ const s3Url = `https://${process.env.S3_BUCKET ?? ''}.s3.amazonaws.com/${s3Key}`;
logger.log("PDF uploaded to R2", { url: s3Url });
return { pdfUrl: s3Url };
+ } catch (error) {
+ logger.error("Error in puppeteerWebpageToPDF", { error });
+ throw error;
+ } finally {
+ if (browser) {
+ await browser.close();
+ }
+ }
},
});
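These changes improve error handling and ensure the browser is closed even if an error occurs. The `?? ''` fallbacks, however, do not make a missing `S3_BUCKET` safe: an empty string is not a valid bucket name, so the upload still fails, and the returned URL becomes `https://.s3.amazonaws.com/...`. Failing fast when the variable is missing may be clearer.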
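The try/finally pattern in this refactor can also be factored into a small helper, so every task closes its browser the same way. A sketch; `withResource` is a hypothetical helper, and `Closable` is a minimal stand-in for Puppeteer's `Browser`, whose `close()` returns a promise:

```ts
// Sketch: acquire a resource, use it, and always release it, even when `use` throws.
interface Closable {
  close(): Promise<void>;
}

async function withResource<R extends Closable, T>(
  acquire: () => Promise<R>,
  use: (resource: R) => Promise<T>,
): Promise<T> {
  // If acquire itself fails there is nothing to close, so it stays outside the try
  const resource = await acquire();
  try {
    return await use(resource);
  } finally {
    // Runs on success and on error; the original error is rethrown afterwards
    await resource.close();
  }
}
```

For example, `withResource(() => puppeteer.launch(), async (browser) => { /* navigate, render, upload */ })`. One trade-off: if `close()` itself throws inside `finally`, that error replaces the original one.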
📝 Committable suggestion
```ts
export const puppeteerWebpageToPDF = task({
  id: "puppeteer-webpage-to-pdf",
  run: async () => {
    let browser;
    try {
      browser = await puppeteer.launch({
        headless: "new",
        args: ['--no-sandbox', '--disable-setuid-sandbox']
      });
      const page = await browser.newPage();
      const response = await page.goto("https://google.com");
      const url = response?.url() ?? "No URL found";

      // Generate PDF from the web page
      const generatePdf = await page.pdf();

      logger.info("PDF generated from URL", { url });

      // Upload to R2
      const s3Key = `pdfs/test.pdf`;
      const uploadParams = {
        Bucket: process.env.S3_BUCKET ?? '',
        Key: s3Key,
        Body: generatePdf,
        ContentType: "application/pdf",
      };

      logger.log("Uploading to R2 with params", uploadParams);

      // Upload the PDF to R2 and return the URL.
      await s3Client.send(new PutObjectCommand(uploadParams));
      const s3Url = `https://${process.env.S3_BUCKET ?? ''}.s3.amazonaws.com/${s3Key}`;
      logger.log("PDF uploaded to R2", { url: s3Url });
      return { pdfUrl: s3Url };
    } catch (error) {
      logger.error("Error in puppeteerWebpageToPDF", { error });
      throw error;
    } finally {
      if (browser) {
        await browser.close();
      }
    }
  },
});
```
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
docs/examples/puppeteer.mdx (2)
1-46: LGTM! Consider adding a brief explanation for the environment variable.
The introduction and build configuration sections are well-structured and provide clear instructions. However, it might be helpful to briefly explain why the `PUPPETEER_EXECUTABLE_PATH` environment variable is needed.
Consider adding a short explanation after line 45, such as:
This environment variable tells Puppeteer where to find the Chrome executable, which is necessary for it to function correctly in the Trigger.dev environment.
206-217: LGTM! Consider minor grammatical improvements
The proxying section effectively emphasizes the importance of using proxies for web scraping and provides a helpful list of recommended services. This is crucial information for users to comply with terms of service.
Consider the following minor grammatical improvements:
- Adjust the verb agreement in the introduction to the list:
  -Here are a list of proxy services we recommend:
  +Here is a list of proxy services we recommend:
- Add a comma before "and" in the warning sentence:
  -If you don't you'll risk getting our IP address blocked and we will ban you from our service.
  +If you don't, you'll risk getting our IP address blocked, and we will ban you from our service.
These small changes will enhance the readability and grammatical correctness of the documentation.
🧰 Tools
LanguageTool
[uncategorized] ~208-~208: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u'll risk getting our IP address blocked and we will ban you from our service.** He...(COMMA_COMPOUND_SENTENCE)
[grammar] ~210-~210: There may be a verb agreement error, if referring to a singular entity (a list).
Context: ... will ban you from our service.** Here are a list of proxy services we recommend: ...(THERE_IS_ARE)
docs/config/config-file.mdx (1)
477-501: LGTM! Clear and concise Puppeteer integration instructions.
The new section on Puppeteer integration is well-structured and provides essential information:
- Includes an important warning about web scraping ethics.
- Clearly explains how to add Puppeteer to the build configuration.
- Specifies the necessary environment variable.
- Refers to an example for further guidance.
Consider adding a brief explanation of why the `PUPPETEER_EXECUTABLE_PATH` environment variable is needed. This could help users understand its importance. For example:
  PUPPETEER_EXECUTABLE_PATH: "/usr/bin/google-chrome-stable",
  +# This path ensures Puppeteer can locate the Chrome browser in the deployment environment.
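For context, a minimal `trigger.config.ts` sketch of the build setup that section describes. It assumes the `puppeteer` build extension exported from `@trigger.dev/build/extensions/puppeteer`; `<your-project-ref>` is a placeholder, and any other fields your SDK version requires are omitted. Depending on the SDK version, you may still need to set `PUPPETEER_EXECUTABLE_PATH` yourself as a project environment variable.

```typescript
// trigger.config.ts: minimal sketch, other config settings omitted.
import { defineConfig } from "@trigger.dev/sdk/v3";
import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

export default defineConfig({
  project: "<your-project-ref>", // placeholder: use your own project ref
  build: {
    // Adds Chrome to the deployed image so Puppeteer has a browser to launch.
    extensions: [puppeteer()],
  },
});
```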
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- docs/config/config-file.mdx (4 hunks)
- docs/examples/puppeteer.mdx (1 hunks)
- docs/snippets/web-scraping-warning.mdx (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/snippets/web-scraping-warning.mdx
🧰 Additional context used
LanguageTool
docs/examples/puppeteer.mdx
[typographical] ~51-~51: It appears that a comma is missing.
Context: ...## Basic example ### Overview In this example we use Puppeteer t...(DURING_THAT_TIME_COMMA)
[typographical] ~83-~83: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer t...(DURING_THAT_TIME_COMMA)
[typographical] ~146-~146: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer w...(DURING_THAT_TIME_COMMA)
[uncategorized] ~208-~208: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u'll risk getting our IP address blocked and we will ban you from our service.** He...(COMMA_COMPOUND_SENTENCE)
[grammar] ~210-~210: There may be a verb agreement error, if referring to a singular entity (a list).
Context: ... will ban you from our service.** Here are a list of proxy services we recommend: ...(THERE_IS_ARE)
🔇 Additional comments not posted (2)
docs/config/config-file.mdx (2)
535-536: Good addition of FFmpeg example reference.
The inclusion of a link to the FFmpeg example is consistent with the documentation style and provides valuable resources for users.
Line range hint 1-651: Overall, excellent documentation updates.
The changes to this file effectively integrate Puppeteer configuration instructions and enhance the FFmpeg section. The additions are well-structured, consistent with the existing documentation style, and provide valuable information for users.
## Basic example

### Overview

In this example we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.

### Task code

```ts trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto("https://trigger.dev");

    const content = await page.title();
    logger.info("Content", { content });

    await browser.close();
  },
});
```

### Testing your task

There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).
🛠️ Refactor suggestion
Improve compatibility and grammar in the basic example
The basic example is clear and functional. However, there are two improvements we can make:
- Add launch options for better compatibility across different environments.
- Fix a minor grammatical issue in the overview.
Please apply the following changes:
- Modify the `puppeteer.launch()` call in the code snippet:
  - const browser = await puppeteer.launch();
  + const browser = await puppeteer.launch({
  +   headless: true,
  +   args: ["--no-sandbox", "--disable-setuid-sandbox"],
  + });
- Add a comma in the overview sentence:
  -In this example we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.
  +In this example, we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.
These changes should help Chrome start in containerized environments, where it often runs as root and refuses to start with its sandbox enabled, and they fix the missing comma in the overview. In Puppeteer v22 and later, `headless: true` already selects the new headless mode, so the older `headless: "new"` value is not needed.
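As a sketch of the trade-off behind those flags: `--no-sandbox` disables Chrome's process sandbox, so one option is to add it only where the sandbox cannot work, such as inside a container. The `inContainer` flag, the `buildLaunchOptions` helper, and the local `LaunchOptionsSketch` type are assumptions made so the snippet runs without Puppeteer installed; `headless` and `args` are the real Puppeteer launch option names.

```typescript
// Local stand-in for the subset of Puppeteer's launch options used here.
interface LaunchOptionsSketch {
  headless: boolean;
  args: string[];
}

// Hypothetical helper: only relax Chrome's sandbox where it is usually unavailable.
export function buildLaunchOptions(opts: { inContainer: boolean }): LaunchOptionsSketch {
  const args: string[] = [];
  if (opts.inContainer) {
    // Chrome refuses to start as root unless the sandbox is disabled,
    // and containers commonly run processes as root.
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  // In Puppeteer v22+, `true` selects the new headless mode.
  return { headless: true, args };
}
```

Keeping the sandbox enabled for local runs preserves Chrome's isolation where it is available; inside a task you would pass the result straight to `puppeteer.launch(...)`.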
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
## Basic example

### Overview

In this example, we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.

### Task code

```ts trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch({
      headless: true,
      args: ["--no-sandbox", "--disable-setuid-sandbox"],
    });
    const page = await browser.newPage();

    await page.goto("https://trigger.dev");

    const content = await page.title();
    logger.info("Content", { content });

    await browser.close();
  },
});
```

### Testing your task

There's no payload required for this task, so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).
🧰 Tools
LanguageTool
[typographical] ~51-~51: It appears that a comma is missing.
Context: ...## Basic example ### Overview In this example we use Puppeteer t...(DURING_THAT_TIME_COMMA)
docs/examples/puppeteer.mdx
Outdated
## Scrape content from a web page

### Overview

In this example we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out. See [this list](/examples/puppeteer#proxying) for more proxying services we recommend.

<Warning>
**WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner's permission using Trigger.dev Cloud is prohibited and will result in account suspension.
</Warning>

### Task code

```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    const page = await browser.newPage();

    // Set up BrowserBase proxy authentication
    await page.authenticate({
      username: "api",
      password: process.env.BROWSERBASE_API_KEY || "",
    });

    try {
      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent ?? "0";
        const numberText = text.replace(/[^0-9]/g, "");
        return parseInt(numberText);
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});
```

### Testing your task

There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).
Improve star count extraction, grammar, and approve proxy usage
The web scraping example is well-structured and emphasizes the importance of using a proxy. However, we can make some improvements:
- Enhance the star count extraction logic for better robustness.
- Fix a minor grammatical issue in the overview.
Please apply the following changes:
- Modify the star count extraction logic:
const starCount = await page.evaluate(() => {
const starElement = document.querySelector(".github-star-count");
- const text = starElement?.textContent ?? "0";
- const numberText = text.replace(/[^0-9]/g, "");
- return parseInt(numberText);
+ const text = starElement?.textContent?.trim() ?? "0";
+ const match = text.match(/^([\d,]+)/);
+ return match ? parseInt(match[1].replace(/,/g, ""), 10) : 0;
});
- Add a comma in the overview sentence:
-In this example we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out.
+In this example, we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out.
These changes trim surrounding whitespace, strip thousands separators from counts such as "1,234", and return 0 when the text does not start with a number. They also fix the missing comma in the overview while keeping the proxy, which is required for web scraping.
📝 Committable suggestion
## Scrape content from a web page

### Overview

In this example, we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out. See [this list](/examples/puppeteer#proxying) for more proxying services we recommend.

<Warning>
**WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner's permission using Trigger.dev Cloud is prohibited and will result in account suspension.
</Warning>

### Task code

```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    const page = await browser.newPage();

    // Set up BrowserBase proxy authentication
    await page.authenticate({
      username: "api",
      password: process.env.BROWSERBASE_API_KEY || "",
    });

    try {
      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent?.trim() ?? "0";
        const match = text.match(/^([\d,]+)/);
        return match ? parseInt(match[1].replace(/,/g, ""), 10) : 0;
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});
```

### Testing your task

There's no payload required for this task, so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).
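Neither the original extraction nor the suggested one handles abbreviated counts such as "12.3k": the original strips all non-digits and reads 123, while the anchored pattern stops at the dot and reads 12. A minimal sketch of a parser that also accepts `k`/`m` suffixes, assuming the star element's text looks like "1,234", "12.3k", or "1.2M" (the page's actual markup has not been checked). Because `page.evaluate` serializes its callback, one option is to return the raw `textContent` from the page and parse it in Node with this hypothetical `parseStarCount` helper.

```typescript
// Parse a human-formatted count such as "1,234", "12.3k" or "1.2M" into a number.
// Returns null when no number is present, instead of NaN or 0.
export function parseStarCount(text: string | null | undefined): number | null {
  if (!text) return null;
  // First number in the text, with optional thousands separators, decimals and k/m suffix.
  const match = text.trim().match(/(\d[\d,]*(?:\.\d+)?)\s*([kKmM])?/);
  if (!match) return null;
  const digits = (match[1] ?? "").replace(/,/g, "");
  const suffix = (match[2] ?? "").toLowerCase();
  const multiplier = suffix === "k" ? 1_000 : suffix === "m" ? 1_000_000 : 1;
  // Round to absorb floating-point error, e.g. 12.3 * 1000.
  return Math.round(Number.parseFloat(digits) * multiplier);
}
```

Returning `null` rather than 0 lets the task tell "no stars found" apart from a real count of zero before logging it.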
🧰 Tools
LanguageTool
[typographical] ~146-~146: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer w...(DURING_THAT_TIME_COMMA)
## Generate a PDF from a web page

### Overview

In this example we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).

### Task code

```ts trigger/puppeteer-generate-pdf.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

// Initialize S3 client
const s3Client = new S3Client({
  region: "auto",
  endpoint: process.env.S3_ENDPOINT,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
  },
});

export const puppeteerWebpageToPDF = task({
  id: "puppeteer-webpage-to-pdf",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto("https://trigger.dev");
    const url = response?.url() ?? "No URL found";

    // Generate PDF from the web page
    const generatePdf = await page.pdf();

    logger.info("PDF generated from URL", { url });

    await browser.close();

    // Upload to R2
    const s3Key = `pdfs/test.pdf`;
    const uploadParams = {
      Bucket: process.env.S3_BUCKET,
      Key: s3Key,
      Body: generatePdf,
      ContentType: "application/pdf",
    };

    logger.log("Uploading to R2 with params", uploadParams);

    // Upload the PDF to R2 and return the URL.
    await s3Client.send(new PutObjectCommand(uploadParams));
    const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
    logger.log("PDF uploaded to R2", { url: s3Url });
    return { pdfUrl: s3Url };
  },
});
```

### Testing your task

There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).
🛠️ Refactor suggestion
Enhance error handling, resource management, and grammar in the PDF generation example
The PDF generation example is functional, but we can improve its robustness and clarity:
- Implement better error handling and resource management.
- Fix a minor grammatical issue in the overview.
- Use environment variables more safely.
Please apply the following changes:
- Modify the `puppeteerWebpageToPDF` task:
export const puppeteerWebpageToPDF = task({
id: "puppeteer-webpage-to-pdf",
run: async () => {
+ let browser;
+ try {
- const browser = await puppeteer.launch();
+ browser = await puppeteer.launch({
+ headless: true,
+ args: ["--no-sandbox", "--disable-setuid-sandbox"],
+ });
const page = await browser.newPage();
const response = await page.goto("https://trigger.dev");
const url = response?.url() ?? "No URL found";
// Generate PDF from the web page
const generatePdf = await page.pdf();
logger.info("PDF generated from URL", { url });
- await browser.close();
// Upload to R2
const s3Key = `pdfs/test.pdf`;
const uploadParams = {
- Bucket: process.env.S3_BUCKET,
+ Bucket: process.env.S3_BUCKET ?? '',
Key: s3Key,
Body: generatePdf,
ContentType: "application/pdf",
};
logger.log("Uploading to R2 with params", uploadParams);
// Upload the PDF to R2 and return the URL.
await s3Client.send(new PutObjectCommand(uploadParams));
- const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
+ const s3Url = `https://${process.env.S3_BUCKET ?? ''}.s3.amazonaws.com/${s3Key}`;
logger.log("PDF uploaded to R2", { url: s3Url });
return { pdfUrl: s3Url };
+ } catch (error) {
+ logger.error("Error in puppeteerWebpageToPDF", { error });
+ throw error;
+ } finally {
+ if (browser) {
+ await browser.close();
+ }
+ }
},
});
- Add a comma in the overview sentence:
-In this example we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).
+In this example, we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).
These changes will close the browser even when navigation, PDF generation, or the upload fails, give `Bucket` a `string` type instead of `string | undefined`, and improve the grammatical correctness of the documentation.
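A fallback of `?? ''` satisfies the type checker, but a missing `S3_BUCKET` still fails, only later, when the upload request is sent. A stricter alternative is to fail fast before launching the browser. A minimal sketch; `requireEnv` is a hypothetical helper, not part of the Trigger.dev SDK or the AWS SDK.

```typescript
// Hypothetical helper: read a required environment variable or fail fast with a clear message.
type Env = Record<string, string | undefined>;

export function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Calling `requireEnv(process.env, "S3_BUCKET")` at the top of `run` means a misconfigured deployment errors out before paying for a browser session whose PDF could not be uploaded anyway.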
📝 Committable suggestion
## Generate a PDF from a web page

### Overview

In this example, we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).

### Task code

```ts trigger/puppeteer-generate-pdf.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

// Initialize S3 client
const s3Client = new S3Client({
  region: "auto",
  endpoint: process.env.S3_ENDPOINT,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
  },
});

export const puppeteerWebpageToPDF = task({
  id: "puppeteer-webpage-to-pdf",
  run: async () => {
    let browser;
    try {
      browser = await puppeteer.launch({
        headless: true,
        args: ["--no-sandbox", "--disable-setuid-sandbox"],
      });
      const page = await browser.newPage();
      const response = await page.goto("https://trigger.dev");
      const url = response?.url() ?? "No URL found";

      // Generate PDF from the web page
      const generatePdf = await page.pdf();

      logger.info("PDF generated from URL", { url });

      // Upload to R2
      const s3Key = `pdfs/test.pdf`;
      const uploadParams = {
        Bucket: process.env.S3_BUCKET ?? "",
        Key: s3Key,
        Body: generatePdf,
        ContentType: "application/pdf",
      };

      logger.log("Uploading to R2 with params", uploadParams);

      // Upload the PDF to R2 and return the URL.
      await s3Client.send(new PutObjectCommand(uploadParams));
      const s3Url = `https://${process.env.S3_BUCKET ?? ""}.s3.amazonaws.com/${s3Key}`;
      logger.log("PDF uploaded to R2", { url: s3Url });
      return { pdfUrl: s3Url };
    } catch (error) {
      logger.error("Error in puppeteerWebpageToPDF", { error });
      throw error;
    } finally {
      if (browser) {
        await browser.close();
      }
    }
  },
});
```

### Testing your task

There's no payload required for this task, so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).
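One issue that neither version of the PDF example addresses: the returned URL uses an `s3.amazonaws.com` hostname, but Cloudflare R2 objects are not served from Amazon S3. Public R2 objects are reachable through the bucket's `r2.dev` subdomain or a custom domain attached to the bucket (private objects need presigned URLs). A minimal sketch, assuming a hypothetical `R2_PUBLIC_URL` environment variable that holds that public base URL; `r2PublicUrl` is likewise a hypothetical helper.

```typescript
// Hypothetical helper: build a public URL for an R2 object from a configured base URL,
// e.g. the bucket's r2.dev subdomain or a custom domain attached to it.
export function r2PublicUrl(baseUrl: string, key: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = key
    .replace(/^\/+/, "") // drop leading slashes
    .split("/")
    .map((segment) => encodeURIComponent(segment)) // encode each segment, keep separators
    .join("/");
  return `${base}/${path}`;
}
```

In the task, the last step would then return `r2PublicUrl(<value of R2_PUBLIC_URL>, s3Key)` instead of building an `s3.amazonaws.com` URL.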
🧰 Tools
LanguageTool
[typographical] ~83-~83: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer t...(DURING_THAT_TIME_COMMA)
3 new Puppeteer examples:
Also added Puppeteer to the build config page.
Summary by CodeRabbit
New Features
Documentation