New puppeteer examples #1355

Merged
merged 6 commits into from
Sep 25, 2024

Conversation

samejr
Member

@samejr samejr commented Sep 25, 2024

3 new Puppeteer examples:

  • A basic example
  • Generate a PDF from a web page
  • Scrape content from a web page

Also added Puppeteer to the build config page.

Summary by CodeRabbit

  • New Features

    • Introduced Puppeteer support for generating PDFs and web scraping in Trigger.dev projects.
    • Added a comprehensive guide and examples for using Puppeteer, including tasks for logging webpage titles, generating PDFs, and scraping content.
    • Included a new warning regarding web scraping practices to ensure compliance with terms of service.
  • Documentation

    • Expanded documentation with new examples and configuration instructions for Puppeteer.
    • Updated existing documentation to reflect changes and new warnings related to web scraping.

changeset-bot bot commented Sep 25, 2024

⚠️ No Changeset found

Latest commit: 28c0bef

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Contributor

coderabbitai bot commented Sep 25, 2024

Warning

Rate limit exceeded

@samejr has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 11 minutes and 10 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Files that changed from the base of the PR and between 6a800d5 and 28c0bef.

Walkthrough

The pull request introduces comprehensive documentation updates for integrating Puppeteer within Trigger.dev projects. It adds new sections detailing the configuration of Puppeteer in the trigger.config.ts file, including necessary environment variables and usage guidelines. Additionally, it provides extensive examples for tasks such as logging webpage titles, generating PDFs, and scraping content. A warning about web scraping practices is also included to ensure compliance with terms of service.

Changes

| Files | Change Summary |
| --- | --- |
| docs/config/config-file.mdx | Added instructions for using Puppeteer in trigger.config.ts, including configuration examples. |
| docs/examples/puppeteer.mdx | Created a guide on using Puppeteer with examples for logging titles, generating PDFs, and scraping. |
| docs/snippets/web-scraping-warning.mdx | Added a warning about the prohibition of scraping third-party websites without permission. |

Possibly related PRs

Suggested reviewers

  • ericallam
  • matt-aitken

🐰 In the garden of code, we hop with glee,
Puppeteer now dances, as swift as can be!
With PDFs and scraping, our tasks are a breeze,
Just mind the warnings, and do as you please!
In Trigger.dev’s realm, our projects will shine,
Hooray for the changes, all perfectly fine! 🌟


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (6)
docs/snippets/web-scraping-warning.mdx (1)

1-3: Approved: Clear and important warning message.

The warning message is well-crafted and effectively communicates the company's policy on web scraping. It aligns with the PR objectives and provides crucial information for users.

To enhance readability, consider breaking the content into bullet points:

 <Warning>
-  **WEB SCRAPING WARNING:** Direct scraping of third-party websites without explicit permission using Trigger.dev Cloud is strictly prohibited and will result in immediate account suspension. If web scraping is necessary for your project, you MUST use a proxy service to comply with our terms of service.
+  **WEB SCRAPING WARNING:**
+  - Direct scraping of third-party websites without explicit permission using Trigger.dev Cloud is strictly prohibited.
+  - Violation will result in immediate account suspension.
+  - If web scraping is necessary for your project, you MUST use a proxy service to comply with our terms of service.
 </Warning>

This format makes the key points more scannable and easier to digest.

docs/examples/puppeteer.mdx (4)

20-35: Add import statement for puppeteer() function

The code snippet uses the puppeteer() function in the build configuration, but it's not imported. To improve clarity and prevent potential errors, consider adding the import statement at the beginning of the file.

Add the following import statement at the beginning of the trigger.config.ts file:

 import { defineConfig } from "@trigger.dev/sdk/v3";
+import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

37-43: Add a note about system-dependent paths

The PUPPETEER_EXECUTABLE_PATH environment variable is correctly set, but the provided path is specific to a Linux environment. To improve clarity for users on different operating systems, consider adding a note about system-dependent paths.

Add the following note after the environment variable example:

Note: The path to the Chrome executable may vary depending on your operating system and installation method. Adjust the path accordingly for your specific environment.
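
To make the note above concrete, here is a minimal sketch of picking a Chrome path per platform. The helper name resolveChromePath and the default paths are assumptions (typical install locations, not values taken from this PR); an explicitly set PUPPETEER_EXECUTABLE_PATH always takes precedence.

```typescript
// Hypothetical helper: not part of Puppeteer or Trigger.dev.
type Platform = "linux" | "darwin" | "win32";

// Assumed default install locations; adjust these for your own machines.
const DEFAULT_CHROME_PATHS: Record<Platform, string> = {
  linux: "/usr/bin/google-chrome-stable",
  darwin: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  win32: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
};

function resolveChromePath(
  env: Record<string, string | undefined>,
  platform: string
): string {
  // An explicit environment variable wins over any built-in default.
  const fromEnv = (env["PUPPETEER_EXECUTABLE_PATH"] ?? "").trim();
  if (fromEnv !== "") return fromEnv;

  // Fall back to the assumed default for known platforms only.
  if (Object.prototype.hasOwnProperty.call(DEFAULT_CHROME_PATHS, platform)) {
    return DEFAULT_CHROME_PATHS[platform as Platform];
  }
  throw new Error(
    `No default Chrome path known for platform "${platform}"; set PUPPETEER_EXECUTABLE_PATH.`
  );
}
```

In a Node.js process this could be called as resolveChromePath(process.env, process.platform). Failing loudly on unknown platforms avoids launching with a path that silently does not exist.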

202-213: LGTM: Comprehensive proxying information

The proxying section provides crucial information about the necessity of using proxies for web scraping and offers a helpful list of recommended services.

Consider adding a brief explanation of why each proxy service is recommended or what specific features they offer. This could help users make a more informed decision when choosing a proxy service.

🧰 Tools
LanguageTool

[uncategorized] ~204-~204: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u'll risk getting our IP address blocked and we will ban you from our service.** He...

(COMMA_COMPOUND_SENTENCE)


[grammar] ~206-~206: There may be a verb agreement error, if referring to a singular entity (a list).
Context: ... will ban you from our service.** Here are a list of proxy services we recommend: ...

(THERE_IS_ARE)


206-206: Fix verb agreement

There's a minor grammatical issue in the introduction to the list of proxy services.

Change the sentence to:

- Here are a list of proxy services we recommend:
+ Here is a list of proxy services we recommend:

This correction ensures proper verb agreement with the singular noun "list".


docs/config/config-file.mdx (1)

477-503: LGTM! Clear and comprehensive Puppeteer configuration guide.

The new Puppeteer configuration section is well-structured and provides clear instructions. It's great that you've included:

  1. A warning about web scraping practices.
  2. A code example for the trigger.config.ts file.
  3. Instructions for setting up the necessary environment variable.
  4. A note about using puppeteer instead of puppeteer-core.

Consider adding a brief explanation of why the PUPPETEER_EXECUTABLE_PATH environment variable is needed. This could help users understand the purpose of this configuration step.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 79624c1 and c66f98d.

📒 Files selected for processing (5)
  • docs/config/config-file.mdx (3 hunks)
  • docs/examples/intro.mdx (1 hunks)
  • docs/examples/puppeteer.mdx (1 hunks)
  • docs/mint.json (1 hunks)
  • docs/snippets/web-scraping-warning.mdx (1 hunks)
🧰 Additional context used
LanguageTool
docs/examples/puppeteer.mdx

[typographical] ~49-~49: It appears that a comma is missing.
Context: ...## Basic example ### Overview In this example we use Puppeteer to log out the title o...

(DURING_THAT_TIME_COMMA)


[typographical] ~81-~81: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer to generate a PDF from...

(DURING_THAT_TIME_COMMA)


[typographical] ~144-~144: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer with a BrowserBase pro...

(DURING_THAT_TIME_COMMA)


[uncategorized] ~204-~204: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u'll risk getting our IP address blocked and we will ban you from our service.** He...

(COMMA_COMPOUND_SENTENCE)


[grammar] ~206-~206: There may be a verb agreement error, if referring to a singular entity (a list).
Context: ... will ban you from our service.** Here are a list of proxy services we recommend: ...

(THERE_IS_ARE)

🔇 Additional comments not posted (7)
docs/examples/intro.mdx (1)

14-14: New Puppeteer example added correctly.

The new entry for Puppeteer has been added to the table with proper formatting and positioning. The description is concise and accurately represents the functionality of Puppeteer. The link follows the same pattern as other entries, maintaining consistency.

However, to ensure completeness:

Let's verify if the linked page exists:

This will help confirm that the linked page has been created and is in the correct location.

✅ Verification successful

Puppeteer example link verified successfully.

The linked page docs/examples/puppeteer.mdx exists, ensuring that the Puppeteer entry in the table points to a valid resource.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the Puppeteer example page exists
fd -t f "puppeteer.mdx" docs/examples

Length of output: 65

docs/examples/puppeteer.mdx (2)

1-19: LGTM: Well-structured introduction and overview

The file header and overview section provide clear and concise information about the document's content. The inclusion of a web scraping warning is a responsible practice.


1-213: Overall: Excellent documentation with minor improvements suggested

This document provides comprehensive and well-structured guidance on using Puppeteer with Trigger.dev. The examples cover a range of use cases and include important information about build configurations, environment setup, and proxying.

The suggested improvements primarily focus on:

  1. Enhancing code robustness and error handling.
  2. Improving compatibility across different environments.
  3. Refining content extraction logic.
  4. Minor grammatical and clarity enhancements.

These changes will further elevate the quality of this already valuable resource for developers integrating Puppeteer with Trigger.dev.


docs/mint.json (1)

285-285: LGTM: New Puppeteer example correctly added

The addition of "examples/puppeteer" to the list of examples in the navigation section is correct and aligns with the PR objectives. It's appropriately placed among other technology-specific examples.

A few points to note:

  1. The new entry is correctly formatted and consistent with other entries in the list.
  2. Its position in the list seems logical, grouped with other technology-specific examples.
  3. This change supports the PR's goal of introducing new Puppeteer examples to the documentation.

To ensure the new example page exists, please run the following command:

If the command doesn't return a result, please ensure that the corresponding Puppeteer example page has been created in the docs/examples directory.

✅ Verification successful

Verified: Puppeteer example page exists

The Puppeteer example page docs/examples/puppeteer.mdx has been successfully added and exists in the documentation examples directory.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the existence of the Puppeteer example page
fd -t f "puppeteer.mdx" docs/examples

Length of output: 65

docs/config/config-file.mdx (3)

7-7: New import for ScrapingWarning component.

The import of the ScrapingWarning component is correctly added to support the new Puppeteer section.


514-514: Minor update to ffmpeg section comment.

The comment in the ffmpeg section has been updated to be more generic, which is a good improvement for maintainability.


Line range hint 1-646: Overall, excellent updates to the configuration documentation.

The addition of the Puppeteer section and the minor updates to other parts of the document improve the overall quality and completeness of the configuration guide. The changes are well-integrated and provide clear, actionable information for users.

Comment on lines 45 to 71
## Basic example

### Overview

In this example we use Puppeteer to log out the title of a web page, in this case Google.

### Task code

```ts trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto("https://google.com");

    const content = await page.title();
    logger.info("Content", { content });

    await browser.close();
  },
});
```

🛠️ Refactor suggestion

Consider adding launch options for better compatibility

The basic example demonstrates Puppeteer usage correctly. However, to ensure better compatibility across different environments, consider adding launch options to the puppeteer.launch() call.

Modify the puppeteer.launch() call as follows:

-    const browser = await puppeteer.launch();
+    const browser = await puppeteer.launch({
+      headless: true,
+      args: ['--no-sandbox', '--disable-setuid-sandbox']
+    });

Setting headless explicitly documents the intended mode (recent Puppeteer releases no longer accept "new"; true selects the new headless mode), and the sandbox flags improve stability in containerized environments, where Chrome often cannot start its sandbox.
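
Since disabling the sandbox weakens isolation, the flags could be applied only where they are needed. The following is a minimal sketch under assumptions: buildLaunchOptions and the LaunchOptionsSketch type are hypothetical names, and headless: true assumes a Puppeteer release in which true selects the new headless mode.

```typescript
// Local stand-in for the subset of Puppeteer launch options used here,
// so the sketch compiles without importing puppeteer.
interface LaunchOptionsSketch {
  headless: boolean;
  args: string[];
}

// Hypothetical helper: only disable the sandbox when the caller states
// that the process runs inside a container.
function buildLaunchOptions(inContainer: boolean): LaunchOptionsSketch {
  const args: string[] = [];
  if (inContainer) {
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  return { headless: true, args };
}
```

The result could then be passed to puppeteer.launch(), for example puppeteer.launch(buildLaunchOptions(true)) in a deployed task.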

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
## Basic example

### Overview

In this example we use Puppeteer to log out the title of a web page, in this case Google.

### Task code

```ts trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    const page = await browser.newPage();

    await page.goto("https://google.com");

    const content = await page.title();
    logger.info("Content", { content });

    await browser.close();
  },
});
```

Comment on lines 140 to 194
## Scrape content from a web page

### Overview

In this example we use Puppeteer with a BrowserBase proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out.

<ScrapingWarning />

### Task code

```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    const page = await browser.newPage();

    // Set up BrowserBase proxy authentication
    await page.authenticate({
      username: "api",
      password: process.env.BROWSERBASE_API_KEY || "",
    });

    try {
      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent ?? "0";
        const numberText = text.replace(/[^0-9]/g, "");
        return parseInt(numberText);
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});
```

🛠️ Refactor suggestion

Improve star count extraction logic

The web scraping example is well-structured with good error handling. However, the star count extraction logic could be more robust.

Consider modifying the star count extraction logic to handle potential formatting variations:

       const starCount = await page.evaluate(() => {
         const starElement = document.querySelector(".github-star-count");
-        const text = starElement?.textContent ?? "0";
-        const numberText = text.replace(/[^0-9]/g, "");
-        return parseInt(numberText);
+        const text = starElement?.textContent?.trim() ?? "0";
+        const match = text.match(/^([\d,]+)/);
+        return match ? parseInt(match[1].replace(/,/g, '')) : 0;
       });

This change improves the extraction logic by:

  1. Trimming whitespace from the text content.
  2. Using a regex to match the first group of digits (including commas).
  3. Removing commas before parsing the integer.
  4. Returning 0 if no match is found.

These modifications make the extraction more resilient to different formatting styles of the star count.
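
The four steps above can be pulled out into a standalone function and checked outside the browser. This is a minimal sketch; the name parseStarCount is hypothetical, and the extra NaN guard is an addition that covers text consisting only of commas, which the suggested snippet would turn into NaN.

```typescript
// Parse a leading, possibly comma-grouped integer such as "12,345 stars".
// Returns 0 when the text is missing or does not start with a digit or comma run
// that contains at least one digit.
function parseStarCount(text: string | null | undefined): number {
  const trimmed = (text ?? "").trim();
  const match = trimmed.match(/^([\d,]+)/);
  if (!match) return 0;

  // Strip thousands separators before parsing as a base-10 integer.
  const value = parseInt(match[1].replace(/,/g, ""), 10);
  return Number.isNaN(value) ? 0 : value;
}
```

Because the pattern is anchored at the start of the text, abbreviated counts such as "12.3k" are read as 12; handling suffixes would need a separate rule.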

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
## Scrape content from a web page

### Overview

In this example we use Puppeteer with a BrowserBase proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out.

<ScrapingWarning />

### Task code

```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    const page = await browser.newPage();

    // Set up BrowserBase proxy authentication
    await page.authenticate({
      username: "api",
      password: process.env.BROWSERBASE_API_KEY || "",
    });

    try {
      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent?.trim() ?? "0";
        const match = text.match(/^([\d,]+)/);
        return match ? parseInt(match[1].replace(/,/g, '')) : 0;
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});
```

Comment on lines 77 to 133
## Generate a PDF from a web page

### Overview

In this example we use Puppeteer to generate a PDF from a web page and upload it to Cloudflare R2.

### Task code

```ts trigger/puppeteer-generate-pdf.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

// Initialize S3 client
const s3Client = new S3Client({
  region: "auto",
  endpoint: process.env.S3_ENDPOINT,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
  },
});

export const puppeteerWebpageToPDF = task({
  id: "puppeteer-webpage-to-pdf",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto("https://google.com");
    const url = response?.url() ?? "No URL found";

    // Generate PDF from the web page
    const generatePdf = await page.pdf();

    logger.info("PDF generated from URL", { url });

    await browser.close();

    // Upload to R2
    const s3Key = `pdfs/test.pdf`;
    const uploadParams = {
      Bucket: process.env.S3_BUCKET,
      Key: s3Key,
      Body: generatePdf,
      ContentType: "application/pdf",
    };

    logger.log("Uploading to R2 with params", uploadParams);

    // Upload the PDF to R2 and return the URL.
    await s3Client.send(new PutObjectCommand(uploadParams));
    const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
    logger.log("PDF uploaded to R2", { url: s3Url });
    return { pdfUrl: s3Url };
  },
});
```


🛠️ Refactor suggestion

Enhance error handling and resource management

The PDF generation example is well-structured but could benefit from improved error handling and resource management.

Consider the following improvements:

  1. Wrap the main logic in a try-catch block to handle potential errors.
  2. Ensure the browser is closed even if an error occurs.
  3. Use environment variables more safely with fallback values.

Here's a suggested refactor:

 export const puppeteerWebpageToPDF = task({
   id: "puppeteer-webpage-to-pdf",
   run: async () => {
+    let browser;
+    try {
-      const browser = await puppeteer.launch();
+      browser = await puppeteer.launch({
+        headless: true,
+        args: ['--no-sandbox', '--disable-setuid-sandbox']
+      });
       const page = await browser.newPage();
       const response = await page.goto("https://google.com");
       const url = response?.url() ?? "No URL found";

       // Generate PDF from the web page
       const generatePdf = await page.pdf();

       logger.info("PDF generated from URL", { url });

-      await browser.close();

       // Upload to R2
       const s3Key = `pdfs/test.pdf`;
       const uploadParams = {
-        Bucket: process.env.S3_BUCKET,
+        Bucket: process.env.S3_BUCKET ?? '',
         Key: s3Key,
         Body: generatePdf,
         ContentType: "application/pdf",
       };

       logger.log("Uploading to R2 with params", uploadParams);

       // Upload the PDF to R2 and return the URL.
       await s3Client.send(new PutObjectCommand(uploadParams));
-      const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
+      const s3Url = `https://${process.env.S3_BUCKET ?? ''}.s3.amazonaws.com/${s3Key}`;
       logger.log("PDF uploaded to R2", { url: s3Url });
       return { pdfUrl: s3Url };
+    } catch (error) {
+      logger.error("Error in puppeteerWebpageToPDF", { error });
+      throw error;
+    } finally {
+      if (browser) {
+        await browser.close();
+      }
+    }
   },
 });

These changes improve error handling, ensure proper resource cleanup, and make the code more robust against potential issues with environment variables.
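
On the third point, an empty-string fallback satisfies the type checker but defers the failure to the upload call. One alternative, sketched here as an assumption rather than something this PR does, is to fail fast when a required variable is missing. The helper names requireEnv and buildUploadParams are hypothetical.

```typescript
// Hypothetical helper: read a required environment variable, or throw an
// error that names the missing key.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build the upload parameters only once the bucket name is known to exist.
// The PDF body is left out here to keep the sketch free of Puppeteer types.
function buildUploadParams(
  env: Record<string, string | undefined>,
  key: string
): { Bucket: string; Key: string; ContentType: string } {
  return {
    Bucket: requireEnv("S3_BUCKET", env),
    Key: key,
    ContentType: "application/pdf",
  };
}
```

Inside a task this could be called as buildUploadParams(process.env, "pdfs/test.pdf") before launching the browser, so that a misconfigured environment stops the run before any page is rendered.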

🧰 Tools
LanguageTool

[typographical] ~81-~81: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer to generate a PDF from...

(DURING_THAT_TIME_COMMA)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
docs/examples/puppeteer.mdx (2)

1-46: LGTM! Consider adding a brief explanation for the environment variable.

The introduction and build configuration sections are well-structured and provide clear instructions. However, it might be helpful to briefly explain why the PUPPETEER_EXECUTABLE_PATH environment variable is needed.

Consider adding a short explanation after line 45, such as:

This environment variable tells Puppeteer where to find the Chrome executable, which is necessary for it to function correctly in the Trigger.dev environment.
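
As a rough sketch of what that explanation means in code: Puppeteer is documented to read `PUPPETEER_EXECUTABLE_PATH` on its own, so the helper below only makes the lookup explicit. `LaunchSettings` and `launchSettingsFromEnv` are hypothetical names for illustration; `executablePath` is the existing Puppeteer launch option.

```typescript
// Hypothetical helper; the name and return shape are assumptions.
interface LaunchSettings {
  executablePath?: string;
}

// Builds launch settings from an environment map such as process.env.
function launchSettingsFromEnv(
  env: Record<string, string | undefined>
): LaunchSettings {
  const executablePath = env.PUPPETEER_EXECUTABLE_PATH?.trim();
  // Leave the key out when the variable is unset or blank, so Puppeteer
  // falls back to its default browser lookup.
  return executablePath ? { executablePath } : {};
}
```

The result could be spread into the launch call, for example `puppeteer.launch({ ...launchSettingsFromEnv(process.env) })`.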

206-217: LGTM! Consider minor grammatical improvements

The proxying section effectively emphasizes the importance of using proxies for web scraping and provides a helpful list of recommended services. This is crucial information for users to comply with terms of service.

Consider the following minor grammatical improvements:

  1. Adjust the verb agreement in the introduction to the list:
-Here are a list of proxy services we recommend:
+Here is a list of proxy services we recommend:
  2. Add commas after "If you don't" and before "and" in the warning sentence:
-If you don't you'll risk getting our IP address blocked and we will ban you from our service.
+If you don't, you'll risk getting our IP address blocked, and we will ban you from our service.

These small changes will enhance the readability and grammatical correctness of the documentation.

🧰 Tools
LanguageTool

[uncategorized] ~208-~208: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u'll risk getting our IP address blocked and we will ban you from our service.** He...

(COMMA_COMPOUND_SENTENCE)


[grammar] ~210-~210: There may be a verb agreement error, if referring to a singular entity (a list).
Context: ... will ban you from our service.** Here are a list of proxy services we recommend: ...

(THERE_IS_ARE)

docs/config/config-file.mdx (1)

477-501: LGTM! Clear and concise Puppeteer integration instructions.

The new section on Puppeteer integration is well-structured and provides essential information:

  1. Includes an important warning about web scraping ethics.
  2. Clearly explains how to add Puppeteer to the build configuration.
  3. Specifies the necessary environment variable.
  4. Refers to an example for further guidance.

Consider adding a brief explanation of why the PUPPETEER_EXECUTABLE_PATH environment variable is needed. This could help users understand its importance. For example:

 PUPPETEER_EXECUTABLE_PATH: "/usr/bin/google-chrome-stable",
+# This path ensures Puppeteer can locate the Chrome browser in the deployment environment.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c66f98d and 6a800d5.

📒 Files selected for processing (3)
  • docs/config/config-file.mdx (4 hunks)
  • docs/examples/puppeteer.mdx (1 hunks)
  • docs/snippets/web-scraping-warning.mdx (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/snippets/web-scraping-warning.mdx
🧰 Additional context used
LanguageTool
docs/examples/puppeteer.mdx

[typographical] ~51-~51: It appears that a comma is missing.
Context: ...## Basic example ### Overview In this example we use Puppeteer t...

(DURING_THAT_TIME_COMMA)


[typographical] ~83-~83: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer t...

(DURING_THAT_TIME_COMMA)


[typographical] ~146-~146: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer w...

(DURING_THAT_TIME_COMMA)


[uncategorized] ~208-~208: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...u'll risk getting our IP address blocked and we will ban you from our service.** He...

(COMMA_COMPOUND_SENTENCE)


[grammar] ~210-~210: There may be a verb agreement error, if referring to a singular entity (a list).
Context: ... will ban you from our service.** Here are a list of proxy services we recommend: ...

(THERE_IS_ARE)

🔇 Additional comments not posted (2)
docs/config/config-file.mdx (2)

535-536: Good addition of FFmpeg example reference.

The inclusion of a link to the FFmpeg example is consistent with the documentation style and provides valuable resources for users.


Line range hint 1-651: Overall, excellent documentation updates.

The changes to this file effectively integrate Puppeteer configuration instructions and enhance the FFmpeg section. The additions are well-structured, consistent with the existing documentation style, and provide valuable information for users.

Comment on lines +47 to +77
## Basic example

### Overview

In this example we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.

### Task code

```ts trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
id: "puppeteer-log-title",
run: async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();

await page.goto("https://trigger.dev");

const content = await page.title();
logger.info("Content", { content });

await browser.close();
},
});
```

### Testing your task

There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).

🛠️ Refactor suggestion

⚠️ Potential issue

Improve compatibility and grammar in the basic example

The basic example is clear and functional. However, there are two improvements we can make:

  1. Add launch options for better compatibility across different environments.
  2. Fix a minor grammatical issue in the overview.

Please apply the following changes:

  1. Modify the puppeteer.launch() call in the code snippet:
-    const browser = await puppeteer.launch();
+    const browser = await puppeteer.launch({
+      headless: "new",
+      args: ['--no-sandbox', '--disable-setuid-sandbox']
+    });
  2. Add a comma in the overview sentence:
-In this example we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.
+In this example, we use [Puppeteer](https://pptr.dev/) to log out the title of a web page, in this case from the [Trigger.dev](https://trigger.dev) landing page.

These changes will ensure better compatibility across different environments and improve the grammatical correctness of the documentation.


Comment on lines 142 to 202
## Scrape content from a web page

### Overview

In this example we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out. See [this list](/examples/puppeteer#proxying) for more proxying services we recommend.

<Warning>
**WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner's permission using Trigger.dev Cloud is prohibited and will result in account suspension.
</Warning>

### Task code

```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
id: "puppeteer-scrape-with-proxy",
run: async () => {
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
});

const page = await browser.newPage();

// Set up BrowserBase proxy authentication
await page.authenticate({
username: "api",
password: process.env.BROWSERBASE_API_KEY || "",
});

try {
// Navigate to the target website
await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

// Scrape the GitHub stars count
const starCount = await page.evaluate(() => {
const starElement = document.querySelector(".github-star-count");
const text = starElement?.textContent ?? "0";
const numberText = text.replace(/[^0-9]/g, "");
return parseInt(numberText);
});

logger.info("GitHub star count", { starCount });

return { starCount };
} catch (error) {
logger.error("Error during scraping", {
error: error instanceof Error ? error.message : String(error),
});
throw error;
} finally {
await browser.close();
}
},
});
```

### Testing your task

There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).

⚠️ Potential issue

Improve star count extraction, grammar, and approve proxy usage

The web scraping example is well-structured and emphasizes the importance of using a proxy. However, we can make some improvements:

  1. Enhance the star count extraction logic for better robustness.
  2. Fix a minor grammatical issue in the overview.

Please apply the following changes:

  1. Modify the star count extraction logic:
       const starCount = await page.evaluate(() => {
         const starElement = document.querySelector(".github-star-count");
-        const text = starElement?.textContent ?? "0";
-        const numberText = text.replace(/[^0-9]/g, "");
-        return parseInt(numberText);
+        const text = starElement?.textContent?.trim() ?? "0";
+        const match = text.match(/^([\d,]+)/);
+        return match ? parseInt(match[1].replace(/,/g, '')) : 0;
       });
  2. Add a comma in the overview sentence:
-In this example we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out.
+In this example, we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out.

These changes will make the star count extraction more resilient to different formatting styles, improve the grammatical correctness of the documentation, and maintain the good practice of using a proxy for web scraping.
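
To make the difference concrete, the suggested extraction can be pulled out into a pure function and exercised outside the browser. This is a sketch: `parseStarCount` is a hypothetical name, and the NaN guard and explicit radix are additions beyond the suggestion above.

```typescript
// Hypothetical standalone version of the suggested extraction logic.
function parseStarCount(raw: string | null | undefined): number {
  const text = raw?.trim() ?? "";
  // Leading run of digits and thousands separators, e.g. "8,412" in "8,412 stars".
  const match = text.match(/^([\d,]+)/);
  if (!match) return 0;
  const value = Number.parseInt(match[1].replace(/,/g, ""), 10);
  // A match made only of commas would parse to NaN; report 0 instead.
  return Number.isNaN(value) ? 0 : value;
}
```

For comparison, the original `text.replace(/[^0-9]/g, "")` followed by `parseInt` yields `NaN` when the element text contains no digits, whereas this version falls back to `0`. Neither approach expands abbreviated counts such as "8.4k".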

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
## Scrape content from a web page
### Overview
In this example we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out. See [this list](/examples/puppeteer#proxying) for more proxying services we recommend.
<Warning>
**WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner's permission using Trigger.dev Cloud is prohibited and will result in account suspension.
</Warning>
### Task code
```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";
export const puppeteerScrapeWithProxy = task({
id: "puppeteer-scrape-with-proxy",
run: async () => {
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
});
const page = await browser.newPage();
// Set up BrowserBase proxy authentication
await page.authenticate({
username: "api",
password: process.env.BROWSERBASE_API_KEY || "",
});
try {
// Navigate to the target website
await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });
// Scrape the GitHub stars count
const starCount = await page.evaluate(() => {
const starElement = document.querySelector(".github-star-count");
const text = starElement?.textContent ?? "0";
const numberText = text.replace(/[^0-9]/g, "");
return parseInt(numberText);
});
logger.info("GitHub star count", { starCount });
return { starCount };
} catch (error) {
logger.error("Error during scraping", {
error: error instanceof Error ? error.message : String(error),
});
throw error;
} finally {
await browser.close();
}
},
});
```
### Testing your task
There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).
## Scrape content from a web page
### Overview
In this example, we use [Puppeteer](https://pptr.dev/) with a [BrowserBase](https://www.browserbase.com/) proxy to scrape the GitHub stars count from the [Trigger.dev](https://trigger.dev) landing page and log it out. See [this list](/examples/puppeteer#proxying) for more proxying services we recommend.
<Warning>
**WEB SCRAPING:** When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner's permission using Trigger.dev Cloud is prohibited and will result in account suspension.
</Warning>
### Task code
```ts trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";
export const puppeteerScrapeWithProxy = task({
id: "puppeteer-scrape-with-proxy",
run: async () => {
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
});
const page = await browser.newPage();
// Set up BrowserBase proxy authentication
await page.authenticate({
username: "api",
password: process.env.BROWSERBASE_API_KEY || "",
});
try {
// Navigate to the target website
await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });
// Scrape the GitHub stars count
const starCount = await page.evaluate(() => {
const starElement = document.querySelector(".github-star-count");
const text = starElement?.textContent?.trim() ?? "0";
const match = text.match(/^([\d,]+)/);
return match ? parseInt(match[1].replace(/,/g, '')) : 0;
});
logger.info("GitHub star count", { starCount });
return { starCount };
} catch (error) {
logger.error("Error during scraping", {
error: error instanceof Error ? error.message : String(error),
});
throw error;
} finally {
await browser.close();
}
},
});
```
### Testing your task
There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).
🧰 Tools
LanguageTool

[typographical] ~146-~146: It appears that a comma is missing.
Context: ... from a web page ### Overview In this example we use Puppeteer w...

(DURING_THAT_TIME_COMMA)

Comment on lines +79 to +140
## Generate a PDF from a web page

### Overview

In this example we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).

### Task code

```ts trigger/puppeteer-generate-pdf.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

// Initialize S3 client
const s3Client = new S3Client({
region: "auto",
endpoint: process.env.S3_ENDPOINT,
credentials: {
accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
},
});

export const puppeteerWebpageToPDF = task({
id: "puppeteer-webpage-to-pdf",
run: async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
const response = await page.goto("https://trigger.dev");
const url = response?.url() ?? "No URL found";

// Generate PDF from the web page
const generatePdf = await page.pdf();

logger.info("PDF generated from URL", { url });

await browser.close();

// Upload to R2
const s3Key = `pdfs/test.pdf`;
const uploadParams = {
Bucket: process.env.S3_BUCKET,
Key: s3Key,
Body: generatePdf,
ContentType: "application/pdf",
};

logger.log("Uploading to R2 with params", uploadParams);

// Upload the PDF to R2 and return the URL.
await s3Client.send(new PutObjectCommand(uploadParams));
const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
logger.log("PDF uploaded to R2", { url: s3Url });
return { pdfUrl: s3Url };
},
});

```

### Testing your task

There's no payload required for this task so you can just click "Run test" from the Test page in the dashboard. Learn more about testing tasks [here](/run-tests).

🛠️ Refactor suggestion

⚠️ Potential issue

Enhance error handling, resource management, and grammar in the PDF generation example

The PDF generation example is functional, but we can improve its robustness and clarity:

  1. Implement better error handling and resource management.
  2. Fix a minor grammatical issue in the overview.
  3. Use environment variables more safely.

Please apply the following changes:

  1. Modify the puppeteerWebpageToPDF task:
 export const puppeteerWebpageToPDF = task({
   id: "puppeteer-webpage-to-pdf",
   run: async () => {
+    let browser;
+    try {
-      const browser = await puppeteer.launch();
+      browser = await puppeteer.launch({
+        headless: "new",
+        args: ['--no-sandbox', '--disable-setuid-sandbox']
+      });
       const page = await browser.newPage();
       const response = await page.goto("https://trigger.dev");
       const url = response?.url() ?? "No URL found";

       // Generate PDF from the web page
       const generatePdf = await page.pdf();

       logger.info("PDF generated from URL", { url });

-      await browser.close();

       // Upload to R2
       const s3Key = `pdfs/test.pdf`;
       const uploadParams = {
-        Bucket: process.env.S3_BUCKET,
+        Bucket: process.env.S3_BUCKET ?? '',
         Key: s3Key,
         Body: generatePdf,
         ContentType: "application/pdf",
       };

       logger.log("Uploading to R2 with params", uploadParams);

       // Upload the PDF to R2 and return the URL.
       await s3Client.send(new PutObjectCommand(uploadParams));
-      const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
+      const s3Url = `https://${process.env.S3_BUCKET ?? ''}.s3.amazonaws.com/${s3Key}`;
       logger.log("PDF uploaded to R2", { url: s3Url });
       return { pdfUrl: s3Url };
+    } catch (error) {
+      logger.error("Error in puppeteerWebpageToPDF", { error });
+      throw error;
+    } finally {
+      if (browser) {
+        await browser.close();
+      }
+    }
   },
 });
  2. Add a comma in the overview sentence:
-In this example we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).
+In this example, we use [Puppeteer](https://pptr.dev/) to generate a PDF from the [Trigger.dev](https://trigger.dev) landing page and upload it to [Cloudflare R2](https://developers.cloudflare.com/r2/).

These changes will improve error handling, ensure proper resource cleanup, make the code more robust against potential issues with environment variables, and improve the grammatical correctness of the documentation.
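
One caveat on the environment-variable handling: the `?? ''` fallback satisfies the type checker, but an unset `S3_BUCKET` would still produce an empty bucket name and a URL beginning `https://.s3.amazonaws.com/`. A possible alternative, sketched here under the assumption that failing fast is acceptable for this task, is a small guard. `requireEnv` is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper; throws instead of silently defaulting to "".
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

With this in place, `const bucket = requireEnv(process.env, "S3_BUCKET");` could be read once at the top of `run` and reused for both `uploadParams.Bucket` and the returned URL.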

