How to get respond data from a request endpoint? #520

Closed
kien-pham opened this issue Dec 27, 2022 · 10 comments

Comments

@kien-pham

kien-pham commented Dec 27, 2022

Hello,

I've searched the docs and other discussions but couldn't find the answer. I visit a website that sends requests to API endpoints. These endpoints reply with data, and I want to get that data.

The API endpoint is protected, so its data can't be retrieved with a plain HTTP request using axios or fetch. The only way I have found to get it is chrome-remote-interface.

[Screenshot 2022-12-27 at 23 01 19]

Here is my code:

async function getFetchingRequestURLs() {
  const client = await CDP();
  const { Network, Page } = client;
  await Network.enable();

  await Network.setUserAgentOverride({ userAgent: '...' });

  // Get the request URLs
  Network.requestWillBeSent(async (params) => {
    // I want to get data from params.request.url here
  });

  // Navigate to the website
  await Page.navigate({ url: "https://v.douyin.com/hgEpRHh/" });
  await Page.loadEventFired();
  await client.close();
}

Thank you a lot for your help.

@cyrus-and
Owner

Something like this:

  1. take notice of the requestId in Network.requestWillBeSent for the one(s) you want to fetch;
  2. wait for the Network.loadingFinished event to fire for those;
  3. fetch the body with Network.getResponseBody.

Hope it helps!
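
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested recipe: `captureResponseBody` and its `matchesUrl` predicate are hypothetical names, and `client` is assumed to be an already connected chrome-remote-interface client with `Network.enable()` called on it.

```javascript
// Sketch: resolve with the body of the first finished response whose request
// URL satisfies `matchesUrl` (a hypothetical predicate supplied by the caller).
// `client` is assumed to be a connected chrome-remote-interface client.
function captureResponseBody(client, matchesUrl) {
    const {Network} = client;
    const wanted = new Set(); // requestIds of the requests we care about

    return new Promise((resolve, reject) => {
        // 1. remember the requestId of every matching request
        Network.requestWillBeSent(({requestId, request}) => {
            if (matchesUrl(request.url)) {
                wanted.add(requestId);
            }
        });

        // 2. wait until the whole body of such a request has been received
        Network.loadingFinished(async ({requestId}) => {
            if (!wanted.delete(requestId)) {
                return; // not one of ours
            }
            try {
                // 3. fetch the body; binary payloads arrive base64-encoded
                const {body, base64Encoded} =
                    await Network.getResponseBody({requestId});
                resolve(base64Encoded
                    ? Buffer.from(body, 'base64').toString('utf8')
                    : body);
            } catch (err) {
                reject(err);
            }
        });
    });
}
```

The handlers should be registered before navigating, and the returned promise awaited only after the navigation has started, for example: `const pending = captureResponseBody(client, (url) => url.includes('/abc/xyz')); await Page.navigate({url}); const body = await pending;`. Awaiting it before `Page.navigate` would wait forever, since no request has been sent yet.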

@kien-pham
Author

Thank you, I got it working locally. Now I have an issue running it on AWS Lambda: it can't open the Chrome browser there. How can I fix that?

@cyrus-and
Owner

Headless mode?

@kien-pham
Author

> Headless mode?

Yes, but it seems we need to use Puppeteer to open the page.

@cyrus-and
Owner

Why...? Have you even tried?

@kien-pham
Author

This is my code, can you take a look? I tested it locally but it doesn't work; it seems it can't open the page.

const CDP = require("chrome-remote-interface");
const chromium = require("chrome-aws-lambda");

const UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X)";

// main lambda function
exports.handler = async (event, context, callback) => {
    let result = null;
    let browser = null;
    const { url } = event;

    try {
      browser = await chromium.puppeteer.launch({
        executablePath: await chromium.executablePath,
      });

      CDP({
        browserWSEndpoint: browser.wsEndpoint(),
      }).then(async (client) => {
        const { Network, Page, Emulation } = client;
        await Network.enable();
        await Network.setUserAgentOverride({ userAgent: UA });
        
        const responseBodyPromise = new Promise((resolve, reject) => {
          Network.requestWillBeSent(async (params) => {
            // check if the request match the URL I need
            if (params.request.url.includes('/abc/xyz')) {
              // make sure it is fully loaded, status = 200
              const isLoaded = new Promise((resolve, reject) => {
                Network.responseReceived(async (paramsReceived) => {
                  if (
                    params.requestId === paramsReceived.requestId &&
                    paramsReceived.response.status === 200
                  ) {
                    resolve(true);
                  }
                });
              });

              // get the data
              if (await isLoaded) {
                console.log("isLoaded");
                await new Promise((resolve) => setTimeout(resolve, 100)); // wait a moment
                const responseBody = await Network.getResponseBody({
                  requestId: params.requestId,
                });
                resolve(responseBody);
              }
            }
          });
        });

        result = await responseBodyPromise;

        // Navigate to the website
        await Page.navigate({ url });
        await Page.loadEventFired();
      });
    } catch (error) {
      return callback(error);
    } finally {
      if (browser !== null) {
        await browser.close();
      }
    }
    return callback(null, result);
  };

@cyrus-and
Owner

You cannot load pages in the browser context (browser.wsEndpoint()); besides, the browserWSEndpoint option does not exist in chrome-remote-interface...

I don't think you need Puppeteer at all, just chrome-launcher:

const CDP = require('chrome-remote-interface');
const ChromeLauncher = require('chrome-launcher');

(async () => {
    const chrome = await ChromeLauncher.launch({
        chromeFlags: [
            '--headless'
        ]
    });
    const client = await CDP({
        port: chrome.port
    });

    const {Page, Network} = client;

    Network.requestWillBeSent(({request}) => {
        console.log(request.url);
    });

    // enable the domains before navigating so that no event is missed
    await Page.enable();
    await Network.enable();
    await Page.navigate({
        url: 'http://github.com'
    });
    await Page.loadEventFired();

    await client.close();
    await chrome.kill();
})();

@kien-pham
Author

Thanks for the suggestion. I got the error below about the Chrome path. On AWS Lambda, I think we still need chrome-aws-lambda.

"errorMessage": "The CHROME_PATH environment variable must be set to a Chrome/Chromium executable no older than Chrome stable.",
  "code": "ERR_LAUNCHER_PATH_NOT_SET",

@cyrus-and
Owner

Then use the version from chrome-aws-lambda:

const chromium = require('chrome-aws-lambda');

// ...

    const chrome = await ChromeLauncher.launch({
        chromePath: await chromium.executablePath,
        chromeFlags: [
            '--headless'
        ]
    });

Also see alixaxel/chrome-aws-lambda#86.
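
Putting the pieces together, a Lambda-style handler might look roughly like the sketch below. It is untested on Lambda and rests on assumptions: `createHandler`, `launch`, `connect` and `getChromePath` are hypothetical names, injected so that the flow can be exercised without a real browser. In a deployment they would presumably be `ChromeLauncher.launch`, `CDP` and a function returning `chromium.executablePath`. Lambda may also need extra Chrome flags beyond `--headless`.

```javascript
// Sketch of a Lambda-style async handler. The three dependencies are injected
// (hypothetical parameter names); in a deployment they would presumably be
// ChromeLauncher.launch, CDP and () => chromium.executablePath.
function createHandler({launch, connect, getChromePath}) {
    return async ({url}) => {
        let chrome = null;
        let client = null;
        try {
            chrome = await launch({
                chromePath: await getChromePath(),
                chromeFlags: ['--headless']
            });
            client = await connect({port: chrome.port});
            const {Page, Network} = client;

            // collect the URL of every request issued by the page
            const urls = [];
            Network.requestWillBeSent(({request}) => urls.push(request.url));

            await Page.enable();
            await Network.enable();

            // subscribe to the load event before navigating to avoid a race
            const loaded = Page.loadEventFired();
            await Page.navigate({url});
            await loaded;

            return urls;
        } finally {
            // always release the connection and the browser, even on errors
            if (client) await client.close();
            if (chrome) await chrome.kill();
        }
    };
}
```

With the real libraries this would be wired up as something like `exports.handler = createHandler({launch: ChromeLauncher.launch, connect: CDP, getChromePath: () => chromium.executablePath});`. Returning a value from an async handler, instead of invoking `callback` outside the promise chain, keeps the function from finishing before the page has loaded.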

@kien-pham
Author

Thank you for the support, I will take a look. I hope this works.
