[🐛 Bug]: Enabling new jdk-http-client leads to 'java.io.IOException: Too many open files' and ' Unable to find a free port' #11826

Closed
EuclidHeron opened this issue Mar 28, 2023 · 9 comments


EuclidHeron commented Mar 28, 2023

What happened?

Enabling the new jdk-http-client leaks resources and, after a while, results in 'java.io.IOException: Too many open files' and 'Unable to find a free port'. This does not occur if I remove 'System.setProperty("webdriver.http.factory", "jdk-http-client");': the legacy client does not exhibit this behavior (we have run it for many hours without seeing this once). Therefore, I think the new client has a resource leak of some type, involving both open files and ports.
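
For context on why the two errors might be related, here is a minimal JDK-only sketch of probing for a free port. It is not Selenium's PortProber implementation, and the class name FreePortSketch is hypothetical. The point is that each probe has to open a socket, which on Unix-like systems consumes a file descriptor, so a process that has run out of descriptors may also fail to find a port. Whether that is what happens here is an assumption.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical illustration only; not Selenium's PortProber.
final class FreePortSketch {

    /**
     * Asks the OS for any free TCP port by binding to port 0,
     * then releases the socket so the caller can use the port.
     * Throws IOException if the socket cannot be opened or bound.
     */
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Free port: " + findFreePort());
    }
}
```

Note that the try-with-resources block closes the probe socket immediately; a probe that did not close its socket would itself leak one descriptor per call.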

How can we reproduce the issue?

The easiest way to see this is to run Selenium with the new client enabled in a Docker container (see below for the details of how I am running it), then run Selenium many times. Use the command 'docker container stats' and monitor the PIDS column. Even with the --init flag ('docker run --init ...'), PIDS grows without limit. This is caused by the resource leak: files are not being closed (which is why we get 'java.io.IOException: Too many open files'), and since Selenium tries to find a free port for each driver service, we eventually get 'Unable to find a free port' once the ports are exhausted.
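
To watch the leak from inside the JVM as well as through 'docker container stats', something like the following could be logged between scraping runs. This is a minimal sketch, not part of the original reproduction: the class name OpenFileDescriptorProbe is hypothetical, it relies on the JDK's com.sun.management.UnixOperatingSystemMXBean (available on Unix-like JVMs only), and it counts only descriptors held by the JVM process itself, not those held by chromedriver or Chrome.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for monitoring; not part of Selenium.
final class OpenFileDescriptorProbe {

    /** Returns the JVM's open file descriptor count, or -1 if the platform does not expose it. */
    static long openFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L;
    }

    /** Returns the JVM's file descriptor limit, or -1 if the platform does not expose it. */
    static long maxFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFileDescriptorCount()
                + " / limit: " + maxFileDescriptorCount());
    }
}
```

If the open count climbs steadily across runs with the new client but stays flat with the legacy client, that would support the leak being in the JVM process rather than in the browser processes.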

I am running a Spring Boot application on Java 17 in Docker. I see this after about an hour of running continuously. See below for details:
1) Enabling the client
Add System.setProperty("webdriver.http.factory", "jdk-http-client"); to the main Spring Boot application. Also add the following to pom.xml:

<properties>
		<java.version>17</java.version>
		<selenium.constructs.version>4.8.3</selenium.constructs.version>
	</properties>
<dependency>
		<groupId>org.seleniumhq.selenium</groupId>
		<artifactId>selenium-http-jdk-client</artifactId>
		<version>${selenium.constructs.version}</version>
</dependency>
2) Client Code:
private List<String> getWebpageInternal(String url) {
        WebDriver localWebDriver = null;
        try {
            Stopwatch stopWatch = Stopwatch.createUnstarted();
            stopWatch.start();
            localWebDriver = getWebDriver(useFastStrategy);
            localWebDriver.get("about:blank");


            try {

                localWebDriver.manage().timeouts().pageLoadTimeout(Duration.ofMillis(waitMs));
                localWebDriver.manage().timeouts().scriptTimeout(Duration.ofMillis(waitMs));
                localWebDriver.manage().timeouts().implicitlyWait(Duration.ofMillis(waitMs));
                getUrl(localWebDriver, url);
            } catch (Exception e) {
                // Still try to get the page source - some of the page might have loaded
            }

            // Note that waiting can take a long time, so for speed we currently put it before the call to stopwatch.stop();
            stopWatch.stop();
            long lengthWaited = stopWatch.elapsed(TimeUnit.MILLISECONDS);

            if (lengthWaited < minWaitMs) {
                Thread.sleep(minWaitMs - lengthWaited);
            }
            String pageSource = getPageSourceSafely(localWebDriver);
            if (pageSource == null) {
                throw new WebpageScraperTimeoutException("Was not able to get page source=" + url);
            }

            List<String> pageSources = new ArrayList<>();
            pageSources.add(pageSource);
            localWebDriver.quit();
            return pageSources;
        } catch (Exception e) {
            LOGGER.error("Got error in MakeNewWebpageScraperVersion3", e);
            quitSafely(localWebDriver);
            throw new RuntimeException(e);
        }
    }

private String getPageSourceSafely(WebDriver localWebDriver) {
        try {
            return localWebDriver.getPageSource();
        } catch (Exception e) {
            return null;
        }
    }

private void quitSafely(WebDriver localWebDriver) {
        try {
            if (localWebDriver == null) {
                return;
            }
            localWebDriver.quit();
        } catch (Exception e) {
            // nothing to do beyond logging the failure
            LOGGER.info("Got error in quit() call of quitSafely", e);
        }
    }

private WebDriver getWebDriver(boolean useFastStrategy) {
        ChromeDriverService service = new ChromeDriverService.Builder()
                .withVerbose(false)
                .withSilent(true)
                .build();
        return new ChromeDriver(service, getChromeOptions(useFastStrategy));
    }

private ChromeOptions getChromeOptions(boolean useFastStrategy) {
        ChromeOptions chromeOptions = new ChromeOptions();
        if (useFastStrategy) {
            chromeOptions.setPageLoadStrategy(PageLoadStrategy.EAGER); // was default (Normal) before.
        }

        // User agent is required because some websites will reject your request if it does not have a user agent
        chromeOptions.addArguments(String.format("user-agent=%s", USER_AGENT));
        chromeOptions.addArguments("--log-level=OFF");
        chromeOptions.addArguments("--headless=new");
        List<String> arguments = new LinkedList<>();
        arguments.add("--disable-extensions");
        arguments.add("--headless");
        arguments.add("--disable-gpu");
        arguments.add("--no-sandbox");
        arguments.add("--incognito");
        arguments.add("--disable-application-cache");
        arguments.add("--disable-dev-shm-usage");

        chromeOptions.addArguments(arguments);
        return chromeOptions;
    }
3) Docker Code:
ARG CHROME_VERSION=111.0.5563.64-1
ADD google-chrome.repo /etc/yum.repos.d/google-chrome.repo
RUN microdnf install -y google-chrome-stable-$CHROME_VERSION \
	&& sed -i 's/"$HERE\/chrome"/"$HERE\/chrome" --no-sandbox/g' /opt/google/chrome/google-chrome

## ChromeDriver

ARG CHROME_DRIVER_VERSION=111.0.5563.64
RUN microdnf install -y unzip \
	&& curl -s -o /tmp/chromedriver.zip https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip \
	&& unzip /tmp/chromedriver.zip -d /opt \
	&& rm /tmp/chromedriver.zip \
	&& mv /opt/chromedriver /opt/chromedriver-$CHROME_DRIVER_VERSION \
	&& chmod 755 /opt/chromedriver-$CHROME_DRIVER_VERSION \
	&& ln -s /opt/chromedriver-$CHROME_DRIVER_VERSION /usr/bin/chromedriver

ENV CHROMEDRIVER_PORT 4444
ENV CHROMEDRIVER_WHITELISTED_IPS "127.0.0.1"
ENV CHROMEDRIVER_URL_BASE ''
EXPOSE 4444

EXPOSE 8080
EXPOSE 5005
ARG JAR_FILE=target/*.jar
COPY ${JAR_FILE} app.jar
# For Testing
ENTRYPOINT ["java","-jar", "-Xmx600m","/app.jar"]


### Relevant log output

```shell
For 'Unable to Find a Free Port':

java.lang.RuntimeException: Unable to find a free port
3/27/2023, 6:11:47 PM	at org.openqa.selenium.net.PortProber.findFreePort(PortProber.java:62)
3/27/2023, 6:11:47 PM	at org.openqa.selenium.remote.service.DriverService$Builder.build(DriverService.java:452)
3/27/2023, 6:11:47 PM	at monolith.scraping.MakeNewWebpageScraperVersion3.getWebDriver(MakeNewWebpageScraperVersion3.java:194)
```

### Operating System

macOS Catalina and Ubuntu 22.10

### Selenium version

4.8.3 (Java 17)

### What are the browser(s) and version(s) where you see this issue?

Chrome

### What are the browser driver(s) and version(s) where you see this issue?

Chromedriver

### Are you using Selenium Grid?

No

@github-actions

@EuclidHeron, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!


bykes commented May 10, 2023

The problem still occurs on Selenium 4.9.0 and 4.9.1.
After a few hours of running Selenium, it starts to throw:
Caused by: java.net.SocketException: Too many open files

@EuclidHeron
Author

Hi @titusfortner, I see you added the [C-java] label to this. Is that because the issue is due to the underlying implementation of a C library, or something else?

@titusfortner
Member

All the language tags are prepended with "C-" so they show up in the same place in the drop down. So, it's just a Java tag, and I have no idea why we picked the letter C.

@EuclidHeron
Author

@titusfortner thank you for your fast response! Quick follow up - is there a way to keep this on the bug radar? Just so that it does not fall through the cracks. As @bykes shows above this issue still exists as of 4.9.1, hence my ask. Is there any other information/way that we can assist?


cguess commented Jun 14, 2023

I'm still getting this bug in 4.10.0 as well. Any chance of an update or workaround for this? Unfortunately, I need my scrapers to run for weeks, and they seem to last only ~12 hours.
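
As a stopgap for long-running scrapers (not a fix for the leak), one option might be to check descriptor usage between jobs and restart the process before it hits the limit. A rough sketch, assuming a Unix-like JVM; DescriptorWatchdog, nearLimit, jvmNearLimit and the 0.8 threshold are hypothetical names and values, not anything from Selenium:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical stopgap helper; it does not address the underlying leak.
final class DescriptorWatchdog {

    /** True when open/max is at or above the threshold; false if either value is unavailable. */
    static boolean nearLimit(long open, long max, double threshold) {
        if (open < 0 || max <= 0) {
            return false;
        }
        return (double) open / (double) max >= threshold;
    }

    /** Applies nearLimit to the current JVM process; returns false on platforms without the Unix bean. */
    static boolean jvmNearLimit(double threshold) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return nearLimit(unix.getOpenFileDescriptorCount(),
                    unix.getMaxFileDescriptorCount(), threshold);
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed threshold: flag the process once 80% of its descriptor limit is in use.
        if (jvmNearLimit(0.8)) {
            System.out.println("File descriptor usage is high; consider restarting the scraper process.");
        }
    }
}
```

The other option already described in this thread is to leave the webdriver.http.factory property unset, since the legacy client was reported not to show the leak.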

@EuclidHeron
Author

@titusfortner is there a way to create a bug bounty (monetary or otherwise) for this? Do we have a procedure for that?

@EuclidHeron
Author

Thank you so much @diemol !


github-actions bot commented Dec 9, 2023

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked and limited conversation to collaborators on Dec 9, 2023