Element selector returning empty value #194

Closed
ap0 opened this issue Nov 27, 2018 · 9 comments
Labels
area/drivers/cdp (CDP driver), type/bug (Something isn't working)


@ap0
Contributor

ap0 commented Nov 27, 2018

Describe the bug
After updating ferret to the latest version, my scraper code no longer works. Both ELEMENTS and ELEMENT are returning an empty result. Reverting to the revision I was using previously (b3e0a68cd06e4ff577e0dc750aa73533b778ae74) makes it work again.

To Reproduce

Using this sample HTML (extracted from the page that's causing issues): https://gist.github.com/ap0/67f6549afb8ac6cc2c4989525750808b

If I run this code:

LET mlswebsite = DOCUMENT('file:///Users/adam/test.html', true)
LET normalIds = (
	FOR tr IN ELEMENTS(mlswebsite, '#listings_table > tbody > tr')
		LET elem = ELEMENT(tr, 'td > input')
		RETURN elem.value
)

RETURN {
	normalIds
}

... using an old revision (b3e0a68), I get the expected output:

INFO 2018-11-27T20:53:48Z {"normalIds":["068728","068728","816410","52024413","698690","210583","049700","826394","354369","135911","700285","557242","278832","357701","313034","959368","703500","842750","777175","378061","072489","383005","843393","59912263","464535","229710","230550","767964","758862","944384","025449","010245","844935","038760","013450","124139","211145","758761","448667","488966"]}

But if I use the latest version, I get this:

INFO 2018-11-27T20:55:54Z {"normalIds":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null]}

However, if I call ELEMENTS_COUNT(mlswebsite, '#listings_table > tbody > tr'), I get the expected element count of 40.

Expected behavior
Expected to see the IDs in the first array (as shown above)

Desktop (please complete the following information):

  • macOS Mojave / CentOS
  • Chrome 70.0.3538.110
ziflex added the type/bug (Something isn't working) and area/drivers/cdp (CDP driver) labels Nov 27, 2018
@ziflex
Member

ziflex commented Nov 27, 2018

Hey, thanks for the report!
In other words, it works in 0.5.0?

@ap0
Contributor Author

ap0 commented Nov 27, 2018

Yeah, I had been using builds from the latest master; it looks like that one is 0.5.0.

@3timeslazy
Member

@ap0 you can find the version by running ferret and looking at the first line it prints.

@3timeslazy
Member

@ap0 I can't reproduce the bug.
I used:

LET mlswebsite = DOCUMENT('http://localhost:8000/error.html', true)
LET normalIds = (
	FOR tr IN ELEMENTS(mlswebsite, '#listings_table > tbody > tr')
		LET elem = ELEMENT(tr, 'td > input')
		RETURN elem.value
)

RETURN {
	normalIds
}

where http://localhost:8000/error.html serves your HTML file.
Output:
[screenshot of the output, taken 2018-11-28 at 01:05:00]

@ap0
Contributor Author

ap0 commented Nov 27, 2018

I'm running it embedded in an app; I don't know if that makes a difference:

const scratchScraper = `
LET mlswebsite = DOCUMENT('file:///Users/adam/test.html', true)
WAIT_ELEMENT(mlswebsite, '#listings_table')
LET normalIds = (
	FOR tr IN ELEMENTS(mlswebsite, '#listings_table > tbody > tr')
		LET elem = ELEMENT(tr, 'td > input')
		RETURN elem.value
)

RETURN {
	normalIds
}
`

func main() {

	comp := compiler.New()
	program, err := comp.Compile(scratchScraper)
	if err != nil {
		log.Fatal(err.Error())
	}

	log.Infof("Running...")
	out, err := program.Run(context.Background(),
		runtime.WithBrowser("http://0.0.0.0:9222"),
		runtime.WithUserAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"))
	if err != nil {
		log.Fatal(err.Error())
	}

	log.Info(string(out))
}

Output:

INFO 2018-11-27T22:09:13Z Running...
{"level":"debug","id":"c6b6637c-0792-4843-a20d-26d394a71865","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36","message":"using User-Agent"}
INFO 2018-11-27T22:09:16Z {"normalIds":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null]}

@ziflex
Member

ziflex commented Nov 28, 2018

@ap0 I have managed to reproduce it.

@ziflex
Member

ziflex commented Nov 28, 2018

It was a side effect of a big refactoring :(

ziflex added a commit that referenced this issue Nov 28, 2018
* Fixes

* Fixed path to HTMLElement.value
@ziflex
Member

ziflex commented Nov 28, 2018

Now it should work.

@ap0
Contributor Author

ap0 commented Nov 28, 2018

Thanks!

3timeslazy pushed a commit to 3timeslazy/ferret that referenced this issue Apr 10, 2019
* Fixes

* Fixed path to HTMLElement.value