Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix same link in different file error #26

Merged
merged 2 commits into from
May 14, 2021

Conversation

saswatamcode
Copy link
Collaborator

Currently, if there are same links but in different files (come under same file glob passed to mdox) like here and here, mdox throws the error below,

remote link https://slack.cncf.io/: URL already visited

Even though the link is valid.
This PR fixes this and adds a test case for the same.

@saswatamcode saswatamcode requested a review from bwplotka May 13, 2021 13:12
@saswatamcode saswatamcode self-assigned this May 13, 2021
@saswatamcode
Copy link
Collaborator Author

Linter failure seems to be due to goreportcard.com being down.

@bwplotka
Copy link
Owner

You should be able to retry @saswatamcode since you are maintainer now!

@bwplotka
Copy link
Owner

Hm, do you know why this error was created in the first place? (:

@saswatamcode
Copy link
Collaborator Author

I think this error was created to improve performance and avoid duplicate requests in colly. But here it seems to cause this issue...is there a better way to handle this other than using AllowURLRevisit? (:

Also, goreportcard seems to still be down so check would fail if I rerun...

Signed-off-by: Saswata Mukherjee <[email protected]>
Copy link
Owner

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, one suggestion, otherwise LGTM (:

v.c.OnRequest(func(request *colly.Request) {
v.rMu.Lock()
defer v.rMu.Unlock()
request.Ctx.Put("originalURL", request.URL.String())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Suggestion: Handcrafted strings like this can be prone to errors. Use constant to reuse such key. Also it does not need to be string to be key. It's recommended to use empty struct (less mem allocated) e.g

var originalURLKey = struct{}{}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Get and Put methods seem to only take in strings as highlighted here. Should originalURLKey be passed in using fmt.Sprintf here? But that maybe wouldn’t help with memory as that would be the same as using a string…

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, this is something library specific not Go context.. ok, let's do constant at least

Signed-off-by: Saswata Mukherjee <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants