
feat: Introducing 2FA scrapers infrastructure + OneZero experimental scraper #760

Merged · 9 commits · Apr 8, 2023

Conversation

orzarchi (Contributor) commented Feb 25, 2023

Hi there!
Hopefully this is the first PR in the era of 2FA scrapers.
I've loosely followed the guidelines set up here.

  1. I've created a new base scraper - BaseTwoFactorAuthScraper.
  2. Since the original base-scraper file was getting a bit long, I've cleaned it up: error-related types moved to errors.ts, and public-facing interfaces (e.g. ScraperOptions, ScraperScrapingResult) moved to interface.ts.
  3. I've created the first 2FA scraper - OneZero. It's not perfect, but I've managed to scrape my personal account successfully. See the new documentation in the README to learn more about it.

Some limitations of the OneZero scraper:

  • I've supplied a way to provide a persistent OTP token, but it's limited: you can get one from OneZero that lasts 45 minutes, and it should be possible to refresh it indefinitely, but I haven't yet found the refresh GraphQL calls.
  • Using the OneZero scraper in automated scripts will probably log you out of whatever mobile devices you are connected on - their servers don't appreciate simultaneous logins. Perhaps this can be fixed in the future by supplying more headers and persisting cookies.

closes #765

esakal (Collaborator) commented Mar 5, 2023

@orzarchi thank you for this contribution 💯 . I'll free some time soon to review and play with the PR.

@eshaham @baruchiro @brafdlog as this is a major change in the design / capabilities of the scrapers, you might want to review it as well?

baruchiro (Collaborator):

I need to understand the interface changes more.

Meanwhile, I think you can continue with this change; it won't break our app, since we'll have to enable the new bank manually after picking up your version.

CC: @brafdlog

orzarchi (Contributor, Author) commented Mar 21, 2023

Any news?
@baruchiro for the record, there aren't any interface changes other than exporting more types - and even that can be reverted if necessary.

brafdlog (Contributor):

Hi @orzarchi
I (and I assume Baruch as well) am in a very busy period.
I suggest you don't wait for us. Based on your description there shouldn't be issues, and we'll report back if there are any when we update to the next version of israeli-bank-scrapers.

erikash (Collaborator) left a comment:

Great work :)
Please see my comments

return result.data as Promise<TResult>;
}

export async function fetchGetWithinPage<TResult>(page: Page, url: string): Promise<TResult | null> {
Review comment:

No await -> no need for this to be marked async
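The point above can be shown with a small sketch (the helper name and body here are illustrative, not the library's actual code): a function that never uses `await` can simply return the promise and drop the `async` keyword.

```typescript
// Hypothetical stand-in for a page fetch helper; illustrative only.
// Since nothing is awaited inside, `async` adds no value - the promise
// can be returned directly, avoiding an extra wrapper.
function fetchJson<TResult>(url: string): Promise<TResult | null> {
  void url; // placeholder: a real implementation would fetch this URL
  return Promise.resolve(JSON.parse('{"ok":true}') as TResult);
}

async function demo(): Promise<void> {
  const result = await fetchJson<{ ok: boolean }>('https://example.com/api');
  console.log(result?.ok); // true
}

demo();
```

Callers are unaffected: they can still `await` the returned promise exactly as before.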

ScraperGetLongTermTwoFactorTokenResult, ScraperCredentials,
} from './interface';

export type ScraperLoginResult = ErrorResult | {
Review comment:

Why not move to interface?



export class BaseTwoFactorAuthScraper<TTwoFactorCredentials> implements TwoFactorAuthScraper {
constructor(protected options: ScraperOptions) {
Review comment:

Love it!

}

private async fetchPortfolioMovements(portfolio: Portfolio, startDate: Date): Promise<TransactionsAccount> {
const account = portfolio.accounts[0];
Review comment:

Please open a task for future multi account support

const movements = [];


while (!movements.length || new Date(movements[0].movementTimestamp) >= startDate) {
Review comment:

Use moment and not Date to avoid timezone issues

}
}

movements.sort((x, y) => new Date(x.movementTimestamp).valueOf() - new Date(y.movementTimestamp).valueOf());
Review comment:

Same here


movements.sort((x, y) => new Date(x.movementTimestamp).valueOf() - new Date(y.movementTimestamp).valueOf());

const matchingMovements = movements.filter((movement) => new Date(movement.movementTimestamp) >= startDate);
Review comment:

And here

esakal (Collaborator) commented Mar 28, 2023

@orzarchi @erikash Hi,
Thanks again for this one - supporting OTP is required, as many services are switching to OTP login.

TL;DR

A long time ago I suggested a modular architecture that would let us compose flows from atomic operations, but it was archived, so for now we should assume we will keep the library's current architecture.

We should favor features over infrastructure consistency, since this library is in low-maintenance mode, but I believe we can easily modify the PR to use the existing scrapers while still supporting OTP with @orzarchi's great work.

I verified it with a Max account. You should verify OneZero as well.

Please check out the changes here: #765.

The other code review comments still stand, but they are pretty minor. Feel free to pull the branch one-zero-with-existing-scrapers into your repo and create a PR from there if you prefer; you can also get the diff here.

@orzarchi @erikash - wdyt?

Adjustments done in the PR

  1. Use ScraperCredentials everywhere and adjust its declaration to support OTP
  2. Move the relevant parts from base-two-factor-auth-scraper into base-scraper
  3. Delete base-two-factor-auth-scraper
  4. Extend the OneZero scraper from BaseScraper

Motivation behind my suggestion

The new BaseTwoFactorAuthScraper changes the library's current architecture, established long ago by @eshaham. I see this in a few places:

  1. It doesn't emit the same progress events that the existing scrapers BaseScraper and BaseScraperWithBrowser expose via the onProgress method.
  2. Since OneZero doesn't require puppeteer, the new scraper's implementation is similar to BaseScraper. Once someone needs an OTP scraper with puppeteer support, we will need a fourth base scraper for that.
  3. It makes some assumptions about the scraper credentials which aren't guaranteed once compiled and might break at runtime.
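For context on point 1, the progress reporting in the existing base scrapers is event-based. A minimal sketch of that mechanism (the class and event names here are illustrative, not the library's actual types) might look like:

```typescript
import { EventEmitter } from 'events';

// Illustrative handler shape; the real library defines its own progress types.
type ProgressHandler = (companyId: string, payload: { type: string }) => void;

class SketchScraper {
  private emitter = new EventEmitter();

  // Consumers subscribe once and receive events from every phase.
  onProgress(handler: ProgressHandler): void {
    this.emitter.on('progress', handler);
  }

  private emitProgress(type: string): void {
    this.emitter.emit('progress', 'oneZero', { type });
  }

  async login(): Promise<void> {
    this.emitProgress('StartScraping');
    await Promise.resolve(); // placeholder for real login work
    this.emitProgress('LoginSuccess');
  }
}
```

A scraper that reuses this mechanism keeps existing consumers working unchanged, which is the consistency concern raised above.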

erikash (Collaborator) commented Mar 28, 2023

Great insights @esakal
I agree it's cleaner. I guess that after a few more scrapers we might consider reintroducing base-two-factor-auth-scraper to reuse the login logic, but it's still too early, certainly for this PR.

I've made a small comment here:
#765 (comment)

@esakal: can you rephrase your second comment? I didn't understand it:
"Since one zero doesn't require puppeteer, the new scraper is similar to BaseScraper in the implementation. Once someone will need a an OTP scraper with puppeteer support we will need a fourth scraper for that."

esakal (Collaborator) commented Mar 28, 2023

@erikash thanks :)

"Since one zero doesn't require puppeteer, the new scraper is similar to BaseScraper in the implementation. Once someone will need a an OTP scraper with puppeteer support we will need a fourth scraper for that".

Today we have two base scrapers:

  • base-scraper - used to scrape without puppeteer.
  • base-scraper-with-browser - extends base scraper with puppeteer scraping capabilities.

In the original pull request, @orzarchi introduced a new scraper that wasn't inherited from either of the existing ones. Since this scraper doesn't require puppeteer support, it doesn't use puppeteer. My comment was that if we keep all three base scrapers, we may need to provide a fourth base scraper in the future for scrapers that require both OTP and puppeteer. However, in the suggested modification, this issue is no longer a concern. I hope this clarifies things.

orzarchi (Contributor, Author):

Hi @esakal - no problem, I'll fix @erikash's comments on top of your new PR with your changes.
Some thoughts about the discussion above:

  1. I think we've all proved once again that composition > inheritance, and that eventually the common functions in base-scraper/base-scraper-with-browser should be extracted into their own classes and used by all concrete scrapers :)
  2. I still feel my approach of typing ScraperCredentials per scraper is better - if we define it per scraper, we can avoid jumping through hoops all over the codebase. Why? In the long term we can make the scraper factory return the concrete scraper that accepts concrete credentials instead of the Scraper interface (using something like this technique), and remove files like this as well as the code that tests them.
    What do you think about keeping the generic credentials argument as I did here?
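The factory idea above can be sketched with a TypeScript generic keyed by company id (all names here are illustrative, not the library's actual API): the return type of the factory depends on the id, so callers get compile-time checking of the credentials they pass.

```typescript
// Illustrative credential shapes; the real scrapers define their own.
interface HapoalimCredentials { userCode: string; password: string; }
interface OneZeroCredentials { email: string; password: string; }

// Map each company id to its concrete credentials type.
interface CredentialsByCompany {
  hapoalim: HapoalimCredentials;
  oneZero: OneZeroCredentials;
}

interface Scraper<TCredentials> {
  login(credentials: TCredentials): Promise<boolean>;
}

// The generic parameter flows from the literal company id to the
// credentials type, so the compiler rejects mismatched credentials.
function createScraper<K extends keyof CredentialsByCompany>(
  companyId: K,
): Scraper<CredentialsByCompany[K]> {
  void companyId; // a real factory would dispatch on this id
  return {
    login: async (_credentials) => true, // stub
  };
}

const scraper = createScraper('oneZero');
// scraper.login({ userCode: 'x', password: 'y' }); // <- would not compile
```

With this shape, files that exist only to widen or re-check credentials at runtime become unnecessary, which is the cleanup suggested above.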

esakal (Collaborator) commented Mar 29, 2023

@orzarchi, I agree with you on both topics. Ideally, we should switch to a new composition-based architecture, but that would require active development work to refactor all the scrapers. Unfortunately, this library is not currently being actively evolved by a group of developers; we are mostly focused on ongoing maintenance and limited by depending on developers who have actual access to each institution. This makes it challenging to change the architecture of the project.

However, if you do have a suggestion for a new architecture, testing it out on new scrapers could be a good way to gradually transition to it. For instance, One Zero might be a good candidate for this.

Regarding the credentials, you can certainly implement them. Just keep in mind that they will only work for those who instantiate the scraper explicitly. Those who use createScraper with TypeScript might encounter runtime errors, because the common credentials declaration is `export declare type ScraperCredentials = Record<string, string>;`

orzarchi (Contributor, Author) commented Apr 1, 2023

Hey @erikash and @esakal, I've incorporated all of your suggestions and code.
I've also implemented a compromise regarding the ScraperCredentials type: each scraper now defines its own credentials, and ScraperCredentials is a big copied union of all the different options.
This way we get more type safety everywhere at the cost of some copy-pasted code. (This already flushed out some incorrect tests.)
In the future we can make the scraper factory function return concrete scraper types and get rid of ScraperCredentials altogether.
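The compromise described here - each scraper declaring its own credentials, with `ScraperCredentials` as a union of them - can be sketched as follows (the credential shapes are illustrative, not the library's actual fields):

```typescript
// Illustrative per-scraper credential shapes.
interface PasswordCredentials { username: string; password: string; }
interface OtpCredentials { phoneNumber: string; otpLongTermToken: string; }

// The public type becomes a union of the concrete options instead of
// the loose Record<string, string>, so each branch is checked.
type ScraperCredentials = PasswordCredentials | OtpCredentials;

// Narrowing with the `in` operator lets each code path access only
// the fields that actually exist on that credential shape.
function describe(credentials: ScraperCredentials): string {
  return 'otpLongTermToken' in credentials
    ? `otp:${credentials.phoneNumber}`
    : `password:${credentials.username}`;
}
```

Passing an object that matches neither member of the union fails at compile time, which is how this change "flushed out some incorrect tests".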

orzarchi force-pushed the master branch 2 times, most recently from fa2fdee to db4c832 on Apr 1, 2023
esakal (Collaborator) left a comment:

@orzarchi this is awesome! I want to merge it this weekend. There are a few minor comments - please review and decide whether to fix them; I'm OK with leaving it as is. Please check my comment about the ...extraHeaders.

Once you reply or commit fixes, I'll merge it.

orzarchi (Contributor, Author) commented Apr 7, 2023

> @orzarchi this is awesome! I want to merge it this weekend. There are a few minor comments - please review and decide whether to fix them; I'm OK with leaving it as is. Please check my comment about the ...extraHeaders.
>
> Once you reply or commit fixes, I'll merge it.

Sure, go ahead and merge it 🤩

esakal (Collaborator) commented Apr 8, 2023

I approved the PR; it will be merged soon. Thanks @orzarchi for your contribution!

closes #765

@esakal esakal merged commit 2da370e into eshaham:master Apr 8, 2023
github-actions (bot):
🎉 This PR is included in version 3.7.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
