Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prefetch_mode implementation #68

Closed
wants to merge 2 commits into from

Conversation

seroperson
Copy link

@seroperson seroperson commented Aug 22, 2023

Hello! Many thanks for your plugin. I want to use it to establish sync between my notion and jekyll but I'm really missing feature implemented within this PR.

Default behavior mostly works ok but it also has some pitfalls such as depending on Notion availablility, API access and internet availability. With prefetch_mode you can setup the sync so all your content will be stored under the git control.

To setup my sync, I'm planning on scheduling just one more CI job which will fetch changes from Notion and git commit/git push if any.

@emoriarty emoriarty added the enhancement New feature or request label Aug 26, 2023
@emoriarty
Copy link
Owner

emoriarty commented Aug 31, 2023

Hello @seroperson, I'm struggling a bit to have some quality time to check properly your proposal.

Anyway, having a quick look I like the work you did but I have two thoughts:

  • I've always thought more of having a cache instead of creating a specific command (which can override or be overridden by any another plugin with the same command name),
  • and I miss some tests (I recognize the test suite is not the best 😅)

I'll let you know more when I could test it and play with it.

By the way, thank you very much for your contribution. It is much appreciated ❤️.

@seroperson
Copy link
Author

Hello! I will add tests soon, just missed their existence at all.
Cache is good, but still it does not store content under the git control. Removing a database, loosing an account, moving to other "cms" etc leads in data loss. But when storing everything under the git, it really becomes a permanent part of website.

P.S. prefetch command sounds too general, maybe we should rename it to something else.

@emoriarty
Copy link
Owner

emoriarty commented Sep 2, 2023

Hello @seroperson ,

Before adding a new feature like this one I'd like give it a try to the cache option. I've been putting it off until you suggested the prefetch feature. Then I remember about the vcr gem and how easy can be setup.

Here you can find the PR with the changes for the cache. Please, be free to test it in your project:

gem 'jekyll-notion', git: 'https://github.com/emoriarty/jekyll-notion.git', branch: 'feat_cache-notion'

What I like about this change is that little code is modified. So less chance of introducing errors and less maintenance 😉.

Another highlight is that each notion request is stored in a specific file with the name of the notion resource ID, so when you want to clear the cache for a file, just look for that notion ID in the folder cache and remove the file.

For the CI, you can keep the cache folder as long as you want. Or if you prefer, you can store it in your repo.

Let me know if it fits for you.

Cheers 🍻 !

@emoriarty emoriarty added the wip label Sep 2, 2023
@seroperson
Copy link
Author

Hello! I think it is good thing to add but still it is not the plain independent .md files inside your repo 🥲 With plain .md files I can edit them without notion and without this plugin at all. With cache files I can't do anything, they just make the build faster.

@emoriarty
Copy link
Owner

Well, I don't think notion is going to go down tomorrow. So, by now you can still keep editing your files in notion or keep using your fork to download the document and edit them locally if it is what you prefer. An the end, it is a matter of preference. Me, I prefer to edit the documents in Notion than in markdown files.

Therefore, I'm not going to prioritize this feature by now. I think it is more interesting having a proper cache. And right after, I'm going to refactor the test suite to use the vcr gem instead to be able to update the fixture documents easily. And possibly rework or kill the fetch_on_watch feature.

That said, I think the prefetch feature is still interesting, but it requires a little more work to be done on it. For example, to make it much more convenient, it should not be necessary to add the prefetch_mode option in the configuration. It should be enough to run the command to download and create the markdown files. In fact, I think the name of the feature is more fetch than a prefetch. The document is really downloaded. So the command should be something like jekyll fetch_notion or fetch_docs. I still don't know what should be the name of the command.

So what I'm proposing is to keep working on it and eventually the feature will be ready for release.

Thanks again for your work and your time 🤟 and I insist, your work is much appreciated.

@seroperson
Copy link
Author

Yea, I agree that it is more like personal preferences. Both approaches have their own pros and cons.

prefetch_mode option exists here to disable the fetching during the build. Without such option we would add the articles twice: once by calling command + git-commit and once during the build. I'm not sure how to not duplicate articles without using the configuration option.

I will revise PR a little bit later. If you have some wishes/advices about this feature, feel free to mention them, I will check.

@seroperson
Copy link
Author

Hello @emoriarty! I added some tests, rebased the code and renamed command to fetch_notion. Feels like it was all what I wanted to do here, but if you have some wishes, I'm open to discuss.

@seroperson
Copy link
Author

Hello! Recently I also managed to solve problem with short-living Notion URLs for images which were uploaded directly via UI. I'm unsure about how to deal with it using default approach, but with git-based it works perfectly. I also written a post about it: https://seroperson.me/2023/11/16/notion-jekyll-images-synchronization/ 🙂

@emoriarty
Copy link
Owner

emoriarty commented Nov 18, 2023

Hello @seroperson, the first thing is thanking you for your contribution. Unfortunately, I've been having second thoughts regarding this feature. But first, let's see what I think what is missed in the current PR:

  1. Pages: For what I understand in the code, the fetch_notion command only works for notion databases, right? So, what about single pages? Notion documents are also allowed to be downloaded as single pages.
  2. Data: Have you considered what happens when a database is marked as a data instead to be a normal collection? The files are going to be created locally but never used as expected. Basically, the data option is ignored, which is kind of confusing.
  3. fetch_mode: Why there's need for this? It should be enough by just running the command. Pages are downloaded without no setup. The command must be responsible preventing the default behavior.

Anyway, these two points are of little importance. They can be tackled with agreement and a bit of code. My major concern regarding this feature is not being able to keep the original document as the source of truth.

According to the current changes, executing the fetch_notion will download the Notion documents for the 1st time. But the 2nd time is going to be downloaded but not written locally because of this condition.

next if file_exists?(make_path(page))

This means that, once the documents are downloaded, this function considers any local file as the new source of truth. I wonder why anyone would want to create a document in Notion just to download it once. Subsequent changes must be made locally. Therefore, it doesn't make much sense to use Notion to create blog posts. Any changes to an old post in Notion will never be taken into account. Unless the document is deleted locally. But in case there have been local changes, they will be lost.

Now let's assume the opposite. Notion is the main source of truth. What is the difference between using this function and the current caching mechanism? Any local changes will be overwritten at each execution of fetch_notion. So what is the gain of having local files instead of copies of downloaded documents (cached) that are transformed into notion pages effortlessly on each build?

If the main concern of this feature is availability, you can always keep the cache in your repository. Every build in the CI/CD will not depend on Notion.

But if you just want to download the documents from Notion and be able to manipulate them at will, then it seems more pertinent to create a new command based on this plugin. Or, even download the document page as Markdown from Notion using the Export option.

notion_export

In conclusion, I don't think this feature belongs here. The main purpose of jekyll-notion is to use Notion as a kind of CRM. Which means that Notion should remain the only source of truth. And this feature does not guarantee this purpose.

Anyway, if you consider creating a new tool based in jekyll-notion, I'll be glad to link it here with a little description for those who prefers to download the files instead of using Notion as a CRM.

Again, thanks for your contribution. It's really appreciated ❤️. It has made me think deeply in the vision of jekyll-notion.

Let me know your thoughts.

@seroperson
Copy link
Author

Hello! Thank you for your detailed response!

  • About fetchng pages and data: yea, I can add them to this PR.
  • About fetch_mode: I added it to not run default Notion synchronization during the build. Is there any other way to skip it without configuration? I think we need to separate approaches somehow to not make them conflict. I'm unsure how to do it correctly here.
  • About next if file_exists?(make_path(page)): this check exists just only for default approach. When running fetch_notion there is no such check, so local changes always will be overwritten.
  • About difference with cache: if we are talking about final results, there are no any differences, the results are the same. But ideologically, jekyll builds a website from .md sources and with command-git-based approach, this "ideology" is not violated. We still have all the content git-tracked and all the content is presented as plain .md files. The difference with cache here is that the cache is a some kind of "different entity" than plain .md files. I feel like default approach ideologically is not "jekyll-way" besides of other points.
  • About separate project: just it sounds creepy to me to maintain a fork because I'm not an experienced ruby developer and probably it would be better to hold this code here, but I can think about it if you don't see this feature here.

🫡

@emoriarty emoriarty added wontfix This will not be worked on and removed wip labels Dec 21, 2023
@emoriarty
Copy link
Owner

Hello @seroperson, I've found that there's already a gem that allows to download notion pages as markdown documents: jekyll-notion-import.
Perhaps you can give it a try for your case.

@emoriarty emoriarty mentioned this pull request Dec 30, 2023
@seroperson
Copy link
Author

Hello @emoriarty! I have created the new plugin, jekyll-fetch-notion, which is a fork of this project, but with this different kind of synchronization. I have changed some things, removed some features as for now, but in general much code is the same. I also left all the copyright things and links to the original project, so people can find the original author 🙂

Regarding jekyll-notion-import, I believe this project is not as mature as jekyll-notion (jekyll-fetch-notion) + notion_to_md. It's likely to be error-prone, and I'm not sure it's extensible enough to meet all needs.

@emoriarty
Copy link
Owner

I'm closing this PR because this feature resides in its own repository: jekyll-fetch-notion.

@emoriarty emoriarty closed this Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wontfix This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants