Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetching feeds for the first time only shows the last item in the feed. #14

Closed
anselm opened this issue Mar 5, 2017 · 12 comments
Closed

Comments

@anselm
Copy link

anselm commented Mar 5, 2017

I fetched a fresh copy of the repo and stripped the lists/ and config down to a single feed (lists/hn.opml) and let the system run. It reports to the console that it collects all the items. The rivers/ folder (once it is manufactured) only has the last item. The view only shows the last item correspondingly. It should show all the items.

I tried with both flAddItemsFromNewSubs set and unset in the config.json.

{
"maxRiverItems": 300,
"flAddItemsFromNewSubs": true,
"homePage": {
"panels": [
{
"title": "Hacker News",
"river": "hn.js"
}
]
}
}

@csenger
Copy link

csenger commented Jan 9, 2018

This is a bug in davereader. davereader has cache objects for different data. The accessor functions try to get the data from the cache object. If it does not exist they create a new data object, try to read old data stored on the filesystem with readFile() and add it to the cache.

Unfortunately a lot of the function used during readFile() are not synchronous

  • fs.readFile()
  • fs.exists()
  • fs.mkdir()

and the update of the arrays with the empty data object takes too long.

In this case we have a callback function that is called every time feedparser (or better the underlying xml parser) parsed an <item> node. The callback calles processFeedItem() which later tries to get data from two arrays:

  • addToRiver() -> addRiverItemToList() -> getRiverData() -> allTheRivers[]
  • addToRiver() -> addItemToFeedRiver() -> getFeedRiver() -> readFeedRiver() -> alltheFeedRivers[]

As parsing and all the logic in between is very fast, the first readFile() callback from getRiverData() finishes after a lot of items finished to be processed, and the newly generated river data objects overwrite each other. For readFeedRivher it is even worse as a deep directory structure is created with fs.mkdir().

A solution would be to read the river data for before the feed is parsed to make sure the river data object is in allMyRivers[]. Not sure what's the best place, but subscribeToFeed() adds the feed to the feedArray[]. This might be a good time to load the river too.:

	function subscribeToFeed (urlfeed, listname) {
		getFeedRiver(urlfeed, function() {}) // make sure we read the feed river into allTheFeedRivers[]
		getRiverData(listname)  // make sure we read the river data into allTheRivers[]

On the other hand this just bets on the chance that there is enough time for readFile() and the involved async callbacks to finish before the first <item> is parsed. I guess the problem is worse for the first run when the directories are created recursively. But the general problem also exists for each other restart of the node process when the data cache is read from the filesystem.

Without restructuring the logic a lot it would be easier to switch to the synchronous version of the involved fs methods.

I added a few log functions to getRiverData(), getFeedRiver() and the feedparser.on ("readable", ...) to illustrate the flow.

River5 v0.5.22 running on port 1337.

startWebSocketServer: websockets port is 1338
/start feedparser callback: Hardtail anyone? Das neue Cheaptrick!
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Hardtail anyone? Das neue Cheaptrick!
/start feedparser callback: Galerie: Die Fanes
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Galerie: Die Fanes
/start feedparser callback: Sennes als Freerider
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Sennes als Freerider
/start feedparser callback: Alutech Sennes 2.0 DH - Frage der Größe
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Alutech Sennes 2.0 DH - Frage der Größe
/start feedparser callback: Umbau Teibun auf 170mm
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Umbau Teibun auf 170mm
/start feedparser callback: Fragen zur Fanes
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Fragen zur Fanes
/start feedparser callback: ALUTECH´s in action....
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: ALUTECH´s in action....
/start feedparser callback: Tofane 2.0
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Tofane 2.0
/start feedparser callback: Bereifung Fanes
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Bereifung Fanes
/start feedparser callback: Wildsau Hardride - Dämpfer und mehr
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Wildsau Hardride - Dämpfer und mehr
/start feedparser callback: Pudel DH
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Pudel DH
/start feedparser callback: Wildsau Hardride Dämpferpositionen
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Wildsau Hardride Dämpferpositionen
/start feedparser callback: Aufbaubrett Alutech Cheap Trick
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Aufbaubrett Alutech Cheap Trick
/start feedparser callback: Alutech Teibun
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Alutech Teibun
/start feedparser callback: Alutech Pudel DH Bremsmomentabstützung
---> myTxtFeeds.txt not found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Alutech Pudel DH Bremsmomentabstützung
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
/start feedparser callback: Fanes Standorte
+++> myTxtFeeds.txt found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Fanes Standorte
/start feedparser callback: Welche Feder bei Sennes 1.0
+++> myTxtFeeds.txt found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Welche Feder bei Sennes 1.0
/start feedparser callback: Fanes 5.0 Lager/Knacken
+++> myTxtFeeds.txt found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Fanes 5.0 Lager/Knacken
/start feedparser callback: Galerie: Sennes
+++> myTxtFeeds.txt found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Galerie: Sennes
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
/start feedparser callback: Twinworks Vorbau nicht mehr im sortiment ?
+++> myTxtFeeds.txt found in allTheRivers
--->https://www.mtb-news.de/forum/f/alutech.130/index.rss not found in allTheRivers
/end feedparser callback: Twinworks Vorbau nicht mehr im sortiment ?
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
readFeed: error == Cannot read property 'title' of null
Add allTheRivers[myTxtFeeds.txt] cause readFile callback is called
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.
Add allTheFeedRivers[https://www.mtb-news.de/forum/f/alutech.130/index.rss] cause readFile callback finished.

@scripting
Copy link
Owner

Thanks for the report! :-)

It may take me a couple of days to swing around to this, but it sounds like you have found a serious problem. We'll get this addressed soon.

Dave

@scripting
Copy link
Owner

Carsten et al -- this is how I plan to address the problem.

http://scripting.com/2018/01/13/183433.html

Dave

@csenger
Copy link

csenger commented Jan 15, 2018

I'd opt for the boring option and create/load all files and directories at startup with sync methods. It's the simpler solution with less potential problems and easier to read code.

The code in the gist will have the some problem (through with much smaller timing windows). The readFile() callback might be executed during another call to readRiverFile() right before it tries to push to the queue array. So if line 11 in the async callback:

				delete riverQueues [f];

is executed right before line 19 in another call to readRiverFile():

		riverQueues [f].push (callback);

it breaks. A Promise might do the trick here, and there are packages that add fs calls with promise support. But that adds more unnecessary complexity IMO.

@scripting
Copy link
Owner

Hah! I saw that one tool. I'm glad you're paying attention. ;-)

I'll give it some more thought. There's another problem, wonder if you see it.

I can read all the river files at startup but what about newly subscribed-to feeds while River5 is running.

@scripting
Copy link
Owner

BTW, there's an even simpler approach than reading all the river files at startup.

Let's see what you think @csenger. :-)

I always thought the way readFeed works is not good. And that's what we're really paying the price for here, the fact that it returns items as a stream, instead of returning them all at once.

Instead, I could rewrite readFeed to read the feed, and return all the new items to a callback, in an array. It could have 1, 2, 3 or N items in it. Doesn't matter. Than I'd read the river file, add the items in a for loop, totally synchronous, and that's that. No bullshit. Use exactly the same caching method as in the current version of davereader.

I like this method because it can easily be implemented outside River5 and fully tested, and dropped in as a unit. I'm also pretty sure I have that code in exactly that form somewhere in my database of solved problems.

What do you think??

@scripting
Copy link
Owner

I do have the routine I describe, in this project --

https://github.com/scripting/feedParserDemo/blob/master/demo.js#L30

Dave

@scripting
Copy link
Owner

Here's a new local routine to go inside readFeed --

https://gist.github.com/scripting/c3eab6e60073678325593c20c1f9fece#file-readxmlfeed-js-L36

Dave

@scripting
Copy link
Owner

I have a new version running here that has both changes.

  1. It loads all the rivers that correspond to lists at startup.

  2. It loads the feed's river before it processes any new items in the feed.

I'm going to burn it in here, and if it all seems to work, release the new version.

@morthond
Copy link

Easy question - Is there a functional/unit test that proves it works or not?

@scripting
Copy link
Owner

@morthond -- no there is no such thing. If you want to create one that would be fantastic. However it's probably going to be rough, because you'd have to simulate feeds with more than one new item in them. And that would involve having a standard data folder and lists folder that it could run from.

@scripting
Copy link
Owner

scripting commented Jan 17, 2018

New version of reader project, and new release of davereader NPM package, v0.6.2.

Change notes here --

https://github.com/scripting/reader#v062----11718-by-dw

Here's link to the history for these changes. They really were quite simple and straightforward, not all glamorous and elegant as the one I first latched onto. ;-)

Thanks @csenger for the help finding this problem and figuring out a solution.

Dave

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants