First, we use Power Automate to check each RSS feed periodically.
Then, from each feed, we keep just the new papers (and, for the bioRxiv and EcoEvoRxiv feeds, only those that match our search terms) and remove any duplicates.
Finally, we post the new papers evenly spaced over the following ~23 hours.
NB I hate Power Automate, and you may come to hate it too. It's like programming without access to anything useful. It's worse than the LEGO drag-and-drop programming thing that my kids and I use on the iPad.
NEVERTHELESS, here we're going to make a 'Flow'. Take a deep breath...
- Go to Power Automate.
- Click on "Create" > "Scheduled cloud flow."
- Name your flow "literature_bot_phypapers" or whatever the hell you like
- Set it to run every day and click "Create."
NB: PubMed gets updated once every 24 hours, and the rest of this flow assumes you only run it once every 24 hours. If you run it more often, you'll get a lot of duplicate posts.
We'll set the variables that different people will want to change right at the top. This will make it easier to adapt this to different RSS feeds and/or people.
- Click on "+" and "Add an Action", then search for the "Initialize variable" action and select it. Set it up as follows:
  - Name: `BlueskyUsername` (e.g. `phypapers.bsky.social`)
  - Type: String
  - Value: Enter your Bluesky username.
- Add another "Initialize variable" action and set it up as follows:
  - Name: `BlueskyAPIPassword`
  - Type: String
  - Value: Enter your Bluesky API (app) password. It should be alphanumeric characters in the form `xxxx-xxxx-xxxx-xxxx`.
- Set up the list of RSS feeds whose papers we have to search manually for relevant ones:
  - Add an "Initialize variable" action.
  - Name: `FeedURLs`
  - Type: Array
  - Value:
[ "https://ecoevorxiv.org/rss/preprints/", "http://connect.biorxiv.org/biorxiv_xml.php?subject=animal_behavior", "http://connect.biorxiv.org/biorxiv_xml.php?subject=biochemistry", "http://connect.biorxiv.org/biorxiv_xml.php?subject=bioinformatics", "http://connect.biorxiv.org/biorxiv_xml.php?subject=biophysics", "http://connect.biorxiv.org/biorxiv_xml.php?subject=cancer_biology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=cell_biology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=developmental_biology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=ecology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=evolutionary_biology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=genetics", "http://connect.biorxiv.org/biorxiv_xml.php?subject=genomics", "http://connect.biorxiv.org/biorxiv_xml.php?subject=immunology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=microbiology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=molecular_biology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=neuroscience", "http://connect.biorxiv.org/biorxiv_xml.php?subject=paleontology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=pathology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=pharmacology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=physiology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=plant_biology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=scientific_communication_and_education", "http://connect.biorxiv.org/biorxiv_xml.php?subject=synthetic_biology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=systems_biology", "http://connect.biorxiv.org/biorxiv_xml.php?subject=zoology" ]
- Set up the list of search terms (we'll only keep papers from the feeds above that have one of these terms in the title or abstract):
  - Add an "Initialize variable" action.
  - Name: `Keywords`
  - Type: Array
  - Value:
[ "phylogenetics", "phylogenomics", "phylogenetic analysis", "phylogenomic analysis" ]
These are my search terms. Obviously you'll (probably...) want different ones.
- Set up the list of other RSS feeds (ones where everything in the feed already matches your search terms). I combine a bunch of different PubMed searches and two arXiv searches. Don't worry, we remove duplicates later.
  - Add an "Initialize variable" action.
  - Name: `OtherFeedURLs`
  - Type: Array
  - Value:
[ "https://pubmed.ncbi.nlm.nih.gov/rss/search/1tYbWOIP0tIVreX9rPCvdGmmbxHJobuBntOy3VyMFivsPJcEG1/?limit=100&utm_campaign=pubmed-2&fc=20240528181849", "https://pubmed.ncbi.nlm.nih.gov/rss/search/1bUrbZONKdKY6mFb4tOeokyXplUngAStuFKAcG88ZfRCNqFE5a/?limit=100&utm_campaign=pubmed-2&fc=20240528181921", "https://pubmed.ncbi.nlm.nih.gov/rss/search/1T5FW5K6kI71ia_6eneQzMtEXpGBLaOr06kN1qxSU80qPUWQcW/?limit=100&utm_campaign=pubmed-2&fc=20240528182102", "https://pubmed.ncbi.nlm.nih.gov/rss/search/1-ONS2P_EKb8HyuP5cSNsVIPVmKKl4rbk16StHDuvXiZWQv9Em/?limit=100&utm_campaign=pubmed-2&fc=20240528182114", "https://pubmed.ncbi.nlm.nih.gov/rss/search/1bAXfGTh08tVkaeuklkzsn7cdc7iJJPE6uvrK1L3guOpfhwkF_/?limit=100&utm_campaign=pubmed-2&fc=20240528182223", "https://pubmed.ncbi.nlm.nih.gov/rss/search/1TyHVUJDxNJTq_goUvgFwCZllkgW6UrIpAskwDT-8mQJ3bn9cD/?limit=100&utm_campaign=pubmed-2&fc=20240528182242", "https://pubmed.ncbi.nlm.nih.gov/rss/search/1NwSQ1kPYoZ_BGXTxnE9MqKYXgYR6mL9HahsJL_YZ-77lpmspk/?limit=100&utm_campaign=pubmed-2&fc=20240528182259", "https://pubmed.ncbi.nlm.nih.gov/rss/search/1DSoZAVEXfx-7I2bn7qqJUrjCjP9uo4KuCG4G0VbH3DyAAL9Su/?limit=100&utm_campaign=pubmed-2&fc=20240528182329", "https://pubmed.ncbi.nlm.nih.gov/rss/search/1bYz7DSbRS5oPC2jrkUeb9exioZTLpMlGljCvk088lBI7qagvL/?limit=100&utm_campaign=pubmed-2&fc=20240528182432", "https://export.arxiv.org/api/query?search_query=all:phylogen*&start=0&max_results=100&sortBy=lastUpdatedDate&sortOrder=descending", "https://export.arxiv.org/api/query?search_query=all:%22ancestral%20recombination%20graph%22&start=0&max_results=100&sortBy=lastUpdatedDate&sortOrder=descending" ]
Note that you can add as many feeds as you like here, as long as everything in those feeds is what you want to post.
- Add an "HTTP" action and call it `GetAccessToken`:
  - Method: POST
  - URI: `https://bsky.social/xrpc/com.atproto.server.createSession`
  - Headers:
    - Content-Type: application/json
  - Body:
{ "identifier": "@{variables('BlueskyUsername')}", "password": "@{variables('BlueskyAPIPassword')}" }
- Add a "Parse JSON" action:
  - Content: click the lightning bolt and choose `body` of `GetAccessToken`
  - Schema:
{ "type": "object", "properties": { "accessJwt": { "type": "string" }, "refreshJwt": { "type": "string" } } }
- Add an "Initialize variable" action and call it `AccessToken`:
  - Type: String
  - Value: use the lightning bolt and select `Body accessJwt` from Parse JSON
- Add an "Initialize variable" action and call it `RefreshToken`:
  - Type: String
  - Value: use the lightning bolt and select `Body refreshJwt` from Parse JSON
We need these tokens later to post to Bluesky.
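If it helps to see what these steps actually do, here is a minimal Python sketch (not part of the flow) of the `createSession` call and the token extraction. The function names are hypothetical. It builds the request with the standard library but doesn't send it, and it assumes the response JSON contains `accessJwt` and `refreshJwt`, as in the schema above.

```python
import json
import urllib.request

CREATE_SESSION_URL = "https://bsky.social/xrpc/com.atproto.server.createSession"


def build_create_session_request(username: str, app_password: str) -> urllib.request.Request:
    """Build (but don't send) the POST that GetAccessToken makes."""
    body = json.dumps({"identifier": username, "password": app_password})
    return urllib.request.Request(
        CREATE_SESSION_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response_body: str) -> tuple:
    """Mirror the Parse JSON step: pull out accessJwt and refreshJwt."""
    data = json.loads(response_body)
    return data["accessJwt"], data["refreshJwt"]


# Placeholder credentials; sending for real would be urllib.request.urlopen(req)
req = build_create_session_request("phypapers.bsky.social", "xxxx-xxxx-xxxx-xxxx")
```

The two tokens returned by `extract_tokens` play the roles of the `AccessToken` and `RefreshToken` variables.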
We only want papers from the last 24 hours.
- Add an "Initialize variable" action, call it `AllPapers`, choose Array, and set the value to `[]` (we'll use this to store all the papers we get).
- Add an "Initialize variable" action, call it `OldAllPapers`, choose Array, and set the value to `[]` (we'll use this as a workaround to update our `AllPapers` list one feed at a time, because a "Set variable" action can't refer to the variable it is setting; one more reason to hate Power Automate).
- Add an "Apply to each" action, call it `LoopThroughFeeds`, and set its input to the `FeedURLs` variable. Inside the loop:
  - Add a "Scope" action and call it `FetchAndFilterFeed` (this is so we don't fall over if one of the bioRxiv feeds doesn't work, which happens often).
  - Add a second "Scope" action and call it `ErrorHandler`.
    - Go to the settings of the `ErrorHandler` scope and expand the "Run after" drop-down menu for `FetchAndFilterFeed`.
    - Check the boxes for "has failed" and "has timed out", and uncheck the others.
  - Inside the `FetchAndFilterFeed` scope:
    - Add a "List all RSS Feed Items" action, and use the lightning bolt to set the RSS Feed URL to `Current Item` from `LoopThroughFeeds` (i.e. we're just getting each URL in turn).
    - Add a "Filter Array" action and call it `FilterArray`; we'll use this to keep only papers from the last day (to avoid duplicates from day to day):
      - Use the lightning bolt to set From to `body` from "List all RSS Feed Items".
      - On the left side of the query, use the blue fx to add this code: `formatDateTime(item()?['publishDate'], 'yyyy-MM-dd')`
      - On the right side of the query, use the blue fx to add this code: `formatDateTime(addDays(utcNow(), -1), 'yyyy-MM-dd')`
      - Set the operator to "is greater or equal to".
    - Add a "Set Variable" action, call it `UpdateOldAllPapers`, choose `OldAllPapers` from the drop-down, and use the lightning bolt to set the value to the `AllPapers` variable.
    - Add a "Set Variable" action, call it `UpdateAllPapers`, choose `AllPapers` from the drop-down, and use the blue fx to enter the following code: `union(variables('OldAllPapers'), body('FilterArray'))`. This adds the new papers from the feed we're working on to our `AllPapers` list (without duplicates).
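To make the logic of `LoopThroughFeeds` concrete, here's a rough Python sketch of the same steps: fetch each feed, skip feeds that fail, keep items dated yesterday or later, and merge them into one list without duplicates. `fetch` is a hypothetical stand-in for "List all RSS Feed Items", `now` stands in for `utcNow()`, and `publishDate` is assumed to be an ISO 8601 string.

```python
from datetime import datetime, timedelta, timezone


def day_string(timestamp: str) -> str:
    """Like formatDateTime(ts, 'yyyy-MM-dd'), assuming an ISO 8601 timestamp."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00")).strftime("%Y-%m-%d")


def recent_papers(items: list, now: datetime) -> list:
    """The FilterArray step: keep items dated yesterday or later."""
    cutoff = (now - timedelta(days=1)).strftime("%Y-%m-%d")
    # 'yyyy-MM-dd' strings sort in date order, so comparing strings works
    return [item for item in items if day_string(item["publishDate"]) >= cutoff]


def union(a: list, b: list) -> list:
    """Combine two lists, dropping duplicates (order-preserving in this sketch)."""
    result = []
    for item in a + b:
        if item not in result:
            result.append(item)
    return result


def collect_all_papers(feed_urls: list, fetch, now: datetime) -> list:
    """LoopThroughFeeds: fetch each feed, skip failures, merge recent items."""
    all_papers = []
    for url in feed_urls:
        try:
            items = fetch(url)  # stands in for "List all RSS Feed Items"
        except Exception:
            continue  # the ErrorHandler scope's job: move on to the next feed
        old_all_papers = all_papers  # the OldAllPapers workaround
        all_papers = union(old_all_papers, recent_papers(items, now))
    return all_papers
```

In Python you could reassign `all_papers` directly; the `old_all_papers` line is only there to mirror the two "Set Variable" actions.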
- Directly after the `LoopThroughFeeds` loop, add an "Initialize variable" action:
  - Name: `FilteredPapers`
  - Type: Array
  - Value: `[]`
- Next, add another "Apply to each" loop over `AllPapers`:
  - Name: `LoopThroughPapers`
- Inside `LoopThroughPapers`, add an "Apply to each" loop over the `Keywords` array:
  - Name: `LoopThroughKeywords`
- Inside the `LoopThroughKeywords` loop, add a "Condition" action:
  - Left: (use the blue fx to paste this into the box) `or( contains(toLower(items('LoopThroughPapers')['title']), toLower(items('LoopThroughKeywords'))), contains(toLower(items('LoopThroughPapers')['summary']), toLower(items('LoopThroughKeywords'))) )`
  - Operator: is equal to
  - Right: `true`
- If the condition is true, add an "Append to array variable" action:
  - Name: `FilteredPapers`
  - Value: `items('LoopThroughPapers')`
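The two nested loops plus the condition boil down to a case-insensitive substring search over titles and abstracts. Here's a minimal Python sketch that mirrors them one-to-one; the function name is hypothetical, and the `or ""` guard for a missing title or summary is something this sketch adds.

```python
def keyword_filter(papers: list, keywords: list) -> list:
    """Mirror LoopThroughPapers / LoopThroughKeywords / Append to array variable."""
    filtered = []
    for paper in papers:  # LoopThroughPapers
        # `or ""` is a guard this sketch adds for a missing title or summary
        title = (paper.get("title") or "").lower()
        summary = (paper.get("summary") or "").lower()
        for keyword in keywords:  # LoopThroughKeywords
            kw = keyword.lower()
            # the or(contains(...), contains(...)) condition
            if kw in title or kw in summary:
                filtered.append(paper)  # appended once per matching keyword
    return filtered
```

Note that a paper matching two keywords gets appended twice; the `RemoveDuplicates` step later takes care of that.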
Now for the other feeds. These are the ones with search terms built in, so we don't need to do as much work here.
- Add an "Initialize variable" action, call it `OldFilteredPapers`, choose Array, and set the value to `[]`.
- Add an "Apply to each" action, call it `LoopThroughOtherRSSFeeds`, and set it to loop over the `OtherFeedURLs` variable. Inside the loop:
  - Add a "Scope" action and call it `FetchAndFilterFeed2`.
  - Add a second "Scope" action and call it `ErrorHandler2`.
    - Go to the settings of the `ErrorHandler2` scope and expand the "Run after" drop-down menu for `FetchAndFilterFeed2`.
    - Check the boxes for "has failed" and "has timed out", and uncheck the others.
  - Inside the `FetchAndFilterFeed2` scope:
    - Add a "List all RSS Feed Items" action and call it `List all RSS Feed Items2`. Use the lightning bolt to set the RSS Feed URL to `Current Item` from `LoopThroughOtherRSSFeeds` (i.e. we're just getting each URL in turn).
    - Add a "Filter Array" action and call it `FilterArray2`; again, we use this to keep only papers from the last day (to avoid duplicates from day to day):
      - Use the lightning bolt to set From to `body` from `List all RSS Feed Items2`.
      - On the left side of the query, use the blue fx to add this code: `formatDateTime(item()?['publishDate'], 'yyyy-MM-dd')`
      - On the right side of the query, use the blue fx to add this code: `formatDateTime(addDays(utcNow(), -1), 'yyyy-MM-dd')`
      - Set the operator to "is greater or equal to".
    - Add a "Compose" action, call it `RecentPapers`, and use the lightning bolt to choose the `body` of `FilterArray2`.
    - Add a "Compose" action, call it `AddNewPapersToFilteredPapers`, and use the blue fx to enter the following code: `union(variables('FilteredPapers'), outputs('RecentPapers'))`
    - Add a "Set Variable" action, call it `SetFilteredPapers`, choose `FilteredPapers` from the drop-down, and use the lightning bolt to choose the `output` of `AddNewPapersToFilteredPapers`.
- After the loop, add a "Compose" action and call it `RemoveDuplicates`. Use the blue fx to enter the following code: `union(variables('FilteredPapers'), variables('FilteredPapers'))` (the union of a list with itself is the same list with the duplicates removed).
- Add a "Set Variable" action and call it `Set Filtered Papers`; set the Name to `FilteredPapers`, and use the lightning bolt to set the Value to the `output` of `RemoveDuplicates`.
- Add a "Compose" action, call it `PostCount`, and use the blue fx to enter the code `length(variables('FilteredPapers'))`
- Add a "Compose" action, call it `MinutesBetweenPosts`, and use the blue fx to enter the code `div(1380, outputs('PostCount'))`. This spaces the posts out over ~23 hours (1380 minutes).
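As a quick check on the arithmetic, here's a small Python sketch of the spacing: integer division of the 1380-minute window by the number of posts, and the times the posts would go out if each delay waits the full gap (it ignores how long each post itself takes). The function names are hypothetical. Dividing by a `PostCount` of zero is an error in most languages, so this sketch returns 0 in that case; that's an assumption about what you'd want on a day with nothing to post.

```python
from datetime import datetime, timedelta

WINDOW_MINUTES = 1380  # ~23 hours, as in div(1380, outputs('PostCount'))


def minutes_between_posts(post_count: int) -> int:
    """Integer division, like div() with two integers."""
    if post_count == 0:
        return 0  # guard added in this sketch: nothing to post
    return WINDOW_MINUTES // post_count


def post_times(start: datetime, post_count: int) -> list:
    """When each post would go out if every Delay waits the full gap."""
    gap = minutes_between_posts(post_count)
    return [start + timedelta(minutes=i * gap) for i in range(post_count)]
```

For example, 12 new papers gives a 115-minute gap, so the last post goes out 1265 minutes after the first, comfortably inside the window.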
- Add an "Apply to each" action:
  - Name: `PostToBluesky`
  - Value: use the lightning bolt to select the `FilterArray` body
- Inside the "Apply to each" action, add a "Compose" action:
  - Name: `CurrentPaper`
  - Inputs: use the lightning bolt to select the `PostToBluesky` Current Item
- Next, add a "Compose" action to get the title:
  - Name: `Title`
  - Inputs: select the blue fx and in the code box put `item()?['title']`
- Next, add a "Compose" action to strip HTML tags and newline characters from the title:
  - Name: `CleanTitle`
  - Inputs: select the blue fx and in the code box put the following code:
replace(
join(
xpath(
xml(
concat(
'<root>',
replace(replace(replace(outputs('Title'), '&', '&'), '<', '<'), '>', '>'),
'</root>'
)
),
'//text()'
),
''
),
'\n',
''
)
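The expression above wraps the title in a `<root>` element, parses it as XML, keeps only the text nodes, and removes newlines. Here's a minimal Python sketch of the same trick, assuming titles contain literal tags such as `<i>...</i>`. It is not a line-by-line translation: it escapes only bare `&` characters so the parser accepts them, and named HTML entities such as `&nbsp;` aren't valid XML, so they would make this sketch's parse fail.

```python
import re
import xml.etree.ElementTree as ET


def clean_title(title: str) -> str:
    """Strip tags like <i>...</i> and newlines from a title."""
    # Escape any '&' that doesn't already start an entity, so the XML parser accepts it
    escaped = re.sub(r"&(?!#?\w+;)", "&amp;", title)
    # Same idea as the flow: wrap in <root>, parse as XML, keep only the text nodes
    root = ET.fromstring("<root>" + escaped + "</root>")
    text = "".join(root.itertext())
    return text.replace("\n", "")
```

So `clean_title("Tree of <i>Homo</i> & kin\n")` gives `"Tree of Homo & kin"`.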
- Next, add a "Compose" action to truncate the title if it's longer than 260 characters:
  - Name: `ShortTitle`
  - Inputs: select the blue fx and in the code box put `if(greater(length(outputs('CleanTitle')), 260), substring(outputs('CleanTitle'), 0, 260), outputs('CleanTitle'))`
- Next, add a "Compose" action to get the link:
  - Name: `Link`
  - Inputs: select the blue fx and in the code box put `item()?['primaryLink']`
- Next, add a "Compose" action to take the crud (the query string after the `?`) off the link:
  - Name: `CleanLink`
  - Inputs: select the blue fx and in the code box put `split(outputs('Link'), '?')[0]`
- Inside the `PostToBluesky` loop, add a "Compose" action after the `ShortTitle` and `CleanLink` actions:
  - Name: `PostContent`
  - Inputs: `"@{concat(outputs('ShortTitle'),' ',outputs('CleanLink'))}"`
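Taken together, `ShortTitle`, `CleanLink` and `PostContent` build the post text. A minimal Python sketch of those three steps (function names are hypothetical; the example PubMed URL and its query string are made up for illustration):

```python
def short_title(title: str, limit: int = 260) -> str:
    """ShortTitle: keep at most `limit` characters (a slice never overruns)."""
    return title[:limit]


def clean_link(link: str) -> str:
    """CleanLink: drop the query string (everything from the first '?')."""
    return link.split("?")[0]


def post_content(title: str, link: str) -> str:
    """PostContent: short title, a space, then the cleaned link."""
    return short_title(title) + " " + clean_link(link)


example = post_content(
    "A" * 300,
    "https://pubmed.ncbi.nlm.nih.gov/12345/?utm_source=Other&utm_medium=rss",
)
```

Here `example` is 260 `A`s, a space, and `https://pubmed.ncbi.nlm.nih.gov/12345/`. The link always ends up at the very end of the post, which matters for the facet offsets in the final HTTP call.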
Access tokens don't last long, so we need to refresh the access token each time we post.
- Inside the `PostToBluesky` loop, add an "HTTP" action:
  - Name: `RefreshAccessToken`
  - Method: POST
  - URI: `https://bsky.social/xrpc/com.atproto.server.refreshSession`
  - Headers:
    - Accept: application/json
    - Authorization: `Bearer @{variables('RefreshToken')}`
- Add a "Parse JSON" action:
  - Name: `ParseRefreshResponse`
  - Content: click the lightning bolt and choose `body` of `RefreshAccessToken`
  - Schema:
{ "type": "object", "properties": { "accessJwt": { "type": "string" } } }
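For comparison, here's a minimal Python sketch of the refresh call and the parse step. Note that this request authenticates with the refresh token, not the access token. The function names are hypothetical, the host is a parameter (defaulting to `bsky.social`, the same host used for `createSession`), and the request is built but not sent.

```python
import json
import urllib.request


def build_refresh_request(refresh_jwt: str, host: str = "https://bsky.social") -> urllib.request.Request:
    """Build (but don't send) the RefreshAccessToken POST."""
    return urllib.request.Request(
        host + "/xrpc/com.atproto.server.refreshSession",
        headers={"Accept": "application/json", "Authorization": "Bearer " + refresh_jwt},
        method="POST",
    )


def new_access_token(response_body: str) -> str:
    """ParseRefreshResponse: the only field this flow uses is accessJwt."""
    return json.loads(response_body)["accessJwt"]
```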
- Inside the `PostToBluesky` loop, add an "HTTP" action after the `PostContent` action:
  - Name: `PostToBlueskyAPI`
  - Method: POST
  - URI: `https://bsky.social/xrpc/com.atproto.repo.createRecord`
  - Headers:
    - Content-Type: application/json
    - Authorization: `Bearer @{variables('AccessToken')}`
  - Body:
{ "collection": "app.bsky.feed.post", "repo": "@{variables('BlueskyUsername')}", "record": { "$type": "app.bsky.feed.post", "text": "@{outputs('PostContent')}", "facets": [ { "index": { "byteStart": @{sub(outputs('GetPostLength'), outputs('GetLinkLength'))}, "byteEnd": @{outputs('GetPostLength')} }, "features": [ { "$type": "app.bsky.richtext.facet#link", "uri": "@{outputs('CleanLink')}" } ] } ], "createdAt": "@{utcNow()}" } }
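The body above relies on two Compose actions, `GetPostLength` and `GetLinkLength`, that aren't set up in these steps; presumably they hold the length of the post text and of the link. Bluesky's facet `byteStart`/`byteEnd` are UTF-8 byte offsets, so if those actions count characters (as `length()` does), a title with accented letters would shift the clickable link. Here's a minimal Python sketch that computes the offsets in bytes, assuming the link sits at the very end of the text. The function names are hypothetical.

```python
import json


def link_facet(text: str, link: str) -> dict:
    """A link facet for a link that sits at the very end of `text`.

    Facet indices are UTF-8 byte offsets, so count bytes, not characters.
    """
    post_bytes = len(text.encode("utf-8"))  # the role GetPostLength plays (assumed)
    link_bytes = len(link.encode("utf-8"))  # the role GetLinkLength plays (assumed)
    return {
        "index": {"byteStart": post_bytes - link_bytes, "byteEnd": post_bytes},
        "features": [{"$type": "app.bsky.richtext.facet#link", "uri": link}],
    }


def build_post_body(text: str, link: str, repo: str, created_at: str) -> str:
    """The JSON body sent to com.atproto.repo.createRecord."""
    return json.dumps({
        "collection": "app.bsky.feed.post",
        "repo": repo,
        "record": {
            "$type": "app.bsky.feed.post",
            "text": text,
            "facets": [link_facet(text, link)],
            "createdAt": created_at,
        },
    })
```

For `"Évolution des arbres https://example.org/x"` the facet covers bytes 22 to 43, one more than a character count would give, because `É` takes two bytes. Building the body with `json.dumps` also escapes any double quotes in a title, which plain string interpolation does not.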
- Add a "Set variable" action to update the access token:
  - Name: `AccessToken`
  - Value: click the lightning bolt and choose `body accessJwt` of `ParseRefreshResponse`
- Add a "Delay" action:
  - Set the Count by selecting the blue lightning bolt and choosing the `Outputs` of the `MinutesBetweenPosts` action, and set the Unit to Minute.
This makes the bot wait between posts, so the papers trickle out over ~23 hours.
If you thought these instructions were long and tiresome, I cannot tell you how much longer and tiresomer they were to figure out!
- Edit the `PostCount` action to have the following code: `length(variables('FilteredPapers'))`
- Click on the `PostToBluesky` loop, remove the `FilterArray` body from its input, and replace it with the `FilteredPapers` variable.