FASTA / DNA sequence search API, display results in frontend #239

Closed
26 of 27 tasks
Don-Isdale opened this issue May 7, 2021 · 8 comments

@Don-Isdale
Collaborator

Don-Isdale commented May 7, 2021

  • dataset tag on reference for genomes that have the blast db set up on the BE

    • flag for parent BlastDb. in dataset.tags[]. (a27ca03)
  • taking some fasta input, calling blast on the BE, and getting the result in FE (8dc9fc8)

  • view returned data as a table (52413e1)

    • which the user then selects and sends to the map view to look at
      (? select entries to make a temporary dataset from)
      66d59ba sequence search : view added dataset
    • (initially use upload dataset but later perhaps non-persistent dataset)
      acd77ec : ... if addDataset import jsonFile.
      27e3a5d : add transient.js : with pushFeature(), pushData() ...
    • sometimes there will be a very large number of output lines,
      so we probably want to limit the output (resultRows)
      • output length limit (resultRows : a27ca03)

blast output columns are
query ID, subject ID, % identity, length of HSP (hit), # mismatches, # gaps, query start, query end, subject start, subject end, e-value, score, query length, subject length
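
As a rough illustration of the column list above, one output line could be turned into a keyed row as follows. This is a minimal sketch, not the Pretzel implementation: it assumes the output is tab-separated with the columns in the order listed, the camelCase key names are invented for the example, and the default row limit of 500 is taken from the later "row limit" item in this issue.

```javascript
// Key names are illustrative (hypothetical); only their order comes from the list above.
const blastColumns = [
  'queryId', 'subjectId', 'percentIdentity', 'hspLength',
  'mismatches', 'gaps', 'queryStart', 'queryEnd',
  'subjectStart', 'subjectEnd', 'eValue', 'score',
  'queryLength', 'subjectLength',
];
// The two ID columns stay as text; all other columns are converted to numbers.
const textColumns = new Set(['queryId', 'subjectId']);

/** Parse one tab-separated output line into an object keyed by column name. */
function parseBlastLine(line) {
  const cells = line.split('\t');
  if (cells.length !== blastColumns.length) {
    throw new Error(`expected ${blastColumns.length} columns, got ${cells.length}`);
  }
  const row = {};
  blastColumns.forEach((name, i) => {
    row[name] = textColumns.has(name) ? cells[i] : Number(cells[i]);
  });
  return row;
}

/** Parse the whole output text, skipping blank lines and keeping at most resultRows rows. */
function parseBlastOutput(text, resultRows = 500) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .slice(0, resultRows)
    .map(parseBlastLine);
}
```

Applying the limit before parsing means a very long result costs no more than `resultRows` conversions.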

branch : feature/backendBlast


various sub-items :

  • option to load as dataset (name) (a27ca03)

    • add as dataset (acd77ec), using the import json functions from import spreadsheet.
  • input field for parent, pass as param (a27ca03)

  • currently the result is not displayed in the table when it arrives; switching to the search tab and back again is sufficient to update the display and show the result in the table
    solved by adding a dependency on table (46f8c0c), instead of : x( try updateSettings({ data }) to trigger re-display; check when dataMatrix CP is evaluated)
    (d245841 : show empty table initially)

  • toggle out table to dialog d99cf5f, cdcdcf4: added tooltips.

  • collapse button, put table above the inputs (datasetName, parent, namespace) and button. 6714328

  • no need for min rows 20 because not input (a27ca03)

  • 💤 alternate icons for 2 search tabs

  • DNA Sequence Input -> FASTA input, e.g ..., refn URL.
    (placeholder : whitespace not trimmed) (a27ca03)

  • [1-2H/1H] fileName (dnaSequence) - make this unique to handle parallel requests (d56e232)

  • [2H/1H] wrap with task to avoid accidental repeat (984d5a1)

  • description for :
    Feature.remoteMethod('dnaSequenceSearch', {
    description: "Returns features by their level in the feature hierarchy" (a27ca03)

  • actually fasta (for blastn) :

done in 1e7c0e9 :

  • additional checking required for user error like not selecting a parent to search.
  • may add a search button in place of handling paste action and newline.
  • clear error message display at start of request
may not be an issue; deferred to later branch :
  • [1-2H] 0 values are not added to Feature.values

later:

  • search selector in GUI : fixed at Blast until other options added. Done : a27ca03, e373614

    • possibly a remaining GUI issue : parent drop-down reverts to 'None' (30d8dcd)
  • 💤 dataset (spreadsheet upload) can now use childProcess()

  • 💤 show a list of user's blastResults in search tab - with a delete action
    Could limit to e.g. 5 results, so users will remove unused results.


Extensions / for discussion

  • perhaps add a dataset tag for blast results, which might be useful in listing temporary datasets so the user has an easy way to clear them up
    (related : added tag 'transient' for datasets which are pushed to store in frontend but not persisted to server & database)

  • [1-3H/1H] combine feature search and blast search into a single panel, with each part initially collapsed
    (suggestion by J, meeting May27, decided meeting Jun04).
    related : Fixes/tweaks to existing features #216 (comment)
    4e9727e


Notes

💤 is used to indicate items which are not currently required, maybe in a later phase.

@Don-Isdale Don-Isdale self-assigned this May 7, 2021
Don-Isdale added a commit that referenced this issue May 8, 2021
first part of : #239  : FASTA / DNA sequence search API, display results in frontend , in feature/backendBlast.

feature.js :  add Feature.dnaSequenceSearch(), dev_blastResult (dev test data).

child-process.js : added, with childProcess() factored from dataset.js

identity.js : add to comment.

dnaSequenceSearch.bash : added (first 37 lines copied from uploadSpreadsheet.bash); just the outer frame, with some sample results to echo back; inner processing will be added as blast is installed.

access.js : genericResolver() : add dnaSequenceSearch.

sequence-search.js : added, with : classNames, actions: { inputIsActive, paste , } , dnaSequenceInputBound, dnaSequenceInput(),
(inputIsActive(), paste() based on panel/feature-list.js, dnaSequenceInput based on featureNameList, classNames from panel/manage-base.js)

blast-results.js : added (based on data-csv.js)

auth.js : added dnaSequenceSearch().

controls.js : add apiServerSelectedOrPrimary() (factored from goto-feature-list.js : blocksUnique), used by dnaSequenceInput()

left-panel.hbs : add tab : Sequence Search.

sequence-search.hbs : added (framework based on left-panel.hbs)
blast-results.hbs : added (parts based on data-csv.hbs)
backend/ :
  92a3300  4403 May  8 15:58  common/models/feature.js
A 03638a8  5959 May  8 15:58  common/utilities/child-process.js
  e3fa7a3  1064 May  7 17:38  common/utilities/identity.js
A 37e9617  1233 May  8 15:43  scripts/dnaSequenceSearch.bash
  5538e71  4158 May  7 16:14  server/boot/access.js
frontend/app/ :
A 84d75da  2022 May  8 19:01  components/panel/sequence-search.js
A 0487485  3878 May  8 19:35  components/panel/upload/blast-results.js
  abefab3 21394 May  7 20:05  services/auth.js
  153a88a  1597 May  7 13:48  services/controls.js (edited comment)
  29f5d29  4170 May  7 15:26  templates/components/panel/left-panel.hbs
A 72967ab  1845 May  8 19:25  templates/components/panel/sequence-search.hbs
A 5acd228   324 May  8 19:25  templates/components/panel/upload/blast-results.hbs
Don-Isdale added a commit that referenced this issue May 10, 2021
part of #239.
feature.js : dnaSequenceSearch() : add params resultRows, addDataset, and pass to childProcess() : [parent, searchType, resultRows, addDataset].

child-process.js : childProcess() : add param moreParams - array of extra params, pass them on command line of child process after fileName, useFile.
child.on(close ) : if result code 0, check .length of errors and warnings.

dnaSequenceSearch.bash :
added params parent, searchType, resultRows, addDataset.  (resultRows is used in this commit).
convert dev_blastResult from env var to function so that newline and tab can be output.
factor to form datasetIdDir.
Report (via stdio[3] Error:) errors detected at each step : cd, dbName, blastn.

sequence-search.js :
add resultRows, addDataset.
add, copied from data-base.js : isProcessing, successMessage, errorMessage, warningMessage, progressMsg, setError().
add, copied from data-csv.js : newDatasetName, nameWarning, isDupName, onNameChange, onSelectChange.
add datasetsToSearch().
paste() : use event.target.value as text
promise reject : use setError(), copied from data-base.js.

blast-results.js : table config options : drop minRows: 20

dataset.js : add hasTag(), based on block.js

auth.js : dnaSequenceSearch() :  add params resultRows, addDataset and pass in data.

app.scss : adjust left margin of sequence-search : ul.config-list.

left-panel.hbs : pass datasets to sequence-search.

sequence-search.hbs :
add, based on data-csv.hbs, parent select, dataset_new, panel-message, nameWarning, isProcessing.
add inputs : resultRows,  addDataset.
@Don-Isdale
Collaborator Author

Don-Isdale commented May 28, 2021

multiple output tabs

when user clicks sequence search:

  • a new tab is created for that request, and initially says "Searching..."

  • then when the result arrives, it is displayed in a table, with a button below for adding it as a dataset

  • and a close tab button which clears it all (forgets it)

  • show each result in a new output tab, when received [2-4H/7H] f442afc

    • show tab id : 24-hour hh:mm:ss, eg: 22:44:01 with the seconds : "Blast Result (22:44:01)" d9fdac1
    • wrap search.seq with auto scroll-bars. 019591b
  • refresh datasets (getting task cancellation in getBlocksLimits() ) b345f82

  • button in output tab to add dataset, using json upload API. [2-8H/8H] 9640e0f

    • add end position to feature. [1-3H] (679858a)
      the backend change enables this to be also done in update-csv, added a comment in commit msg describing the change
    • [1-2H/4H] Possibly add .values { } (e62a521)
    • add a close button to blast-results tab [1H] 696c304
    • 💤 currently the result data is passed to upload for add dataset; we could use the data from the table, allowing the user to delete rows from the table before adding the dataset
  • [/1H] handle the case of no result : in that case need to indicate that the search has completed.
    display "Searching..." and if result returned, show table, or else "No hits found". (b9851a7)

    • 💤 empty result might cause issues when creating a dataset from it
      Not a current issue; this depends on options=searchAddDataset, which is not enabled.
  • display table rows as Feature triangles [4-8H/7H] 27e3a5d, 3601354, d527097
    This achieves the same goal as earlier item : after viewing the added dataset : also put them into the feature search so they are highlighted

    • causes an exception when attempting to view other data blocks [1-2H/1H]
      This exception was only seen once, and has not been reproduced - may have been a testing artefact.
    • show feature name as label [2-8H/2H] c0077b4
    • when multiple searches are done, it shows triangles for the tab that is selected [2 - 8H/5H] cd92e0f
  • 💤 as user cursor moves through results table, highlight the corresponding feature in the graph, similar to the feature table highlight of axis red/yellow circles.
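
The tab naming and the three display states described in this comment can be sketched as below. This is only an illustration under assumptions: the function names and the `search.rows` field are hypothetical, and the real components may hold this state differently.

```javascript
/** Format a Date as 24-hour hh:mm:ss in local time, e.g. 22:44:01. */
function tabTimeId(date) {
  const twoDigits = (n) => String(n).padStart(2, '0');
  return [date.getHours(), date.getMinutes(), date.getSeconds()]
    .map(twoDigits)
    .join(':');
}

/** Tab title for a search request started at the given time. */
function blastResultTabName(date = new Date()) {
  return `Blast Result (${tabTimeId(date)})`;
}

/**
 * Decide what the output tab shows.
 * search.rows is undefined until the reply arrives (hypothetical field name).
 */
function tabDisplayState(search) {
  if (search.rows === undefined) {
    return 'Searching...';
  }
  return search.rows.length === 0 ? 'No hits found' : 'table';
}
```

Including the seconds in the title keeps tabs distinct when several searches are started within the same minute.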

@Don-Isdale
Collaborator Author

Don-Isdale commented Jun 3, 2021

Filtering the blast-results table :

Motive : can get hits all over the place for some genes, but only 2-3 actual good matches

    • filter on various columns to produce a filtered table, and see the result in real time [4-8H]
      This may be done later. Instead use a checkbox column :
  • filter rows with a checkbox column, and update the display of feature triangles
    [5.5H] ae5be35, a6ff007, 874192b

    • [2-4H/2H] implement a show / hide - all checkbox (the checkbox is added in 874192b, but the effect is not yet implemented) (199875a)
    • [1-4H/1H] enable sort on the columns which the user is likely to filter on, in particular % identity and length of HSP (hit) (683e3fc)
  • 💤 use the modified table for view (and add dataset - can be later) [4-8H]


filter | column
       | query ID
       | subject ID
x      | % identity
       | length of HSP (hit)
x      | # mismatches
x      | # gaps
       | query start
       | query end
       | subject start
       | subject end
x      | e-value
x      | score
       | query length
       | subject length

  • 💤 on mouseover of the triangle, showing the hit details in top right.. [3-6H]

  • 💤 option : colour triangle with the features value relative to the currently selected filter [2-4H]
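
The checkbox column, the select / deselect all checkbox and the column sort described above could work along these lines. This is a minimal sketch, assuming each table row is a plain object with a boolean `view` flag; the field and function names are hypothetical, not taken from blast-results-view.

```javascript
/** Rows whose view checkbox is ticked; only these would be drawn as feature triangles. */
function viewedRows(rows) {
  return rows.filter((row) => row.view);
}

/** Select / deselect all: returns new row objects with the flag set uniformly. */
function setViewAll(rows, view) {
  return rows.map((row) => ({ ...row, view }));
}

/** Sort a copy of the rows on a numeric column, best (largest) first by default. */
function sortRows(rows, column, descending = true) {
  const sign = descending ? -1 : 1;
  return [...rows].sort((a, b) => sign * (a[column] - b[column]));
}
```

Returning copies rather than mutating the rows keeps the original result available, which matters if the same data is later passed to add dataset.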

@Don-Isdale
Collaborator Author

Don-Isdale commented Jun 7, 2021

GUI refinements / adjustments

  • [2H/1H] flag out 'add dataset' and 'view (dataset)' (78a3670)

  • [2H/0.5H] row limit : default -> 500 (b7ad900)

  • [4H/1H] limit fasta input : <= 1 sequence, total sequence length < 2kb (086ea39)

    • [4H/2.5H] fasta with no header; show error.
      Check input : require exactly 1 marker line and sequence text, and no other input. (086ea39)
  • [2H/0.5H] fasta input textarea : increase height to ~30 rows (6219efc)

  • [/0.5H] sequence search : increase searchStringMaxLength -> 10k 603485e

  • [2-6H/1.5H] filtering up-front (ie: before clicking search) : user can specify ID and coverage cutoff
    which would translate to a filter on the output of blast before it comes from BE.
    Motivation : searching a 10kb gene sequence, the vast majority of the blast output is irrelevant
    if you restrict to hits >90% id and >70% coverage say, it'll be just a few hits that are meaningful (2c48b1a, 522d6ef)

    • [2H/0.5H] could be sliders on the sequence search tab above "search" button for example;
      useful sliders would be length of hit (default any), %id (default 75%), coverage % (default 50%) (2c48b1a, 522d6ef)
  • [1-3H/1H] delay drawing the table until a result is received (298c77e)

  • [1-2H/1H] require minimum of 25 chars in search string. (82d9f61)

    • [1-2H/2H] it seems that no result is returned when searching for "A" - check this (40e77e0)
      • [1-3H/1.5H] check handling : when child process returns empty file, return status 0 (OK), but no reply sent to client, which times out (a8493d4)
  • [1-3H/6H] preserve tick box state when switching blast-results-view table between modal dialog and left panel
    Test case e.g. expand the table, and deselect some lines, then minimise : it resets the tick boxes.
    When it switches between dialog and left-panel, a new table component is created.

Either move ownership of the view flags from blast-results-view to blast-results or .search, so that switching between modal and left-panel does not refresh them,
or move the table across instead of destroy&recreate (keep the left-panel target element enabled).
related : https://handsontable.com/docs/9.0.0/PersistentState.html

(71f1c13, d1bf486 : blast-results-view : set table height when changing tableModal)

  • [1-2H/3.5H] User may un-view some axes, then wish to see all again : add button : View all axes with results (28bec64)

  • [1H] search panel order : feature search, blast search, external lookup.
    feature and blast can be open by default (3bdf49e)

  • 💤 [2-4H] ability to put in an NCBI ID, get that sequence, and then search with that.
    interface to NCBI. similar lookup to e.g. https://www.biostars.org/p/52652/#52654

  • [1H] move the warning message field (e.g. about not selecting a reference) to appear between the search button and the drop down list

    • [/0.5H] sequence-search : move panel-message above Search button 0aced5c
  • 💤
    One thing we might extend : the sequence-search may show multiple warnings, and they would be best on separate lines; could change panel-message to do that if .warningMessage is an array of messages. [2-3H]

  • [1-2H/3H] use upload-table in data-csv, so that it gets the addition of end position
    related to the above "add .values { }" (e62a521)
    upload-table is factored out from data-csv, which can be changed to use this library; that would enable extra columns to be added in data-csv
    (93659b2)

implemented for add dataset in backend; for transient add in frontend : deferred to later branch :
    • [1-2H] and additional .values columns.
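
The up-front cutoffs listed earlier in this comment (length of hit, % id, coverage %) amount to a row filter on the blast output. The sketch below is illustrative only: the row field names are hypothetical, the defaults are the slider defaults suggested above (any length, 75% id, 50% coverage), and coverage is assumed here to mean HSP length as a percentage of query length, which the issue does not define.

```javascript
// Defaults from the slider suggestion above; minLength 0 means "any".
const defaultCutoffs = { minLength: 0, minIdentity: 75, minCoverage: 50 };

/** Assumed definition: HSP length as a percentage of the query length. */
function coveragePercent(row) {
  return (100 * row.hspLength) / row.queryLength;
}

/** True if the hit meets all three cutoffs. */
function passesCutoffs(row, cutoffs = defaultCutoffs) {
  return row.hspLength >= cutoffs.minLength
    && row.percentIdentity >= cutoffs.minIdentity
    && coveragePercent(row) >= cutoffs.minCoverage;
}

/** Keep only the hits which pass; omit cutoffs to use the defaults. */
function filterHits(rows, cutoffs) {
  return rows.filter((row) => passesCutoffs(row, cutoffs));
}
```

For example, `filterHits(rows, { minLength: 0, minIdentity: 90, minCoverage: 70 })` corresponds to the ">90% id and >70% coverage" case given as motivation.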

From meeting, Jun15 :

  • [/1H] when change from results tab to seq input tab and back again, remember the selected (view flag) features of that results tab (c89982c)

  • [/0.5H] change text 'Show / Hide all Features (triangles)' to 'Select / Deselect all' (cfa1127)

  • show 'view all axes' checkbox ticked initially : intermittent - ok on local, not on dev (longer time?).
    Replaced by 79d650b

  • [/1H] only view all axes first time, not when re-visit tab (c89982c)

    • [/6H] subsequently may narrow axes to the selected features (perhaps a checkbox, default true) 79d650b
    • [1-2H] issues re. view checkbox , e.g. initial value, update
      • [/1H] handle request delay, which was causing view checkboxes to be unticked initially (6d9a4f6)
      • [1H/2.5H] some features selected, change tab and back, 'Select All' is unticked (good), toggle on - shows axis but not feature
        437783e: retain display of features when changing tab and to/from modal.
      • [1H/2.5H] changing to modal, deselect features, go back to left panel tab, the features are not shown
        437783e
        ec333c1 : avoid duplication of table elements when switching to modal.
  • [/0.5H] default range sliders to 0 (cfa1127)

  • [1H/0.5H] disable cell edit in table (0772e11)

  • [1-2H/8H] only first triangle moves with transition when axis zoom
    to get many hits, use slider:0, and e.g. first ~20 lines of https://www.ncbi.nlm.nih.gov/nuccore/DQ146423.1?report=fasta
    18196f2 : augment name with location in showLabels keyFn

    • [0.5H] brush zoom -> triangle is transitioning (late), label is now OK.
      same keyFn change as label
      69e526b : sequence-search result display : use value[0] in keyFn of showTickLocations() also.
      • result features : change transition on labels to match triangles 46298e9
      • [1H/1H] result features : label entry after transition f64085d
        • [1H/1.5H] similar for triangles
          4f62dff : feature search results : show entering triangles after axis has transitioned
          78c5087 : handle warning re. tableModal update, seen in testing
  • [/0.5H] don't show 'Show Table' toggle (cfa1127)


deferred for a later branch :
  • [1H/1H] clear button for search input, 839eb26

  • [1H] paste : join lines without > 7306095

    • [1H] the above commit handles 1 marker. in addition handle multiple markers (i.e. don't join the marker line),
      and place a limit on markers - can count them while joining the lines.
  • [2-4H] result streaming, because 1min timeout but would get lots of results before then.
    test with TREP database (Wicker) : Angela : Triticum

  • [1-2H] search input : under parent dropdown,
    dropdown : select 1 chromosome
    filter output (check if blast can accept col2 filter)

(probably) for next phase :

  • [2-6H] brush : view block, don't show in title
    This enables viewing and brushing the search results as tracks in split axis.
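
The FASTA input checks listed in this comment (exactly 1 marker line, sequence text and no other input, a minimum of 25 chars, total sequence length < 2kb) can be sketched as a single validation function. This is a hedged illustration rather than the code in 086ea39 or 82d9f61: the function name and return shape are hypothetical, 2kb is assumed to mean 2000 characters, and the 25-char minimum is applied to the sequence text rather than the whole input.

```javascript
/**
 * Check pasted FASTA text. Returns { ok: true, header, sequence }
 * or { ok: false, error } with a message suitable for display.
 */
function checkFastaInput(text, { minLength = 25, maxLength = 2000 } = {}) {
  const lines = text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '');
  // A marker (header) line starts with '>'.
  const markers = lines.filter((line) => line.startsWith('>'));
  if (markers.length !== 1) {
    return { ok: false, error: `expected exactly 1 marker line, found ${markers.length}` };
  }
  if (!lines[0].startsWith('>')) {
    return { ok: false, error: 'input found before the marker line' };
  }
  // Join the remaining lines, as done for paste.
  const sequence = lines.slice(1).join('');
  if (sequence.length < minLength) {
    return { ok: false, error: `sequence is shorter than ${minLength} characters` };
  }
  if (sequence.length >= maxLength) {
    return { ok: false, error: `sequence length must be less than ${maxLength}` };
  }
  return { ok: true, header: lines[0], sequence };
}
```

Counting the marker lines in the same pass would also support the deferred item on handling multiple markers with a limit, by replacing the `!== 1` test with an upper bound.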

@Don-Isdale
Collaborator Author

Don-Isdale commented Jun 30, 2021

To handle multiple parent / blastdb, pass dbName as a param in the curl request from blastn_request

This part applies to either of the following options :

    • blastServer.py : receive param dbName from form_data refn 13c2c28
      • [2H/2H] use debugger or print to confirm the param is in the form_data
      • [0-10H/4H] if not, then search for solutions / re-read the doc. This is the key risk in this task.
        Found that requirement was not -F, but put dbName in request-json.

A couple of design options :

  1. minimal change : blastn_request can send dbName and blastServer.py can receive it
    • [1-6H/3H] blastn_request curl : add -F "dbName=@$dbName"
      13c2c28. actually request-json, not -F.

(not required because 1. worked)
2. pretzel backend server: (bent) -> blast container with web api or blastServer.py

  • [2-4H] use bent library to send web request,
    • [0-4H] include dbName in the request. this is a documented use case, 0H if it works without any issues.
    • Another (near-future) option : this design may make it simpler to send blast requests to a secondary/remote server. ATM there isn't a clear advantage.

Software for a RESTful web api server wrapping blast was not found, either in a container or otherwise. So continue to use blastServer.py

    • Option for future extension : the web api would preferably support SSE so we can stream replies (requests are currently ~20sec so streaming is not essential; it would be nice to see the best results earlier, and be able to cancel the request once they were received).
      There is a Flask package which uses Redis to provide SSE, so this can be done if continuing with blastServer.py

Related options :
  • augment pretzel container with blast (early experiments indicated a jump in image size)
  • https://github.com/teammaclean/blastjs

@Don-Isdale
Collaborator Author

Don-Isdale commented Jul 5, 2021

  • [1-4H/6H] issue : for subsequent axis, feature triangle may not display. (8f08e60)
  • [/1H] follow-on from above handling of multiple genomes : handle Chr also (case-insensitive) [4fe4fe0, 513c084]

@Don-Isdale
Collaborator Author

  • [/2H] sequence search : change API timeout 1 to 2 min, support secondary server. 7540355 [in branch feature/qtlUpload]

@Don-Isdale
Collaborator Author

  • [1H] button to copy selected Feature.values.Sequence into Sequence Search input

@Don-Isdale
Collaborator Author

Don-Isdale commented Oct 24, 2022

Performance improvement

investigated the 2min time for blastn for a test case on pulses;

  • [2-4H] next steps to try are
    • [/2H] using SSD instead of HDD for blast partition,
      result : improved 2min -> 1min
    • maybe increase swapfile (8GB, whereas the Vfaba_hedin_v1 .fasta being searched is 12GB)
      blastn is only using ~400MB of available 8GB swapfile (now on SSD), so increasing swapfile size seems unlikely to improve performance.

Status: done