Roadmap
Gabriel Vîjială edited this page May 10, 2017
- Management command that re-walks a collection to find files that are new / modified / deleted and adds them to the digest queue (snoop)
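A minimal sketch of the re-walk step. It assumes the previous walk is kept as a mapping from relative path to SHA-1. Where that snapshot is stored, and how the resulting paths are pushed onto the digest queue, are left out, and the function names are hypothetical:

```python
import hashlib
import os
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_collection(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-1."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            snapshot[full.relative_to(root).as_posix()] = sha1_of(full)
    return snapshot


def diff_snapshots(old: dict[str, str], new: dict[str, str]):
    """Return (added, modified, deleted) paths, each sorted."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, modified, deleted
```

Hashing every file is the simplest correct choice. Comparing size and mtime first, and hashing only when they differ, would make the re-walk cheaper on large collections.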
- Create pill-like filters that can be manipulated in the search bar like here
- Make a new search every time the filters change
- Select / De-select all collections
- Filter by subfolder
- Filter emails by sender / receiver
- Filter by filetype [boxes to tick for one or multiple filetypes before searching]
- Filter by date or date range [date of file creation, date of email sent, date of file modification]
- Filter by language
- Filter for money-related terms: IBAN, sums of money (USD, CHF, EUR etc.) (use regex searches)
- Filter for web addresses and other contact info: telephone numbers, websites, email addresses (use regex searches)
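The regex-based filters could start from patterns like the ones below. This is an illustrative sketch only: the patterns, the currency list and `extract_terms` are assumptions, the phone pattern only catches international numbers written with a leading `+`, and every match is a candidate that still needs validation.

```python
import re

# Hypothetical currency list; extend as needed.
CURRENCY = r"(?:USD|EUR|CHF|GBP|RON|[$€£])"
# Loose amount: digits with optional "," or "." separators (1500, 1,500.00, 12.5).
AMOUNT = r"\d+(?:[.,]\d+)*"

PATTERNS = {
    # Country code, two check digits, then blocks of up to four characters.
    "iban": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,4})?\b"),
    # Currency before or after the amount, with an optional space.
    "money": re.compile(rf"{CURRENCY} ?{AMOUNT}|{AMOUNT} ?{CURRENCY}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"(?:https?://|www\.)[^\s<>\"']+"),
    # International numbers only: "+", then digits, spaces, dots, dashes, brackets.
    "phone": re.compile(r"\+\d[\d ().-]{6,}\d"),
}


def extract_terms(text: str) -> dict[str, list[str]]:
    """Return candidate matches per kind, with trailing punctuation stripped."""
    found = {}
    for kind, pattern in PATTERNS.items():
        matches = [m.group(0).rstrip(".,;:!?)") for m in pattern.finditer(text)]
        if matches:
            found[kind] = matches
    return found
```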
- Data migration to make `Document.sha1` unique and store all fields in another table named `DocumentInstance`, described here - OR: Group results by hash (maybe use Elasticsearch field collapsing?) (ui)
- Display if a document is duplicated (ui)
- Page to list all occurrences of duplicates for a document (ui)
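For the field-collapsing option, an Elasticsearch search body could look like this sketch. It assumes the hash is indexed as a single-valued `keyword` field called `sha1` (collapsing requires a keyword or numeric field with doc values). `collapsed_search_body` and `group_by_hash` are hypothetical helpers; the second groups already-fetched documents for a duplicates page.

```python
from collections import defaultdict


def collapsed_search_body(query: str, hash_field: str = "sha1") -> dict:
    """Search body returning one hit per distinct hash (Elasticsearch field collapsing)."""
    return {
        "query": {"query_string": {"query": query}},
        "collapse": {
            "field": hash_field,
            # Also return a few of the other documents that share each hash.
            "inner_hits": {"name": "duplicates", "size": 5},
        },
    }


def group_by_hash(docs: list[dict]) -> dict[str, list[str]]:
    """Map each hash to the ids of all documents that have it, keeping only duplicates."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["sha1"]].append(doc["id"])
    return {sha1: ids for sha1, ids in groups.items() if len(ids) > 1}
```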
- Style the document preview (see mockups) (ui)
- Render document tree (ui); mockups: in search page, in document preview
- implement for batch search (ui)
- pie chart by filetype
- pie chart by language
- date histogram
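The pie charts and the date histogram map naturally onto Elasticsearch `terms` and `date_histogram` aggregations. A sketch, assuming fields named `filetype`, `lang` and `date` (hypothetical names; check the actual mapping) and Elasticsearch 7.2 or later, where the parameter is `calendar_interval` (older versions use `interval`):

```python
def aggregations_body(query: str) -> dict:
    """Search body that returns only aggregation buckets for the charts."""
    return {
        "query": {"query_string": {"query": query}},
        "size": 0,  # no hits needed, only buckets
        "aggs": {
            "filetype": {"terms": {"field": "filetype", "size": 20}},
            "lang": {"terms": {"field": "lang", "size": 20}},
            "date": {"date_histogram": {"field": "date", "calendar_interval": "month"}},
        },
    }


def pie_slices(response: dict, agg_name: str) -> list[tuple[str, float]]:
    """Turn a terms-aggregation response into (label, share) pairs for a pie chart."""
    buckets = response["aggregations"][agg_name]["buckets"]
    total = sum(b["doc_count"] for b in buckets)
    if not total:
        return []
    return [(b["key"], b["doc_count"] / total) for b in buckets]
```

Shares are computed over the returned buckets only; documents whose terms fall beyond `size` are counted in the response's `sum_other_doc_count` and would need an "other" slice.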
- Separate scroll boxes for search results and document preview
- Highlight the document that's currently being previewed
- Click on the search icon to perform a search
- For images, the document preview should load the image, if it's not too large
- Make it easy to copy text from document preview
- Embed Hypothesis JavaScript snippet in document preview (ui)
- Tag all emails in a thread with the same ID (https://cr.yp.to/immhf/thread.html) (snoop)
- Return the most recent result from an email thread, show the number of messages in the thread (ui)
- Provide a way to see other messages in the same thread (ui)
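One simplified way to assign thread IDs, sketched under the assumption that threads are reconstructed purely from the `Message-ID`, `In-Reply-To` and `References` headers. This is not necessarily the exact procedure described on the linked page, and `thread_ids` is a hypothetical helper. Messages that reference each other end up in the same union-find set, and the smallest message ID in the set serves as the thread ID:

```python
import re
from email import message_from_string

MSGID = re.compile(r"<[^<>\s]+>")


def thread_ids(raw_messages: dict[str, str]) -> dict[str, str]:
    """Map each document id to a thread id shared by all messages in its thread."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as root so thread ids are deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    own_id = {}
    for doc_id, raw in raw_messages.items():
        msg = message_from_string(raw)
        ids = MSGID.findall(msg.get("Message-ID", ""))
        mid = ids[0] if ids else f"<no-id:{doc_id}>"  # placeholder for missing ids
        own_id[doc_id] = mid
        find(mid)
        for header in ("In-Reply-To", "References"):
            for ref in MSGID.findall(msg.get(header, "")):
                union(mid, ref)
    return {doc_id: find(mid) for doc_id, mid in own_id.items()}
```

Messages without usable headers each form their own thread; matching on normalized subjects would be an additional heuristic.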
- If multiple documents share the same hash, present a menu with all of them
- All permalink versions should contain `<link rel=canonical>` pointing to the document id permalink
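A sketch of the tag itself, assuming a hypothetical `/doc/<id>` URL scheme under a configurable base URL; every permalink variant would render the same tag for the same document:

```python
from html import escape
from urllib.parse import quote


def canonical_link(doc_id: str, base_url: str) -> str:
    """Build a <link rel="canonical"> tag pointing at the document-id permalink."""
    # Percent-encode the id so any character is safe inside the URL path.
    url = f"{base_url.rstrip('/')}/doc/{quote(doc_id, safe='')}"
    # HTML-escape the URL before placing it in the attribute value.
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'
```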
- Email address (ui)
- Names of people and companies (ui)
- Bank accounts, phone numbers (ui)
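Bank account candidates could be normalized and filtered with the IBAN check-digit test from ISO 13616: move the first four characters to the end, replace letters with numbers, and the remainder mod 97 must be 1. A sketch; `normalize_iban` is a hypothetical helper, and it checks only the overall shape and the check digits, not per-country lengths:

```python
import re
from typing import Optional

# Country code, two check digits, then 11 to 30 account characters (15-34 total).
IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def normalize_iban(candidate: str) -> Optional[str]:
    """Return the compact upper-case IBAN if its check digits are valid, else None."""
    iban = "".join(candidate.split()).upper()
    if not IBAN_SHAPE.fullmatch(iban):
        return None
    # Move country code and check digits to the end ...
    rearranged = iban[4:] + iban[:4]
    # ... and replace each letter with two digits (A=10, ..., Z=35).
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return iban if int(numeric) % 97 == 1 else None
```

Normalizing to the compact form also gives a single token to index, so the same account written with or without spaces is found by one search.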
- In the collections sidebar menu, show number of documents for each
- Scan indexed files for viruses (ClamAV?)
- Generalize access to dataset repositories
- Read files from HTTP server in addition to local filesystem (nginx/apache directory listing? WebDAV?) (snoop)
- System metrics (load, cpu, swap, disk free, memory of various services) - use code from github.com/python-diamond/Diamond/tree/master/src/collectors
- Collection access permissions - map groups instead of users (search)
- 5secunde
- Romanian gazette (MOFs)
- Luxembourg gazette
- US Embassy Cables
- Enron dataset
- OpenCorporates
- EU tenders from TED
- Offshore Leaks
- Travis
- Auto deploy demo server when github master is updated
- Installation package modelled after Homebrew
- Embedding solution to help with publishing
- Remember which documents were visited/previewed
- Detect entities (names, IBANs, emails, phone numbers, websites, authors of documents/PDFs, etc); normalize and index them separately or use custom tokenizer; make it easy to search for them
- Have an approximate/similar results feature, with suggestions (like Google)
- Compare up to 3 documents in the same screen
- Download emails as PDF
- Use list of terms from external source (e.g. gist file)
- Venn diagram to see overlap between sets of search results
- Highlight entities, add them to clipboard with one click
- Use clipboard to make batch searches
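For the Venn diagram, the quantity to draw is the size of each region: the documents that appear in exactly a given combination of result sets. A sketch, assuming each search result set is available as a set of document IDs (`venn_regions` is a hypothetical helper):

```python
from itertools import combinations


def venn_regions(named_sets: dict[str, set]) -> dict[frozenset, int]:
    """Size of each Venn region: documents in exactly the named sets and no others."""
    names = list(named_sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Documents present in every set of this combination ...
            inside = set.intersection(*(named_sets[n] for n in combo))
            # ... minus those also present in any set outside it.
            outside = set().union(*(named_sets[n] for n in names if n not in combo))
            regions[frozenset(combo)] = len(inside - outside)
    return regions
```

For three sets this yields the seven regions of a three-circle diagram.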
- Name, description
- Who owns and hosts the collection (collections can be indexed from a remote server hosted by someone else)
- How many documents; breakdown by filetype, language, document dates (a few buckets of values, for example: "2013, 2014, 2015" or "May, June, July, August 2015" or "26 to 29, July 2015"), source (e.g. if the collection contains news articles from multiple publishers)
- How to download the whole collection or import it into your own hoover
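The date breakdown could pick its bucket size automatically: years if the documents span several years, otherwise months, otherwise days. A sketch of that rule, which is an assumption rather than a specified design (`date_buckets` is hypothetical):

```python
from collections import Counter
from datetime import date


def date_buckets(dates: list[date]) -> Counter:
    """Count documents per year, month or day: the coarsest granularity
    that still yields more than one bucket."""
    if not dates:
        return Counter()
    if len({d.year for d in dates}) > 1:
        fmt = "%Y"  # e.g. "2013", "2014", "2015"
    elif len({d.month for d in dates}) > 1:
        fmt = "%B %Y"  # e.g. "May 2015", "June 2015"
    else:
        fmt = "%d %B %Y"  # e.g. "26 July 2015"
    return Counter(d.strftime(fmt) for d in dates)
```

Month names come from `strftime("%B")`, which follows the process locale (English under the default C locale).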
- Set `filetype` for image and video files (snoop)
- Serve JSON data for a document (snoop)
- Serve `doc.html` from ui so that documents are rendered by the front-end app (ui, search)
- Make it easy to work on UI code using the demo server as backend
- Allow text selection in search results
- After choosing number of results from dropdown, auto-perform new search
- Sort results by relevance, newest, oldest
- Cache `app.js` (use webpack-generated hash and cache forever?)
- Loading indicator for search results and document preview
- Show document's word count in search results
- New field `parent_id`, links to parent archive/email or top-level folder in the collection
- Metrics for jobs (fields: queue name, data, start time, duration, success)
- Descriptive search errors in UI (e.g. Elasticsearch is down, query syntax error)
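For sorting by relevance, newest or oldest, one option is to translate the choice into an Elasticsearch `sort` clause, sketched below with a hypothetical `date` field and hypothetical helper names. `_score` is kept as a tie-breaker for the date orders, and a new body would be built whenever the sort order or the number of results changes:

```python
SORT_OPTIONS = {
    "relevance": ["_score"],
    "newest": [{"date": {"order": "desc"}}, "_score"],
    "oldest": [{"date": {"order": "asc"}}, "_score"],
}


def search_body(query: str, order: str = "relevance", size: int = 10) -> dict:
    """Search body for the chosen sort order and page size."""
    if order not in SORT_OPTIONS:
        raise ValueError(f"unknown sort order: {order!r}")
    return {
        "query": {"query_string": {"query": query}},
        "sort": list(SORT_OPTIONS[order]),  # copy so callers can't mutate the table
        "size": size,
    }
```

Documents without a `date` value are placed last by default in both date orders.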