rmzi edited this page Oct 9, 2014 · 11 revisions

The Article Annotator is a simple tool that lets users tag HTML documents for inclusion in our Training Dataset.

The flow is as follows:

  1. Import HTML documents from the server
     - Fetch crawled articles via AJAX from the Node server, which reads them from MongoDB; probably pull a few at once?
     - Needs: API endpoints that return Article Models with their HTML attached
  2. Render the HTML with the Annotator Tool overlay (Paintbrushes)
     - Render our frame with an interface for cycling through articles, selecting different brushes, and submitting annotations.
     - Use jQuery to append the fetched HTML to our page.
  3. Select a tool to annotate with (Title, Author, Date, Article Body, etc.)
     - Cycle through brushes, each with its own color.
     - Needs: decide on the set of brush types.
  4. Highlight the HTML element under the cursor
     - This will be the most challenging part; we will need to experiment with how to highlight the element itself.
     - One option is to add a transparent, colored overlay to the parent element of the highlighted text (e.g. the element containing "Lorem Ipsum" gets a colored overlay).
     - Another option is to change the styling of the highlighted text itself (e.g. make it bold or change its color).
  5. On click, modify the DOM to include the annotation
     - e.g. clicking the element containing "My Article" with Title_Brush adds a title annotation to that element.
     - Question: what's the best way to edit the DOM in place? Do we need a separate representation of the DOM?
  6. Present the user with a list of meta tags and let them tag those.
  7. Export the annotated HTML to the Training Data MongoDB
     - Once annotation is complete, we'll add the final DOM to the original article document and save it to the Training Data MongoDB.
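Step 1 calls for API endpoints that return a few Article Models at a time. A minimal sketch of the paging logic such an endpoint might use, assuming a hypothetical `GET /articles?limit=N&skip=M` route; the route, the parameter names, `parsePaging`, and the default of 5 with a cap of 20 are all illustrative assumptions:

```javascript
// Sketch only: paging parameters for a hypothetical GET /articles?limit=N&skip=M route.
// The defaults (5 per batch, at most 20) are illustrative, not project settings.
function parsePaging(requestUrl, { defaultLimit = 5, maxLimit = 20 } = {}) {
  // Node's http request.url is a bare path ("/articles?limit=3"), so supply a dummy base.
  const url = new URL(requestUrl, "http://localhost");
  const limit = Number.parseInt(url.searchParams.get("limit"), 10);
  const skip = Number.parseInt(url.searchParams.get("skip"), 10);
  return {
    // Missing or invalid limits fall back to the default; large ones are capped.
    limit: Number.isInteger(limit) && limit > 0 ? Math.min(limit, maxLimit) : defaultLimit,
    // Missing or negative offsets start from the first article.
    skip: Number.isInteger(skip) && skip >= 0 ? skip : 0,
  };
}
```

In a Mongoose-backed handler the result could feed `Article.find().sort({ _id: 1 }).skip(skip).limit(limit)`, where `Article` is a hypothetical model; sorting by `_id` keeps the batch order deterministic between requests.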
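Steps 1 and 2 together suggest the client keeps a small buffer of fetched articles, lets the user step through them, and asks for more before running out. A sketch of that state, using a hypothetical `ArticleBrowser` class and an assumed prefetch threshold of 2; it holds only data, so the jQuery rendering would sit on top of it:

```javascript
// Sketch: client-side state for stepping through a buffer of fetched articles.
// ArticleBrowser and the default prefetch threshold of 2 are hypothetical.
class ArticleBrowser {
  constructor(prefetchThreshold = 2) {
    this.articles = []; // every article fetched so far, in server order
    this.index = -1; // position of the article on screen (-1 = nothing loaded yet)
    this.prefetchThreshold = prefetchThreshold;
  }

  // Append one fetched batch, e.g. the parsed JSON array from an AJAX response.
  addBatch(batch) {
    this.articles.push(...batch);
    if (this.index === -1 && this.articles.length > 0) this.index = 0;
  }

  current() {
    return this.index >= 0 ? this.articles[this.index] : null;
  }

  // Step forward/back, staying put at either end of the buffer.
  next() {
    if (this.index < this.articles.length - 1) this.index += 1;
    return this.current();
  }

  prev() {
    if (this.index > 0) this.index -= 1;
    return this.current();
  }

  // True when fewer than prefetchThreshold unseen articles remain.
  needsMore() {
    return this.articles.length - 1 - this.index < this.prefetchThreshold;
  }

  // Offset for the next request: everything already buffered is skipped.
  nextSkip() {
    return this.articles.length;
  }
}
```

Wiring it up might look like `if (browser.needsMore()) $.getJSON("/articles", { limit: 5, skip: browser.nextSkip() }, (batch) => browser.addBatch(batch));`, assuming the endpoint returns a JSON array of articles. A real version would also remember whether a request is already in flight, so repeated `needsMore()` checks don't fire duplicate fetches.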
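For step 4, the overlay idea could be tried as an absolutely positioned, semi-transparent `<div>` drawn over the box of the element under the cursor. The sketch below covers only the geometry and color; `hexToRgba` and `overlayStyle` are hypothetical helpers, the 0.3 opacity is arbitrary, and it assumes the overlay is appended directly to `<body>`. In the page, `rect` would come from the hovered element's `getBoundingClientRect()` (for example the target of a `mouseover` event, or `document.elementFromPoint(x, y)`), and the returned object could be passed to jQuery's `.css()`.

```javascript
// Sketch: inline CSS for a semi-transparent overlay covering the hovered element.
// Hypothetical helpers; assumes the overlay <div> is a direct child of <body>.

// "#e6194b" -> "rgba(230, 25, 75, 0.3)"
function hexToRgba(hex, alpha) {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`Unsupported color: ${hex}`);
  const [r, g, b] = m.slice(1).map((h) => parseInt(h, 16));
  return `rgba(${r}, ${g}, ${b}, ${alpha})`;
}

// rect: {top, left, width, height} as returned by getBoundingClientRect().
function overlayStyle(rect, scrollX, scrollY, color, alpha = 0.3) {
  return {
    position: "absolute",
    // getBoundingClientRect() is viewport-relative, so add the scroll offsets
    // to place the overlay relative to the document.
    top: `${rect.top + scrollY}px`,
    left: `${rect.left + scrollX}px`,
    width: `${rect.width}px`,
    height: `${rect.height}px`,
    backgroundColor: hexToRgba(color, alpha),
    // Let mouse events pass through to the article underneath; otherwise the
    // overlay itself would become "the element under the cursor".
    pointerEvents: "none",
  };
}
```

Keeping the overlay outside the article markup means the article's own DOM is not touched while hovering, which also leaves element positions stable for the annotation step.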
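Step 5 asks whether the DOM should be edited in place or kept in a separate representation. One option, sketched here under the assumption that annotations are recorded rather than written into the page on every click, is to identify each clicked element by its path of child indices from the article's root element and keep a map from path to brush. `pathOf`, `resolvePath`, and `AnnotationSet` are hypothetical names, and the toggle-on-repaint behavior is an assumed UI choice:

```javascript
// Sketch: record annotations separately instead of editing the live DOM on every click.
// Hypothetical helpers; they accept real DOM elements (parentNode + children) or
// plain objects with the same shape.

// Path of child indices from root to el, e.g. [1, 0] = first child of root's second child.
function pathOf(el, root) {
  const path = [];
  for (let node = el; node !== root; node = node.parentNode) {
    if (!node.parentNode) throw new Error("element is not inside root");
    path.unshift(Array.prototype.indexOf.call(node.parentNode.children, node));
  }
  return path;
}

// Inverse of pathOf: follow the indices back down from root.
function resolvePath(root, path) {
  return path.reduce((node, i) => node.children[i], root);
}

class AnnotationSet {
  constructor() {
    this.byPath = new Map(); // e.g. "1/0" -> "author"
  }

  // One label per element: a different brush replaces the old label,
  // the same brush again removes it (an assumed toggle behavior).
  paint(path, brushId) {
    const key = path.join("/");
    if (this.byPath.get(key) === brushId) this.byPath.delete(key);
    else this.byPath.set(key, brushId);
  }

  // Called by JSON.stringify, so the set can be sent to the server as-is.
  toJSON() {
    return [...this.byPath].map(([key, brush]) => ({
      path: key === "" ? [] : key.split("/").map(Number),
      brush,
    }));
  }
}
```

At export time the recorded paths could be applied to a copy of the article, e.g. `resolvePath(copy, path).setAttribute("data-annotation", brush)` with `copy = articleRoot.cloneNode(true)`; the `data-annotation` attribute name is hypothetical. This only works if nothing is inserted into the article itself while annotating, since extra elements would shift the child indices.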
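Step 7 adds the final DOM to the original article document before saving it. A sketch of assembling that document, in which `buildTrainingDoc` and the field names `annotatedHtml`, `annotations`, `metaTags`, and `annotatedAt` are hypothetical (the real shape would follow the schema from @skillachie mentioned in the ToDo list), and the rule that an article needs at least one annotation before export is an assumption:

```javascript
// Sketch: assemble the document that would be saved to the Training Data MongoDB.
// Field names (annotatedHtml, annotations, metaTags, annotatedAt) are hypothetical.
function buildTrainingDoc(article, annotatedHtml, annotations, metaTags = [], now = new Date()) {
  if (typeof annotatedHtml !== "string" || annotatedHtml.length === 0) {
    throw new Error("annotated HTML is empty");
  }
  // Assumed rule: an article with no annotations is not worth exporting.
  if (!Array.isArray(annotations) || annotations.length === 0) {
    throw new Error("refusing to export an article with no annotations");
  }
  // Copy instead of mutating, so the article held by the annotator UI stays untouched.
  return {
    ...article,
    annotatedHtml,
    annotations,
    metaTags, // tags chosen in step 6
    annotatedAt: now.toISOString(),
  };
}
```

With Mongoose, the result could be saved through a model bound to the Training Data database, e.g. `new TrainingArticle(doc).save()`, where `TrainingArticle` is a hypothetical model.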

List of Possible Annotations

  • Title
  • Subtitle
  • Section Title
  • Author
  • Date
  • Location
  • Image
  • Image Caption
  • Body
  • Metadata (non-visible)
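Step 3 of the flow leaves the brush types open. One possible starting point is a brush per visible annotation type in the list above, each with its own color. In this sketch the `BRUSHES` ids and hex colors are arbitrary placeholders, `cycleBrush` is a hypothetical helper, and the non-visible metadata entry is left out on the assumption that it is tagged from the list in step 6 rather than painted:

```javascript
// Sketch: one brush per visible annotation type; ids and colors are placeholders.
const BRUSHES = [
  { id: "title", label: "Title", color: "#e6194b" },
  { id: "subtitle", label: "Subtitle", color: "#3cb44b" },
  { id: "section_title", label: "Section Title", color: "#ffe119" },
  { id: "author", label: "Author", color: "#4363d8" },
  { id: "date", label: "Date", color: "#f58231" },
  { id: "location", label: "Location", color: "#911eb4" },
  { id: "image", label: "Image", color: "#46f0f0" },
  { id: "image_caption", label: "Image Caption", color: "#f032e6" },
  { id: "body", label: "Body", color: "#bcf60c" },
];

// Index of the next (step = 1) or previous (step = -1) brush, wrapping at both ends.
// The double modulo keeps the result non-negative when stepping back from 0.
function cycleBrush(currentIndex, step = 1) {
  const n = BRUSHES.length;
  return (((currentIndex + step) % n) + n) % n;
}
```

Keeping the brushes in one array means the toolbar, keyboard cycling, and overlay colors can all read from a single source.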

ToDo:

  • Set up MongoDB and a simple Node server to act as a gateway
  • Use the schema from @skillachie to model documents in MongoDB
  • Set up AnnotatorFrame with an interface for fetching/cycling through articles, selecting brushes, and submitting results
  • Experiment with different highlighting methods
  • Verify that annotated articles are saved correctly to the Training Data MongoDB
  • Test everything

Technology:

  • Node.js Server
  • MongoDB
  • Mongoose (MongoDB object modeling library for Node)
  • jQuery