Auto-generate an entire paper from a prompt or abstract using NLP

ContextLab/abstract2paper

Welcome to abstract2paper: the solution to your writer's block!

Author: Jeremy R. Manning

Step right up, step right up!

Writing papers got you down? Come on in, friend! Give my good ol' Abstract2paper Cure-All a quick try! Enter your abstract into the little doohickey here, and quicker'n you can blink your eyes¹, a shiny new paper'll come right out for ya! What are you waiting for? Click the "doohickey" link above to get started, and then click the link to open the demo notebook in Google Colaboratory.

To run the demo as a Jupyter notebook (e.g., locally), use this version instead. Note: to compile a PDF of your auto-generated paper (when you run the demo locally), you'll need to have a working LaTeX installation on your machine (e.g., so that pdflatex is a recognized system command). The notebook will also automatically install the transformers library if it's not already available in your local environment.
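
If you'd like to check your local setup before running the notebook, something along these lines might do. It is only a sketch: the helper name `find_missing` is made up for illustration and is not part of the notebook, which handles the `transformers` installation on its own.

```python
import importlib.util
import shutil


def find_missing(modules, commands):
    """Return a list of the Python modules and system commands that are not available.

    Hypothetical helper for illustration; not part of the abstract2paper notebook.
    """
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    for name in commands:
        # shutil.which returns None when the command is not on the PATH
        if shutil.which(name) is None:
            missing.append(f"system command: {name}")
    return missing


if __name__ == "__main__":
    problems = find_missing(["transformers"], ["pdflatex"])
    if problems:
        print("Missing before you can compile a PDF locally:")
        for item in problems:
            print("  -", item)
    else:
        print("Looks like everything is in place.")
```

An empty list means both `transformers` and `pdflatex` were found.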

In its unmodified state, the demo notebooks use the abstract from the GPT-3 paper as the "seed" for a new paper. Each time you run the notebook you'll get a new result, but an example PDF (generated using the smaller 1.3B parameter model) may be found here, and the associated .tex file may be found here.

How does it work, you ask?

Really, it's quite simple. We put in a smidgen of this, a pinch of that, plus a dab of our special secret ingredient, and poof! That's how the sausage is made.

No really, how does it work?

OK, if you really want to know, all I'm doing here is using the Hugging Face implementation of GPT-Neo, an open-source model from EleutherAI that is modeled on the GPT-3 architecture and pre-trained on the Pile dataset.

The text you input is used as a prompt for GPT-Neo; to generate a document containing an additional n words, the model simply "predicts" the next n words that will come after the specified prompt.
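
For the curious, the prompt-continuation step might look roughly like the sketch below. It assumes the `transformers` text-generation pipeline and the `EleutherAI/gpt-neo-1.3B` checkpoint; the function names `load_generator` and `continue_prompt` are hypothetical, and the demo notebook may well do this differently. Note that the length budget here is counted in tokens (words or pieces of words), not whole words.

```python
def load_generator(model_name="EleutherAI/gpt-neo-1.3B"):
    """Build a text-generation pipeline (downloads the model on first use).

    Assumption: this checkpoint name matches the 1.3B parameter model
    mentioned above; swap in another GPT-Neo checkpoint if you prefer.
    """
    # Imported lazily so the rest of this sketch works without transformers
    from transformers import pipeline

    return pipeline("text-generation", model=model_name)


def continue_prompt(prompt, n_new_tokens, generator):
    """Return the prompt followed by up to n_new_tokens of generated text.

    `generator` is any callable that behaves like a transformers
    text-generation pipeline: it returns a list of dicts, each with a
    "generated_text" entry that includes the original prompt.
    """
    outputs = generator(prompt, max_new_tokens=n_new_tokens, do_sample=True)
    return outputs[0]["generated_text"]


if __name__ == "__main__":
    generator = load_generator()
    abstract = "We show that scaling up language models improves few-shot performance."
    print(continue_prompt(abstract, 200, generator))
```

Because sampling is switched on (`do_sample=True`), each run produces a different continuation, which fits the "new result every time" behavior described earlier.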

With a little help from some basic LaTeX templates (borrowed from Overleaf), the document is formatted and compiled into a PDF.
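
The formatting step can be sketched as below. This is an illustration under stated assumptions, not the notebook's actual code: the tiny template stands in for the Overleaf ones, and `escape_latex`, `build_tex`, and `compile_pdf` are hypothetical names. It does rely on `pdflatex` being a recognized system command, as noted earlier.

```python
import shutil
import subprocess
from pathlib import Path

# Minimal stand-in template (assumption; the real templates come from Overleaf)
TEMPLATE = r"""\documentclass{article}
\title{<<TITLE>>}
\begin{document}
\maketitle
\begin{abstract}
<<ABSTRACT>>
\end{abstract}
<<BODY>>
\end{document}
"""

# Characters that LaTeX treats specially, mapped to safe replacements
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape special characters in one pass so nothing is escaped twice."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def build_tex(title, abstract, body):
    """Fill the template with escaped text and return the .tex source."""
    tex = TEMPLATE.replace("<<TITLE>>", escape_latex(title))
    tex = tex.replace("<<ABSTRACT>>", escape_latex(abstract))
    return tex.replace("<<BODY>>", escape_latex(body))


def compile_pdf(tex_source, out_dir="paper_build"):
    """Write paper.tex and run pdflatex; return the PDF path, or None if pdflatex is missing."""
    if shutil.which("pdflatex") is None:
        return None
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    tex_file = out_path / "paper.tex"
    tex_file.write_text(tex_source, encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={out_path}", str(tex_file)],
        check=True,
    )
    return out_path / "paper.pdf"
```

Escaping matters here because generated text routinely contains characters such as `%`, `&`, and `_` that would otherwise break compilation.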

Can I actually use this in real-world applications?

Doubtful. Or at least, probably not...? It certainly wouldn't be ethical to use this code to generate writing assignments, mass-produce papers or grant applications, etc. Further, you'll likely find that the text produced using this approach includes stuff that's said in funny (often nonsensical) ways, follows problematic logic, incorporates biases from the training data, and so on. Less important, but still a practical annoyance: you'll also encounter all sorts of formatting issues (although those might be easy to fix manually, and possibly even automatically with some clever tinkering).

⚠️ Disclaimers ⚠️

This demonstration is provided as is, and you choose to run it at your own risk. The GPT-Neo model is trained on a large collection of documents from a variety of sources across the Internet. Some of the text it's trained on includes potentially triggering and/or biased and/or horrible and/or disgusting-in-other-ways language. That means that the text the model produces in the demo may also include disturbing language. If you don't want to risk exposure to that sort of text, then you should not run the notebook.

Further, if you run the demo locally, it may mess up your compute environment, cause your stock portfolio to lose value, trigger the formation of a small-to-medium-sized black hole underneath your chair, put you in a bad mood, and/or otherwise mess up your day/week/month/year/life. Proceed with caution.

¹ This claim rests on the assumption that you blink really slowly. Depending on how much text you're trying to generate (and how long your prompt is), your paper could take anywhere from a few minutes to several hours to fully congeal.