The grobidmonkey package is an open-source package designed for postprocessing GROBID outputs.

Website: https://github.com/com3dian/Grobidmonkey
Documentation: https://github.com/com3dian/Grobidmonkey/tree/master/Document
Source code: https://github.com/com3dian/Grobidmonkey/tree/master/src/grobidmonkey
Bug reports: https://github.com/com3dian/Grobidmonkey/issues
Citing in your work: https://studenttheses.uu.nl/handle/20.500.12932/45939 or the BibTeX entry below:

@mastersthesis{lu2024unsupervised,
  title={Unsupervised Paper2Slides Generation},
  author={Lu, Zehao},
  year={2024}
}

grobidmonkey is a lightweight Python package for handling the TEI XML files generated by GROBID. It provides a reader class that converts these files into Python dictionaries, making them simple to read and work with. The reader can load an entire essay as a dictionary in which each key is a section title and each value is the list of paragraphs in that section. It also provides a method for reading the essay's outline as a tree.

Installation

Currently, grobidmonkey is available only on PyPI and can be installed with

pip install grobidmonkey

Quick Start

from grobidmonkey import reader
monkeyReader = reader.MonkeyReader('monkey') # or 'lxml' or 'x2d'

# read paper outline
outline = monkeyReader.readOutline('/path/to/your/paper.pdf.tei.xml')

# read paper content
essay = monkeyReader.readEssay('/path/to/your/paper.pdf.tei.xml')

For a detailed explanation and tutorial, please check the Document page.
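As a rough illustration of what can be done with the essay dictionary, the sketch below assumes the shape described above: section titles as keys and lists of paragraph strings as values. The helper name `section_stats` and the sample data are hypothetical and are not part of grobidmonkey; in practice the dictionary would come from `readEssay`.

```python
from typing import Dict, List


def section_stats(essay: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the number of whitespace-separated words in each section."""
    return {
        title: sum(len(paragraph.split()) for paragraph in paragraphs)
        for title, paragraphs in essay.items()
    }


# Hypothetical stand-in for the value returned by readEssay.
sample_essay = {
    "Introduction": [
        "GROBID converts PDF files to TEI XML.",
        "This package reads that XML.",
    ],
    "Method": ["Sections are keyed by title."],
}

# Print one line per section with its word count.
for title, words in section_stats(sample_essay).items():
    print(f"{title}: {words} words")
```

Because the helper only relies on the dictionary shape, it should work on any mapping from titles to paragraph lists, regardless of which reader backend produced it.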

Contribution

We welcome all contributions, whether they involve code, documentation, or testing. Feel free to reach out to me via email at [email protected].

Icon

Grobidmonkey's icon is a walking monkey.

                  $$                                                                   
           $$$$$$$$$$$$$$$$$$                              $$$$$$
       $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$                    $$$$$$$$$$                       
    $$$$$$$$                $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$             
   $$$$$$                          $$$$$$$$$$$$$$$$$$$$$$$$$$$           
  $$$$$$                                   $$$$$$$$$$$$                             
 $$$$$$                                                              
 $$$$$$                                                                   
 $$$$$$                                                                             
 $$$$$$                                                                    GROBIDMONKEY
 $$$$$$$                           $$$$$$$$$$$$$$$                     $$$$$$$$$$$$$$$$$$$$
 $$$$$$$                   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$          $$$$$$$$$$$$$$$$$$$$$$$
  $$$$$$$$          $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$    $$$$$   $$$$$$$$     $$
    $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$  $$$$$$      $$  $
      $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ $$$$$$$$          $$
          $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$     $$$$
              $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
               $$$$$$$$$$$$$$$$$$ $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$    $$$$$$$$$$$$$$$
                 $$$$$$$$$$$$$$$$    $$$$$$$$$$$$$  $$$$$$$$$$$$$$$$$$$         $$$$$$$
                  $$$$$$$$$$$$$$$       $$$$$$$$$  $$$$$$$$$$$$$$$$$  
            $$$$$  $$$$$$$$$$$$$$                 $$$$$$$$$$$$$$$$   $$$$
        $$$$$$$$$$$  $$$$$$$$$$$$               $$$$$$$$$$$$$$$$  $$$$$$$
    $$$$$$$$$$$$$$$$ $$$$$$$$$$$$             $$$$$$$$$$$$$$   $$$$$$$$$$
 $$$$$$$$$$$$$$$$$  $$$$$$$$$$$             $$$$$$$$$$$$$    $$$$$$$$$$$
$$$$$$$$$$$        $$$$$$$$$$              $$$$$$$$$$        $$$$$$$$$$$
$$$$$$$            $$$$$$$$$               $$$$$$$$            $$$$$$$$$$
 $$$$$            $$$$$$$$$                $$$$$$$              $$$$$$$$$
 $$$$$$          $$$$$$$$                 $$$$$$$$                $$$$$$$$$
 $$$$$$          $$$$$$$$                $$$$$$$$                 $$$$$$$$$
 $$$$$$          $$$$$$$$              $$$$$$$$$                    $$$$$$$$
 $$$$$           $$$$$$$$$            $$$$$$$$$                       $$$$$$$$
                  $$$$$$$$$$        $$$$$$$$$                           $$$$$$$$
                    $$$$$$$$$$$$$$$$$$$$$$                                $$$$$$$$$$$$$$$$$$$
                         $$$$$$$$$$$                                           $$$$$$$$$$$$$$

About GROBID

GROBID stands for GeneRation Of BIbliographic Data.

GROBID is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured XML/TEI encoded documents with a particular focus on technical and scientific publications.

You can also try the GROBID web app with your paper.
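To give a feel for the TEI XML that GROBID produces, here is a minimal sketch that pulls section titles and paragraphs out of a small hand-written TEI fragment using only the Python standard library. The sample document and the function name `sections_from_tei` are assumptions made for illustration; real GROBID output is considerably richer, and this is not how grobidmonkey itself is implemented.

```python
import xml.etree.ElementTree as ET

# TEI documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written, heavily simplified stand-in for a GROBID output file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head>Introduction</head><p>First paragraph.</p><p>Second paragraph.</p></div>
      <div><head>Method</head><p>Only paragraph.</p></div>
    </body>
  </text>
</TEI>"""


def sections_from_tei(xml_text):
    """Map each top-level body section title to its list of paragraph texts."""
    root = ET.fromstring(xml_text)
    sections = {}
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        title = head.text if head is not None and head.text else "(untitled)"
        # itertext() also collects text nested inside inline child elements.
        sections[title] = [
            "".join(p.itertext()).strip() for p in div.findall("tei:p", TEI_NS)
        ]
    return sections


for title, paragraphs in sections_from_tei(SAMPLE_TEI).items():
    print(title, len(paragraphs))
```

The sketch ignores nested sections, figures, and references, which is where a dedicated reader becomes useful.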
