poetic-formula-extractor-python

UPD: A new and much better version of this program is available nearby.

A script for extracting formulas from poetic texts.

The software backbone of this paper: https://www.academia.edu/6304149/_ (in Russian). It computes the formulaic density of a poetic text and can optionally print the formulas it finds to stdout or to a file, or write the original text with the formulas bracketed to an HTML file.
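
To give a rough feel for what a density figure of this kind measures, here is a minimal sketch. It is not the algorithm used by this script, whose exact definition of a formula is given in the paper; the function name `toy_formulaic_density` and the rule "a line is formulaic if it shares an adjacent word pair with another line" are assumptions made purely for illustration.

```python
from collections import defaultdict


def toy_formulaic_density(lines):
    """Return the percentage of non-empty lines that share at least one
    adjacent word pair with some other line.

    Illustrative definition only (an assumption), not the script's own measure.
    """
    # Lowercase every word and strip surrounding punctuation.
    tokenized = []
    for line in lines:
        words = [w.strip(".,;:!?").lower() for w in line.split()]
        words = [w for w in words if w]
        if words:
            tokenized.append(words)

    if not tokenized:
        return 0.0

    # Map every adjacent word pair to the set of lines it occurs in.
    pair_to_lines = defaultdict(set)
    for index, words in enumerate(tokenized):
        for pair in zip(words, words[1:]):
            pair_to_lines[pair].add(index)

    # A line counts as formulaic if one of its pairs recurs in another line.
    formulaic = set()
    for indices in pair_to_lines.values():
        if len(indices) > 1:
            formulaic |= indices

    return round(100 * len(formulaic) / len(tokenized), 1)


sample = ["feorh ealgian;", "eft cuman.", "feorh ealgian,", "wigena strengel,"]
print(toy_formulaic_density(sample))
```

In the sample, only the two "feorh ealgian" lines repeat a word pair, so two of the four lines count as formulaic.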

The script takes as its input the text of a poem, an alphabet, and a stop list (the latter two are iterables of any type and default to the built-in Russian ones if not provided). UsefulData.py contains alphabets and stop lists for Russian, Homeric Greek, and Anglo-Saxon.
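
The following sketch shows one plausible way an alphabet and a stop list can be applied to a line of verse before any comparison takes place. It is an assumption about how such inputs are typically used, not a copy of what the script does internally; `content_words`, `TOY_ALPHABET`, and `TOY_STOP_LIST` are hypothetical names, and the toy data is far smaller than what UsefulData.py provides.

```python
# Hypothetical, tiny stand-ins for the alphabets and stop lists in UsefulData.py.
TOY_ALPHABET = "abcdefghijklmnopqrstuvwxyzæþð"
TOY_STOP_LIST = ["ond", "in", "on", "to"]


def content_words(line, alphabet, stop_list):
    """Return the lowercased words of `line`, keeping only characters found
    in `alphabet` and dropping every word found in `stop_list`."""
    # Both arguments may be iterables of any type; sets give fast lookups.
    alphabet = set(alphabet)
    stop_list = set(stop_list)

    words = []
    for token in line.lower().split():
        # Discard punctuation and anything else outside the alphabet.
        cleaned = "".join(ch for ch in token if ch in alphabet)
        if cleaned and cleaned not in stop_list:
            words.append(cleaned)
    return words


print(content_words("ecean dryhtne, ond feorh ealgian;", TOY_ALPHABET, TOY_STOP_LIST))
```

Because the function only needs membership tests, a string, a list, a tuple, or a set works equally well for either argument.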

A console-usage example:

>>> import os
>>> os.chdir('path/to/working/directory') # Put the name of your working directory here.
>>> import PoeticAnalysisNew as pan
>>> with open('Bylina.txt', 'r', encoding='utf-8') as inp:
        bylina = pan.Poem(inp.read())
>>> bylina.getFormulaicDensity()
31.5
>>> bylina.highlightFormulas('Bylina') # See the results in Bylina.html.
>>> from UsefulData import angloSaxonAlphabet
>>> from UsefulData import angloSaxonStopList
>>> with open('Beowulf.txt', 'r', encoding='utf-8') as inp:
        beowulf = pan.Poem(inp.read(), angloSaxonAlphabet, angloSaxonStopList)
>>> beowulf.getFormulaicDensity()
13.6
>>> beowulf.printFormulas()
feorh ealgian;
feorh ealgian
feorh ealgian,

beorhtode bencsweg;
beorsele benc

monig oft
Monig oft

eft cuman.
eft cuman,

ecean dryhtne,
ecean dryhtne,
ecean dryhtne;

wigena strengel,
wigena strengest,

...

A batch-analysis example:

import os

from PoeticAnalysisNew import Poem

# Collect the paths of all files found under the subdirectories of 'texts'.
fileNames = []
os.chdir('texts')
directories = os.listdir()
for directory in directories:
    for root, dirs, files in os.walk(directory):
        for name in files:
            fileNames.append(os.path.join(root, name))

# Write the formulaic density of every .txt file to report.txt, one value per line.
with open('../report.txt', 'w', encoding='utf-8') as out:
    for item in fileNames:
        if item.endswith('.txt'):
            with open(item, 'r', encoding='utf-8') as inp:
                poem = Poem(inp.read())
            out.write(str(poem.getFormulaicDensity()) + '\n')

The algorithm is lousy and slow. All attempts at improving it only made matters worse, because the new versions relied less heavily on C-based library routines. Eventually, I rewrote everything in Java. The new version is more than 4 times faster; I will create a repo for it presently.
