Skip to content

macleginn/poetic-formula-extractor-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

poetic-formula-extractor-python

UPD: A new and much better version of this program is available nearby.

A script for extracting formulas from poetic texts.

The software backbone of this paper: https://www.academia.edu/6304149/_ (in Russian). Computes formulaic density of a poetic text, optionally prints the formulas from it to stdout or to a file or prints the original text with formulas bracketed to an html-file.

The scripts takes as its input a text of a poem, an alphabet, and a stop list (the latter two are iterables of any type and default to built-in Russian ones if not provided). UsefulData.py contains alphabets and stop lists for Russian, Homeric Greek, and Anglo-Saxon.

A console-usage example:

>>> import os
>>> os.chdir([The name of the working directory here.])
>>> import PoeticAnalysisNew as pan
>>> with open('Bylina.txt', 'r', encoding='utf-8') as inp:
        bylina = pan.Poem(inp.read())
>>> bylina.getFormulaicDensity()
31.5
>>> bylina.highlightFormulas('Bylina') # See the results in Bylina.html.
>>> from UsefulData import angloSaxonAlphabet
>>> from UsefulData import angloSaxonStopList
>>> with open('Beowulf.txt', 'r', encoding='utf-8') as inp:
        beowulf = pan.Poem(inp.read(), angloSaxonAlphabet, angloSaxonStopList)
>>> beowulf.getFormulaicDensity()
13.6
>>> beowulf.printFormulas()
feorh ealgian;
feorh ealgian
feorh ealgian,

beorhtode bencsweg;
beorsele benc

monig oft
Monig oft

eft cuman.
eft cuman,

ecean dryhtne,
ecean dryhtne,
ecean dryhtne;

wigena strengel,
wigena strengest,

...

A batch-analysis example:

from PoeticAnalysisNew import *
fileNames = []
os.chdir('texts')
directories = os.listdir()
for directory in directories:
    for root, dirs, files in os.walk(directory):
        for name in files:
            fileNames.append(os.path.join(root, name))
with open('../report.txt', 'w', encoding='utf-8') as out:
    for item in fileNames:
        if item.endswith('txt'):
            with open(item, 'r', encoding='utf-8') as inp:
                poem = Poem(inp.read())
            out.write(str(poem.getFormulaicDensity()) + '\n')

The algorithm is lousy and slow. All attempts at improving it only made the matters worse because new versions relied less heavily on C-based library routines. Eventually, I rewrote everything in Java. The new version is more than 4 times faster; I will create a repo for it presently.

About

A script for extracting formulas from poetic texts.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages