source: stylistic-profile.json
source: dictionary-etymology.json
This flow generates the etymology dataset from the dictionary etymology. Once the dataset is generated (under the namespace-name `dictionary.etymology.wiktionary.deep-partial`), it can be used across the project. At the moment it is used in the corpora flow `stylistic-profile`.
Execute the flow (`dictionary-etymology`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/dictionary-etymology.json
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/dictionary-etymology.json
Each corpora-flow has a dedicated section for each plain-text corpus it processes. For each corpus it invokes the necessary tasks from the corpus-flow and tweaks any relevant parameters, such as how a particular document should be parsed and split into words and sentences.
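As a rough illustration of per-corpus parameter tweaking, the sketch below splits a text into sentences and words with settings that can be overridden per corpus. The `ParseParams` class, its fields, the `OVERRIDES` table, and the regular expressions are all hypothetical; they are not Bukvik's actual configuration keys or parsers.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseParams:
    """Hypothetical per-corpus parsing parameters (not Bukvik's real schema)."""
    sentence_end_chars: str = ".!?"
    # Runs of letters (any script), optionally joined by apostrophes or hyphens.
    word_pattern: str = r"[^\W\d_]+(?:['-][^\W\d_]+)*"


# Hypothetical overrides keyed by corpus identifier; unlisted corpora use defaults.
OVERRIDES = {
    "lolita-ru": ParseParams(sentence_end_chars=".!?…"),
}


def parse(corpus_id: str, text: str) -> list[list[str]]:
    """Split text into sentences, then split each sentence into words."""
    params = OVERRIDES.get(corpus_id, ParseParams())
    # Break on whitespace that directly follows a sentence-ending character.
    boundary = rf"(?<=[{re.escape(params.sentence_end_chars)}])\s+"
    sentences = [s for s in re.split(boundary, text.strip()) if s]
    return [re.findall(params.word_pattern, s) for s in sentences]


if __name__ == "__main__":
    print(parse("lolita-en", "It was a well-known fact. Was it?"))
    print(parse("lolita-ru", "Да… Нет."))
```

Keeping the overrides in one table means a new corpus only needs an entry when its defaults are wrong.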
Nabokov novels, English: corpus-en, sinister-en, lolita-en, pnin-en, harlequins-en, invite-en, gift-en, defense-en, knave-en, speak-en
Nabokov novels, Russian: corpus-ru, mary-ru, knave-ru, defense-ru, invite-ru, gift-ru, speak-ru, lolita-ru
Novels written in English:
- (1941) The Real Life of Sebastian Knight
- (1947) Bend Sinister
- (1955) Lolita, self-translated into Russian (1965)
- (1957) Pnin
- (1962) Pale Fire
- (1969) Ada or Ardor: A Family Chronicle
- (1972) Transparent Things
- (1974) Look at the Harlequins!
- Plus Speak, Memory (1951/1967)
Novels written in Russian:
- (1926) Mashen'ka (Машенька); English translation: Mary (1970)
- (1928) Korol' Dama Valet (Король, дама, валет); English translation: King, Queen, Knave (1968)
- (1930) Zashchita Luzhina (Защита Лужина); English translation: The Luzhin Defense or The Defense (1964) (also adapted to film, The Luzhin Defence, in 2000)
- (1930) Sogliadatai (Соглядатай (The Voyeur)), novella; first publication as a book 1938; English translation: The Eye (1965)
- (1932) Podvig (Подвиг (Deed)); English translation: Glory (1971)
- (1933) Kamera Obskura (Камера Обскура); English translations: Camera Obscura (1936), Laughter in the Dark (1938)
- (1934) Otchayanie (Отчаяние); English translation: Despair (1937, 1965)
- (1936) Priglasheniye na kazn' (Приглашение на казнь (Invitation to an execution)); English translation: Invitation to a Beheading (1959)
- (1938) Dar (Дар); English translation: The Gift (1963)
- Plus Lolita and Drugie berega.
- Brown Fiction
- Russian National Corpus (fiction)
POS: both unigrams and bigrams (combinations of two tags). We can mention that trigrams are supported, but it is not necessary to use them.
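For orientation, here is a minimal sketch of how a relative distribution of POS unigrams, bigrams, or trigrams can be computed from already-tagged sentences. The function name and the toy tag sequences are illustrative assumptions, not Bukvik's implementation or real corpus output; n-grams are counted within a sentence and never across a sentence boundary.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def pos_ngram_distribution(
    sentences: Sequence[Sequence[str]], n: int
) -> Dict[Tuple[str, ...], float]:
    """Return the relative frequency of each POS n-gram.

    `sentences` holds one sequence of POS tags per sentence.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    counts: Counter = Counter()
    for tags in sentences:
        # Slide a window of length n over the tags of one sentence.
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


if __name__ == "__main__":
    # Toy input: two "sentences" given as tag sequences (illustrative only).
    tagged = [["DT", "NN", "VBD"], ["NN", "NN", "NN"]]
    for n in (1, 2, 3):
        print(n, pos_ngram_distribution(tagged, n))
```

With this shape, the "NN NN NN" pattern is simply the trigram key `("NN", "NN", "NN")`.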
- As a special bonus, it would be great to run etymology in Russian too. Is that possible?
- 15 min intro in methods in stylometry, existing tools, their pros and cons
- 15 min intro into Bukvik and existing results
- 15 min – Accessing Bukvik on the participants’ computers
- Make experiments. We will pre-run each task beforehand to make sure everything is smooth.
- Simulate research process through experiments. Give quote from Grayson/Nabokov on differences for researchers. How do we test that?
- Aha, more nouns, interesting.
- Find examples of sentences with lots of nouns.
- Nope, but look at the translation effect, make a note.
- So, original texts in English have more nouns but not longer sentences.
- See NN NN NN. (not done for dissertation)
- Demonstrate POS combinations. (not done for dissertation)
- Ask if we should check other combinations, do a couple they suggest.
- Run, yep.
- Why richer? More foreign words, for sure. Knowing Nabokov, seems like we’re on the right path. Here, cite Chepiga, give an example from Ada.
- Show different languages.
- Create a diagram with average distribution by origin. (not done for dissertation)
- If possible, double the experiment for Russian. (not done for dissertation)
- Sum up (give reference for forthcoming publication).
- Is etymology then the reason for richer vocab? For more nouns? Well, it can contribute to varied.
- Nabokov has more nouns and richer vocab in his L2, and one factor that may help account for it is his preference for a distribution of words with particular kinds of origins, different from the normal distribution. This is one feature of deviation from the norm within standard language = style.
- Stylistic profile.
- Get samples of text, read in the light of what we learned – paying attention to nouns. Find a good passage for that.
- What could we look at next? Brainstorm. Potential projects. Potential development.
- Intro of other capacities of Bukvik, existing and in progress. Society of Words, semantic…
- Brainstorm on the future of such tools, and where Bukvik can/will develop (modular etc).
- Collaborations?
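The comparison of word origins mentioned in the plan above (an author's distribution by origin set against a reference corpus) can be sketched roughly as follows. The function names and the tiny origin lookup are assumptions made for illustration; the real flow uses the Wiktionary-based etymology dataset, and the word lists here are toy input, not results.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def origin_shares(words: Iterable[str], origin_of: Mapping[str, str]) -> Dict[str, float]:
    """Relative share of each word origin; unlisted words count as 'unknown'."""
    counts = Counter(origin_of.get(w.lower(), "unknown") for w in words)
    total = sum(counts.values())
    return {origin: c / total for origin, c in counts.items()} if total else {}


def share_differences(author: Mapping[str, float], reference: Mapping[str, float]) -> Dict[str, float]:
    """Author share minus reference share for every origin seen in either."""
    origins = sorted(set(author) | set(reference))
    return {o: author.get(o, 0.0) - reference.get(o, 0.0) for o in origins}


if __name__ == "__main__":
    # Toy lookup for illustration only (not the project's etymology dataset).
    origin_of = {"house": "Germanic", "water": "Germanic",
                 "table": "Latin", "samovar": "Slavic"}
    author = origin_shares(["house", "table", "samovar", "table"], origin_of)
    reference = origin_shares(["house", "water", "table", "house"], origin_of)
    print(share_differences(author, reference))
```

A positive difference means the origin is over-represented in the author's text relative to the reference corpus.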
IMPORTANT ABOUT NAMING: corpora flows are renamed to avoid collisions of namespace and NamespaceName, for example: namespace: bukvik-workshop.data.lolita-ru and NsN: bukvik-workshop.data.lolita-ru:pos
(NOTE: BukvikDatasets: setDataset > BukvikNamespaceContainer: setEntity > _getContainer >
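To make the naming note above concrete, the sketch below splits an NsN string into its namespace and name parts. It assumes, based only on the example given, that a NamespaceName has the form `<namespace>:<name>`; the `parse_nsn` helper and the `NamespaceName` tuple are hypothetical and not part of Bukvik's API.

```python
from typing import NamedTuple, Optional


class NamespaceName(NamedTuple):
    """Hypothetical split of an NsN string into its two parts."""
    namespace: str
    name: Optional[str]


def parse_nsn(nsn: str) -> NamespaceName:
    """Split at the first colon; a string without a colon is a bare namespace."""
    namespace, sep, name = nsn.partition(":")
    return NamespaceName(namespace, name if sep else None)


if __name__ == "__main__":
    bare = parse_nsn("bukvik-workshop.data.lolita-ru")
    full = parse_nsn("bukvik-workshop.data.lolita-ru:pos")
    # Both refer to the same namespace; only the second carries a name.
    print(bare, full, bare.namespace == full.namespace)
```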
cdbd
cd ../..
mkdir datasets
cd datasets
git clone https://github.com/Cha-OS/bukvik-workshop-corpora
cdbp
Execute a corpora-flow:
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/nabokov-in-english.json
Execute a particular task (one text) of the corpora-flow:
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/nabokov-in-english.json -cmd execTask -t "<NAMESPACE_TASK>.defense-en"
(mprinc)
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/nabokov-in-english.json
Execute a task in corpora-flow:
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/nabokov-in-english.json -cmd execTask -t "<NAMESPACE_TASK>.corpus-en"
(mprinc)
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/nabokov-in-english.json -cmd execTask -t "<NAMESPACE_TASK>.corpus-en"
(mprinc)
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/nabokov-in-russian.json -cmd execTask -t "<NAMESPACE_TASK>.execute.speak-ru"
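Since the commands above differ only in the environment file, the flow file, and the optional task, they can be assembled programmatically. This is a sketch: `build_command` is a hypothetical helper, it uses only the flags that appear in the commands above (`-env`, `-exp`, `-cmd execTask`, `-t`), and `<NAMESPACE_TASK>` is left as the same unresolved placeholder.

```python
import shlex
from typing import List, Optional

PROJECT = "../experiments/projects/bukvik-workshop-project"


def build_command(env_path: str, flow_path: str, task: Optional[str] = None) -> List[str]:
    """Build the argument list for one RunBukvik.py invocation."""
    args = ["python", "RunBukvik.py", "-env", env_path, "-exp", flow_path]
    if task is not None:
        # Run a single task instead of the whole flow.
        args += ["-cmd", "execTask", "-t", task]
    return args


if __name__ == "__main__":
    env = f"{PROJECT}/environments/bukvik-workshop.env.server.json"
    flow = f"{PROJECT}/flows/nabokov-in-english.json"
    # A few of the English text identifiers listed earlier in this document.
    for text_id in ["sinister-en", "lolita-en", "pnin-en", "defense-en"]:
        cmd = build_command(env, flow, f"<NAMESPACE_TASK>.{text_id}")
        print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) to execute it
```

Building an argument list rather than a single string avoids shell-quoting problems with the angle brackets in the task name.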
Execute the joint-flow:
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile-joined.json
(mprinc)
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile-joined.json
(mprinc)
```sh
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile-joined.json -cmd execTask -t "<NAMESPACE_TASK>.execute.english-stylistic-profile-pos-joining"
```
Execute whole flow:
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json
Execute particular task (`<NAMESPACE_TASK>.import.importing-the-corpus`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.import.importing-the-corpus"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.import.importing-the-corpus"
Execute particular task (`<NAMESPACE_TASK>.parsers.words`):
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.parsers.words"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.parsers.words"
Execute particular task (`<NAMESPACE_TASK>.pos.parsing-pos-external`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.parsing-pos-external"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.parsing-pos-external"
Execute particular task (`<NAMESPACE_TASK>.pos.remapping-pos-tags`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.remapping-pos-tags"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.remapping-pos-tags"
Execute particular task (`<NAMESPACE_TASK>.corpora.brown.words.distribution.generating-brown-list-of-words-distribution`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.corpora.brown.words.distribution.generating-brown-list-of-words-distribution"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.corpora.brown.words.distribution.generating-brown-list-of-words-distribution"
Execute particular task (`<NAMESPACE_TASK>.stats.calculating-simple-stats`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.stats.calculating-simple-stats"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.stats.calculating-simple-stats"
Execute particular task (`<NAMESPACE_TASK>.stats.calculating-etymology`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.stats.calculating-etymology"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.stats.calculating-etymology"
Execute particular task (`<NAMESPACE_TASK>.distribution`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.distribution"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.distribution"
Execute particular task (`<NAMESPACE_TASK>.distribution-out`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.distribution-out"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.distribution-out"
Execute particular task (`<NAMESPACE_TASK>.pos.distribution`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.distribution"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.distribution"
Execute particular task (`<NAMESPACE_TASK>.pos.distribution-out`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.distribution-out"
(mprinc):
python RunBukvik.py -env ../../../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.mprinc.json -exp ../../../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.distribution-out"
Execute particular task (`<NAMESPACE_TASK>.pos.searching-pos`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.searching-pos"
Execute particular task (`<NAMESPACE_TASK>.pos.exporting-pos-patterns`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.exporting-pos-patterns"
Execute particular task (`<NAMESPACE_TASK>.pos.exporting-pos-document`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.pos.exporting-pos-document"
Execute particular task (`<NAMESPACE_TASK>.dictionary.ner.characters.import.importing-ner-dictionary-file`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.dictionary.ner.characters.import.importing-ner-dictionary-file"
Execute particular task (`<NAMESPACE_TASK>.dictionary.ner.characters.recognize.recognizing-ner`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.dictionary.ner.characters.recognize.recognizing-ner"
Execute particular task (`<NAMESPACE_TASK>.distribution-out`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.distribution-out"
Execute particular task (`<NAMESPACE_TASK>.words-society.wordssociety`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.words-society.wordssociety"
Execute particular task (`<NAMESPACE_TASK>.words-society.wordssociety-out-graph`):
(server)
python RunBukvik.py -env ../experiments/projects/bukvik-workshop-project/environments/bukvik-workshop.env.server.json -exp ../experiments/projects/bukvik-workshop-project/flows/stylistic-profile.json -cmd execTask -t "<NAMESPACE_TASK>.words-society.wordssociety-out-graph"