This repository has been archived by the owner on Dec 11, 2020. It is now read-only.

Add an advanced text generator based on markov chains. #254

Merged
merged 8 commits into from
Mar 3, 2014

TimWolla
Contributor

The advanced text generator's text() method is compatible with
Lorem's and can be used as a drop-in replacement.

see #109

Examples (the number is the maximum number of characters; the line breaks are only here for readability and are not generated):

  10: Poor.

  20: The Duchess tucked her.

  50: Mary Ann, what are you getting on?" said the Hatter, and he.

  70: This time there could be no mistake about it—it was neither more nor l
      ess than a.

  90: She drew herself up and said very gravely, "I think I must be off, and s
      he dropped it hastily, just in time to.

 120: Very soon the Rabbit came up to her great delight, it fitted! Alice open
      ed the door and tried to make out which were the cook took the least not
      ice.

 150: I should like to be talking in its sleep, "that 'I breathe when I breath
      e!'" "It is the same height as herself. She stretched herself up on tipt
      oe and peeped over the list. Imagine.

 200: She stretched herself up on tiptoe and peeped over the list. Imagine her
       surprise when he finds out who I was when I get it home?" when it saw h
      er. "Cheshire-Puss," began Alice, rather timidly, "would you please tell
       me your history, you.

 500: However, when they liked and left off quarreling with the others. IX—W
      HO STOLE THE TARTS? The King and Queen of Hearts, she made some tarts, A
      ll on a summer day; The Knave of Hearts, carrying the King's crown on a 
      bough of a book," thought Alice, "without pictures or conversations in i
      t, "and what is the same thing," said the Cat, and vanished. Alice had f
      ound her head pressing against the ceiling, and had no reason to be seen
      —everything seemed to quiver all over with William the Conqueror." So 
      she was now the right paw 'round, "lives a March Hare. Alice gave a wear
      y sigh. "I think you ought to.

namespace Faker\Provider;

class Text extends \Faker\Provider\Base
Owner

Since this provider contains English text, it should probably be under the en_US locale. So I think you should commit one class with no locale and a basic text (see for instance the base Person generator), and another one, more complete with the en_US locale.

The non-locale Text provider can even be abstract, to force a proper text.

Contributor Author

Okay, I'll change it.

Owner

After thinking about it, putting lorem ipsum text in this class would cause confusion with the Lorem text class. That's why you should leave $baseText empty and make the class abstract, so that it is only instantiated through locale-specific subclasses.
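
A minimal sketch of the layout suggested here, assuming PHP 7 or later. The class names `SketchTextProvider` and `SketchEnglishText` and the `sourceText()` helper are hypothetical and are not taken from the pull request; only `$baseText` comes from the discussion.

```php
<?php
// Hypothetical names throughout; this is not the code from the pull request.

// Locale-independent base: abstract, with no text of its own.
abstract class SketchTextProvider
{
    /** @var string Source text; left empty so that each locale must supply its own. */
    protected static $baseText = '';

    /** Returns the locale's source text and fails loudly if a subclass did not set one. */
    public function sourceText(): string
    {
        if (static::$baseText === '') {
            throw new \LogicException(static::class . ' must define a non-empty $baseText');
        }

        return static::$baseText;
    }
}

// Locale-specific provider: the only thing it has to contribute is the text.
final class SketchEnglishText extends SketchTextProvider
{
    protected static $baseText = 'Alice was beginning to get very tired of sitting by her sister on the bank.';
}

$provider = new SketchEnglishText();
echo $provider->sourceText(), PHP_EOL;
```

Because the base class is abstract and its `$baseText` is empty, it cannot be used by accident; a locale that forgets to provide a text gets an exception instead of silently empty output.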

@fzaninotto
Owner

I absolutely love that contribution. I think it adds great value to Faker. Thanks a lot!

@Anahkiasen

I've been wanting something like that for a while now so I really appreciate this too.

@TimWolla
Contributor Author

@fzaninotto The updates are now added to this pull request.

* @example 'Lorem ipsum dolor sit amet'
* @param integer $maxNbChars Maximum number of characters the text should contain
* @param integer $indexSize Determines how many words / chars are considered for the generation of the next token (higher number = more coherent, lower number = more random)
* @param string $indexUnit Determines whether 'words' or 'chars' represent the basis of the generator.
Owner

To simplify, I suggest removing the ability to generate character-based Markov text. It only produces legible text with an index size of 4+ characters, and by that point it is almost equivalent to a word-based Markov chain.
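
For readers unfamiliar with the technique, here is a minimal sketch of a word-based Markov chain, assuming PHP 7 or later. The function names `sketchBuildTable` and `sketchGenerate` and the sample sentence are hypothetical; the parameter names `$indexSize` and `$maxNbChars` follow the docblock above. It is not the implementation from this pull request, and it simply stops at the character limit rather than at a sentence boundary.

```php
<?php
// A sketch only: function names and the sample text are hypothetical.

/**
 * Builds a lookup table: each key is $indexSize consecutive words,
 * each value lists the words that followed that key in the source.
 */
function sketchBuildTable(string $text, int $indexSize): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $table = [];
    for ($i = 0, $last = count($words) - $indexSize; $i < $last; $i++) {
        $key = implode(' ', array_slice($words, $i, $indexSize));
        $table[$key][] = $words[$i + $indexSize];
    }

    return $table;
}

/** Walks the table from a random key and stops before $maxNbChars would be exceeded. */
function sketchGenerate(array $table, int $maxNbChars): string
{
    if ($table === []) {
        return '';
    }

    // Sliding window of the most recent $indexSize words; it seeds the walk.
    $window = explode(' ', (string) array_rand($table));
    $result = [];
    $length = 0;

    while (true) {
        $key = implode(' ', $window);
        if (!isset($table[$key])) {
            break; // dead end: this key only occurred at the very end of the source
        }
        $followers = $table[$key];
        $next = $followers[array_rand($followers)];

        // One extra character for the separating space, except before the first word.
        $added = strlen($next) + ($result === [] ? 0 : 1);
        if ($length + $added > $maxNbChars) {
            break;
        }

        $result[] = $next;
        $length += $added;
        array_shift($window);
        $window[] = $next;
    }

    return implode(' ', $result);
}

$table = sketchBuildTable('the cat sat on the mat and the cat ran', 2);
echo sketchGenerate($table, 40), PHP_EOL;
```

A larger `$indexSize` makes each key more specific, so fewer followers are possible and the output stays closer to the source text; a smaller one allows more jumps and reads as more random.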

@TimWolla
Contributor Author

@fzaninotto Everything should be fine now.

throw new \InvalidArgumentException('indexSize must be at most 10');
}

if (!isset(static::$tables[$indexSize])) {
Owner

I think I see a potential bug when switching locales.

$faker = Faker\Factory::create('fr_FR');
$faker->realText(100); // generates static $tables cache for French locale
$faker = Faker\Factory::create('en_US');
$faker->realText(100); // uses static $tables cache for French locale, generates French text

That probably means that $tables should not be static, and therefore realText shouldn't be static either.
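
The effect can be reproduced outside Faker. The sketch below assumes PHP 7 or later; the classes `SketchCachedText`, `SketchFrench`, `SketchEnglish` and `SketchInstanceText` and the `firstWord()` method are hypothetical stand-ins that only mimic the caching pattern, not the provider's real code.

```php
<?php
// Hypothetical classes that mimic the pattern under review; not Faker's code.

abstract class SketchCachedText
{
    protected static $baseText = '';
    protected static $tables = []; // declared once, so every subclass shares this storage

    public static function firstWord(): string
    {
        // Whichever locale runs first fills the cache for all of them.
        if (!isset(static::$tables['words'])) {
            static::$tables['words'] = explode(' ', static::$baseText);
        }

        return static::$tables['words'][0];
    }
}

class SketchFrench extends SketchCachedText
{
    protected static $baseText = 'Bonjour tout le monde';
}

class SketchEnglish extends SketchCachedText
{
    protected static $baseText = 'Hello to everyone';
}

echo SketchFrench::firstWord(), PHP_EOL;  // prints "Bonjour"
echo SketchEnglish::firstWord(), PHP_EOL; // also prints "Bonjour": stale shared cache

// Keeping the cache on the instance ties it to one provider object.
class SketchInstanceText
{
    private $baseText;
    private $tables = [];

    public function __construct(string $baseText)
    {
        $this->baseText = $baseText;
    }

    public function firstWord(): string
    {
        if (!isset($this->tables['words'])) {
            $this->tables['words'] = explode(' ', $this->baseText);
        }

        return $this->tables['words'][0];
    }
}

echo (new SketchInstanceText('Hello to everyone'))->firstWord(), PHP_EOL; // prints "Hello"
```

A static property that a subclass does not redeclare refers to the parent's storage, which is why the second call returns the French word. An instance property avoids the sharing at the cost of rebuilding the table once per provider object.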

Owner

Also, $tables is a pretty generic name. I suggest $consecutiveWords.

@TimWolla
Contributor Author

TimWolla commented Mar 3, 2014

@fzaninotto Done.

@fzaninotto
Owner

Awesome. I'll merge it right away. Thanks a lot for your patch!

@fzaninotto
Owner

Oops, sorry, can't merge: tests fail.

@TimWolla
Contributor Author

TimWolla commented Mar 3, 2014

oops, sorry can't merge: tests fail.

Funny, they passed on my development box, although they shouldn't have. Anyway, a fix has just been pushed.

Even funnier: HipHop worked.

fzaninotto added a commit that referenced this pull request Mar 3, 2014
Add an advanced text generator based on markov chains.
fzaninotto merged commit 680b36d into fzaninotto:master Mar 3, 2014
TimWolla deleted the advancedTextProvider branch March 3, 2014 16:00
@staabm

staabm commented Jun 5, 2014

❤️

@tablatronix

Wonderful, thanks!
