-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Add an advanced text generator based on markov chains. #254
Conversation
The advanced text generator's text()-generator is compatible to Lorem's one and can be used as a drop in replacement.
|
||
namespace Faker\Provider; | ||
|
||
class Text extends \Faker\Provider\Base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this provider contains English text, it should probably be under the en_US
locale. So I think you should commit one class with no locale and a basic text (see for instance the base Person generator), and another one, more complete with the en_US
locale.
The non-locale Text provider can even be abstract, to force a proper text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about it, putting lorem ipsum text in this class brings confusion with the text class. That's why you should leave the $baseText
empty and make the class abstract, to be instanciated in locales.
I absolutely love that contribution. I think it adds great value to Faker. Thanks a lot! |
I've been wanting something like that for a while now so I really appreciate this too. |
@fzaninotto The updates are now added to this pull request. |
* @example 'Lorem ipsum dolor sit amet' | ||
* @param integer $maxNbChars Maximum number of characters the text should contain | ||
* @param integer $indexSize Determines how many words / chars are considered for the generation of the next token (higher number = correcter, lower number = more random) | ||
* @param string $indexUnit Determines whether 'words' or 'chars' represent the basis of the generator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify, I suggest to remove the ability to generate character-based markov text. It only produces legible text with 4+ characters, and by that time it's almost equivalent to a word-based markov chain.
@fzaninotto Everything should be fine now. |
throw new \InvalidArgumentException('indexSize must be at most 10'); | ||
} | ||
|
||
if (!isset(static::$tables[$indexSize])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I see a potential bug when switching locales.
$faker = Faker\Factory::create('fr_FR');
$faker->realText(100); // generates static $table cache for French locale
$faker = Faker\Factory::create('en_EN');
$faker->realText(100); // uses static $table cache for French locale, generates French text
That probably means that $tables
should not be static, and therefore realText
shouldn't be static either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, $tables
is a pretty generic name. I suggest $consecutiveWords
.
@fzaninotto Done. |
Awesome. I'll merge it right away. Thanks a lot for your patch! |
oops, sorry can't merge: tests fail. |
Funny, they did work on my development box though they shouldn't. Anyway: Fix gone out just now. Even more funny: Hiphop worked. |
Add an advanced text generator based on markov chains.
❤️ |
wonderful, Thanks! |
The advanced text generator's text()-generator is compatible to
Lorem's one and can be used as a drop in replacement.
see #109
Examples (The number is the maximum amount of characters, the line breaks are here for better formatting and not generated):