Template too slow after 100 records #513
Hi, @exproof. Could you please give me the code you benchmark with?

If you need the full code and template.docx, I can send them via e-mail.

```php
$Document->cloneRow('A01Sira', mysql_num_rows($Sorgu));
}
```
I see that
Approx. 10 minutes ;) This query returns 175 records. What is your suggestion? Must I use every line, not only the first line?
Wow. That's terrible. Please give me your template file. Share it with any file-sharing service like Dropbox, Google Drive, etc. and post the link here.
Code, Template and Result files
I am a newbie on GitHub. I replied to your last post with the files attached via Gmail. I hope you received the attachments. Thanks for your support.
Nope. No luck. Share it using your Google Drive.
3 files are in Drive: Code, Template and Query Result.
OK, I got them. I will let you know the results of the investigation. Not sure it will happen today, because I'm falling asleep. :-) I would say it will happen this week.
;) Good sleep. Relax, we have enough time. Thanks again for your support.
I faced the same problem before. My solution is to use the PHP function strpos() instead of preg_replace() in setValueForPart() in Template.php (note that this file is deprecated as of 0.12.0, but you can find almost the same content in TemplateProcessor.php). I guess the reason why PHPWord does not use strpos() is that, for example, "${tag}" is not stored as "${tag}" in the OOXML, but as something like `<w:r><w:t>${tag</w:t></w:r><w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/><w:r><w:t>}</w:t></w:r>`. I don't understand why it has to be like this; I know nothing about OOXML. My solution is to "clean" the template into a new, cleaned version, turning fragments like the one above back into plain "${tag}". Then we are free to use strpos() to set values, I guess. My code is here: https://github.com/Yongyiw/PHPWord/blob/master/src/PhpWord/Template.php
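The cleaning step described above can be sketched in a few lines. This is a hypothetical, minimal illustration (the function name `cleanPlaceholders` is invented, not code from the linked repository):

```php
<?php
// Word often splits a placeholder like ${tag} across several runs,
// e.g. <w:r><w:t>${tag</w:t></w:r>...<w:r><w:t>}</w:t></w:r>.
// Stripping the XML tags inside each matched ${...} span restores the
// plain ${tag}, after which cheap strpos()/str_replace() calls work.
function cleanPlaceholders(string $xml): string
{
    return preg_replace_callback(
        '~\$\{[^{}]*\}~',                 // a ${...} span, XML noise included
        function (array $match) {
            return strip_tags($match[0]); // drop every tag inside the span
        },
        $xml
    );
}

$xml = '<w:r><w:t>${tag</w:t></w:r>'
     . '<w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>'
     . '<w:r><w:t>}</w:t></w:r>';
echo cleanPlaceholders($xml); // → <w:r><w:t>${tag}</w:t></w:r>
```

Note that this only touches the matched spans, so the rest of the document XML is left intact.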
Hi @exproof,

```php
<?php
require_once __DIR__ . '/src/PhpWord/Autoloader.php';
\PhpOffice\PhpWord\Autoloader::register();

$phpWord = new \PhpOffice\PhpWord\PhpWord();
$document = $phpWord->loadTemplate('template513.docx');

$startTime = \microtime(true);
$document->cloneRow('A01Sira', 175);
$endTime = \round(\microtime(true) - $startTime, 2);

$formatter = new NumberFormatter(null, NumberFormatter::DECIMAL, null);
$formatter->setAttribute(NumberFormatter::FRACTION_DIGITS, 2);
echo $endTime;

$document->saveAs('result513.docx');
```

It executes in 30 ms on my old Windows-based laptop. Output is available here. I will take a risk and assume that the problem is in
Hi,

```php
$document->cloneRow('A01Sira', 175); // your code
$Document->setValue('A01Uretici#100', 'Test100');
```

You will see what my problem with the long processing time is. I added just 4 variables to change. Consider that I would need to change approx. ~1000 variables in a template ;)
I got it. There are two issues here.
So, here is what we are going to do as a workaround.
This should help.
Done. ;) You are great. I would like to buy you a coffee on my next trip to Russia.
No problem. You are always welcome. 😃
I was facing the same issue. It would take over an hour to generate a table with 132 lines. I lost a couple of hours on this, and ended up replacing the first seven lines of setValueForPart() with:

```php
protected function setValueForPart($documentPartXML, $search, $replace, $limit)
{
    $callback = function ($match) {
        return strip_tags($match[0]);
    };

    $documentPartXML = preg_replace_callback('~\$\{([<>][^\}]+)\}~', $callback, $documentPartXML);
    // ...
```

It's much, much faster.
Did you try the version from
I was using that version. I changed the loop to:

```php
foreach ($matches[0] as $value) {
    $valueCleaned = strip_tags($value);
    $fixedDocumentPart = str_replace($value, $valueCleaned, $fixedDocumentPart);
}
```

... and the speed improved from 6.1988 seconds to 1.7881 (with 132 lines).
Done. Could you please retest?
I can confirm that nicoSWD's fix for setValueForPart is much faster.
Did you try the version from
We're experiencing an issue with high volumes of data when cloning tables. Our data is set up as arrays, and we're foreach-ing through the data to set the values. Using the development branch is much better than master, but still too slow. With 106 records it takes about 15 seconds, but with 1800 records it times out even after increasing set_time_limit to 900 seconds. If I alter the code in TemplateProcessor.php to the following, performance is improved, but I'm thinking there must be a reason why you're using preg_replace over str_replace.

```php
protected function setValueForPart($documentPartXML, $search, $replace, $limit)
{
    /*if (substr($search, 0, 2) !== '${' && substr($search, -1) !== '}') {
        $search = '${' . $search . '}';
    }*/
    /*if (!String::isUTF8($replace)) {
        echo "here<br/><br/>";
        $replace = utf8_encode($replace);
    }*/
    $regExpDelim = '/';
    $escapedSearch = preg_quote($search, $regExpDelim);

    return str_replace($search, $replace, $documentPartXML, $limit);
    // The original preg_replace call below is now unreachable:
    return preg_replace("{$regExpDelim}{$escapedSearch}{$regExpDelim}u", $replace, $documentPartXML, $limit);
}
```

An additional thought: is it possible to pass the $search and $replace values in as arrays rather than looping through them? Then I would hit setValue once rather than hundreds of times.
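On the batching idea floated above: PHP's str_replace already accepts parallel arrays, so replacing all plain (non-cloned) macros in one pass could look like the following sketch (the variable names are illustrative, not PHPWord API):

```php
<?php
// str_replace with parallel arrays replaces every $search[i] with
// $replace[i] in a single call, instead of one setValue() per macro.
$search  = ['${tagName}', '${tagDescription}'];
$replace = ['Tag Name', 'Some description'];

$xml = '<w:t>${tagName}</w:t><w:t>${tagDescription}</w:t>';
echo str_replace($search, $replace, $xml);
// → <w:t>Tag Name</w:t><w:t>Some description</w:t>
```

One caveat of array-based str_replace: replacements are applied sequentially, so a replacement value that happens to contain a later search string will itself be replaced.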
Can you share your data sample and a piece of your code?
I can't directly share the code due to company policies, but I've mocked up a rough example below. First, I'm creating an array of tags, with the key being the searchable tag and the value being the replacement value. This data contains standard tags; then it contains a line called "repeatLine", which is an array containing the tag to be cloned, the count for the number of clones, and the tags for each clone. To complicate matters further, each cloned row can have a sub-row, which is also an array structured the same way, allowing me to clone a sub-row of data. So realistically I'm getting the template processor to process more than 1806 records: there are the standard tags (probably about 50-75 definable tags), then two row clones which each contain roughly 10-30 tags to be cloned, with the latter having a sub-row clone which also contains 20 tags plus a photo tag holding the path to an image to be included in the template.

```php
$tags = [
    '${tagName}' => 'Tag Name',
    '${tagDescription}' => 'Some description',
    'repeatLine' => [
        [
            'cloneRow' => '${rowLine}',
            'rowCount' => 1806,
            'tags' => [
                '${tag#1}' => 'Tag 1',
                '${subRepeat#1}' => [
                    'cloneRow' => '${subRepeatLine#1}',
                    'rowCount' => 10,
                    'tags' => [
                        '${subRepatTag#1#1}' => 'Sub Table 1'
                    ]
                ],
            ]
        ]
    ]
];
```

The tags eventually get passed to a method which loops through the data and sets it appropriately. (Please note I had to fork the repo to get photos into the template, as there wasn't a way to do this with the template processor as far as I know.)

```php
$templateProcessor = new TemplateProcessor($modifiedFilename);
foreach ($data as $tag => $replacement) {
    /** repeatLine indicates that we want to clone a row of data */
    if ($tag === 'repeatLine') {
        if (is_array($replacement)) {
            foreach ($replacement as $rowData) {
                /** Check that the repeatLine data is set up as expected */
                if (isset($rowData['cloneRow']) && isset($rowData['rowCount']) &&
                    isset($rowData['tags'])) {
                    $templateProcessor->cloneRow($rowData['cloneRow'], $rowData['rowCount']);
                    /** Loop through the row tags */
                    foreach ($rowData['tags'] as $rowTag => $rowReplacement) {
                        /** If the tag mentions "photo", render it as a photo in the docx */
                        if (strpos(strtolower($rowTag), 'photo')) {
                            $exploded = array_reverse(explode('/', $rowReplacement));
                            $imgName = $exploded[0];
                            $tempImgFile = false;
                            /** If the image is not a JPEG, recreate it as a JPEG
                             *  to be put into the docx. */
                            if (exif_imagetype($rowReplacement) != IMAGETYPE_JPEG) {
                                $imgName = explode('.', $imgName)[0] . '.jpg';
                                $tempImgFile = sys_get_temp_dir() . '/' . $imgName;
                                \Intervention\Image\Image::make($rowReplacement)->save($tempImgFile);
                            }
                            $templateProcessor->setImageValue($rowTag, $imgName, $rowReplacement);
                            /** Remove any temporary images */
                            if ($tempImgFile) {
                                unlink($tempImgFile);
                            }
                        } else {
                            /** If the data is an array, then we're dealing with another
                             *  row which we want to clone. */
                            if (is_array($rowReplacement)) {
                                $templateProcessor->cloneRow(
                                    $rowReplacement['cloneRow'],
                                    $rowReplacement['rowCount']
                                );
                                /** Loop through the tags and output the data */
                                foreach ($rowReplacement['tags'] as $subRowTag => $subRowReplacement) {
                                    $templateProcessor->setValue($subRowTag, $subRowReplacement);
                                }
                            } else {
                                $templateProcessor->setValue($rowTag, $rowReplacement);
                            }
                        }
                    }
                }
            }
        }
    } else {
        $templateProcessor->setValue($tag, $replacement);
    }
}
/** Save the template file to disk */
$templateProcessor->saveAs($modifiedFilename);
```

I have managed to work around my issue in my forked repo by changing setValueForPart to:

```php
protected function setValueForPart($documentPartXML, $search, $replace, $limit)
{
    return str_replace($search, $replace, $documentPartXML, $limit);
}
```

I have also restructured the data to reduce the number of foreach loops, so I now only loop to do the row cloning and the images. The rest of the data is passed in one call to setValue as two arrays: an array of tags and an array of replacement values. The template processor now processes my very large data set, including photos, within 20 minutes. I hope that all makes sense; it is a complicated beast that I'm creating here, with potentially dynamic clone rows off a parent clone row.
@OAFCROB, the preg_replace function is used here because it gives us the ability to limit the number of replacements. The str_replace function doesn't offer such functionality: its fourth parameter has a different meaning; it receives "the number of replacements performed". This feature was introduced in #52. I'll try to create a patch for that. Thanks for the report!
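The difference can be seen in a few lines (an illustrative sketch, independent of PHPWord): preg_replace's fourth argument limits how many replacements happen, while str_replace's fourth argument is a by-reference output variable that merely reports how many replacements were made.

```php
<?php
$subject = '${tag} ${tag} ${tag}';

// preg_replace: the 4th argument is a limit — replace at most one occurrence.
$limited = preg_replace('/\$\{tag\}/u', 'X', $subject, 1);
// $limited === 'X ${tag} ${tag}'

// str_replace: always replaces every occurrence; the 4th (by-reference)
// argument only reports the number of replacements performed.
$all = str_replace('${tag}', 'X', $subject, $count);
// $all === 'X X X', $count === 3
```

So switching blindly to str_replace would silently break any caller that relies on a replacement limit.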
Sorry for the delayed response. I understand why using preg_replace could be useful for limiting the replacement fields; however, it can be incredibly slow in comparison to str_replace. Have you pushed a patch that I can try?
Yes, we released v0.12.1. Thanks for the feedback!
I've just looked at the source code, and I can see a potential issue with str_replace, since you can pass the search and replacement fields in as arrays. The calls to substr and String::isUTF8 expect a plain string, so is it worth doing a sanity check before calling them? In my forked repo I took them out completely, as I decided to pass in all my data as two arrays, meaning I hit this method only once instead of foreach-looping through the data and calling setValue each time.

```php
if (substr($search, 0, 2) !== '${' && substr($search, -1) !== '}') {
    $search = '${' . $search . '}';
}

if (!String::isUTF8($replace)) {
    $replace = utf8_encode($replace);
}

// Note: we can't use the same function for both cases here, because of performance considerations.
if (self::MAXIMUM_REPLACEMENTS_DEFAULT === $limit) {
    return str_replace($search, $replace, $documentPartXML);
} else {
    $regExpDelim = '/';
    $escapedSearch = preg_quote($search, $regExpDelim);

    return preg_replace("{$regExpDelim}{$escapedSearch}{$regExpDelim}u", $replace, $documentPartXML, $limit);
}
```
Good idea, @OAFCROB. Will change the behavior in v0.13.0. |
First of all, thanks to the authors for this great project, and sorry for my approximate English. I think there is a design problem that comes up when treating great volumes of data (by the way, treating hundreds or thousands of lines can be common in many tasks). From my point of view the problem is that values are added after the rows have been cloned. The problem for me is that each time you add (clone) a row, the processing time grows in two ways:

The solution might be to do the search/replace while cloning; that way, the search/replace operations are done only on the cloned part, not on the whole document, and the processing time grows by n, not by n². I modified TemplateProcessor::cloneRow so it can accept an array of values as an argument and do the replacement work while cloning.

Just call it by passing an array of arrays as the third parameter:

and the replacement will be made directly while cloning. The fourth parameter (limit) is totally optional. If you call it the normal way (only two parameters), it will still behave the normal way (just clone the row, don't apply values). I tested it with the following code (using the template from sample 7).
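A minimal sketch of the replace-while-cloning idea (hypothetical code, not the commenter's actual patch): fill each cloned row as it is produced, so every replacement scans only one small row fragment instead of the whole document.

```php
<?php
// cloneRowWithValues is an invented helper: it clones a row template once
// per data record and substitutes that record's values immediately. Total
// work grows linearly with the number of rows, rather than quadratically
// as when every setValue() call rescans the entire document.
function cloneRowWithValues(string $rowXml, array $rows): string
{
    $result = '';
    foreach ($rows as $values) {
        $clone = $rowXml;
        foreach ($values as $macro => $replacement) {
            // Scans only the small cloned fragment, never the whole document.
            $clone = str_replace('${' . $macro . '}', $replacement, $clone);
        }
        $result .= $clone;
    }
    return $result;
}

$rowXml = '<w:tr><w:tc>${name}</w:tc><w:tc>${qty}</w:tc></w:tr>';
echo cloneRowWithValues($rowXml, [
    ['name' => 'Apples', 'qty' => '3'],
    ['name' => 'Pears',  'qty' => '5'],
]);
```

In a real patch the cloned fragments would still have to be spliced back into the document XML at the original row's position, but that splice is a single operation rather than one scan per value.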
Hi,
Thanks for your product.
I have a problem with the template cloneRow.
I have approx. 150 records, and creating the document from the template takes approx. 500 seconds. Is that normal?