Fatal error loading HTML data #2120

Open
emrenogay opened this issue Jul 27, 2021 · 2 comments

emrenogay commented Jul 27, 2021

Fatal error loading HTML data

Fatal error: Uncaught BadMethodCallException: Cannot add TextRun in TextRun. in /Applications/MAMP/htdocs/bot/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php:266
Stack trace:
#0 /Applications/MAMP/htdocs/bot/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php(131): PhpOffice\PhpWord\Element\AbstractContainer->checkValidity('TextRun')
#1 [internal function]: PhpOffice\PhpWord\Element\AbstractContainer->addElement('TextRun', Array)
#2 /Applications/MAMP/htdocs/bot/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php(113): call_user_func_array(Array, Array)
#3 /Applications/MAMP/htdocs/bot/vendor/phpoffice/phpword/src/PhpWord/Shared/Html.php(265): PhpOffice\PhpWord\Element\AbstractContainer->__call('addtextrun', Array)
#4 [internal function]: PhpOffice\PhpWord\Shared\Html::parseParagraph(Object(DOMElement), Object(PhpOffice\PhpWord\Element\TextRun), Array)
#5 /Applications/MAMP/htdocs/bot/vendor/phpoffice/phpword/src/PhpWord/Shared/Html.php(215): call_user_func_array(Array, Arr in /Applications/MAMP/htdocs/bot/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php on line 266

<?php

error_reporting(E_ERROR);
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);

header('Content-Type: text/html; charset=utf-8');
include '../functions.php';
require('../vendor/autoload.php');

use PHPHtmlParser\Dom;
$phpWord = new \PhpOffice\PhpWord\PhpWord();
$phpWord->addParagraphStyle('Heading2', array('alignment' => 'center'));

\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(true);



$sitemap = $_POST['sitemap'];
$classes = $_POST['classes'];
$sitename = str_replace('.', '-', explode('/', $sitemap)[2]);

$sitemap_data = url_get_contents($sitemap);

$dom = new DOMDocument();
$dom->recover = true;
$dom->formatOutput = true;
$dom->preserveWhiteSpace = true;
$dom->strictErrorChecking = false;
$dom->loadHTML($sitemap_data);
$xpath = new DomXpath($dom);

$i = 0;
$data = '';
$expression = '//*[@class="' . $classes . '"]';
$parsed = '';

$section = $phpWord->addSection();
$html = '';
$h1 = '';
date_default_timezone_set("Europe/Istanbul");

$savename = $sitename . '-' . date('d-m-Y H-i-s') . '.docx';

$chars = ['&#8217;', '&ccedil;', '&#8220;', '&#8221;', "\t"];
$replaced = ["", 'ç', '', '', ''];

while ($i < 10) {
    $url = $xpath->query('//url/loc')->item($i)->nodeValue;

    if(!empty($url)){

        $dom = new Dom;
        $dom->loadFromUrl($url);
        $contents = $dom->find('*[class="'.$classes.'"]');

        foreach ($contents as $content) {

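            // Keep only basic block/inline tags, collapse whitespace, then
            // replace a few entities and tabs before appending to the HTML buffer.
            $html .= trim(str_replace($chars, $replaced, preg_replace("/\s+/", " ", strip_tags($content->innerHtml, '<p><b><strong><ul><ol><li><h1><h2><h3><h4><h5><h6>'))));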
        }

    }

    $i++;
}

echo $html;
\PhpOffice\PhpWord\Shared\Html::addHtml($section, $html);
$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');

$objWriter->save($savename);

Expected Behavior

A clear and concise description of what you expected to happen.

Current Behavior

What is the current behavior?

Context

Please fill in your environment information:

  • PHP Version: 7.4
  • PHPWord Version: 0.18.2

iIronside commented

Has this bug not been fixed yet?

thomasb88 commented

It looks like you added a preprocessing step to handle the HTML-to-XHTML conversion, but that seems strange to me, because addHtml already does
$dom->loadXML($html);

and handles entities such as &ccedil; using html_entity_decode.
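To illustrate the point about entities, here is a minimal sketch in plain PHP (it does not call PHPWord, and the sample fragment is made up). It shows that html_entity_decode alone turns both named and numeric entities into UTF-8 characters, which suggests the manual $chars/$replaced substitution in the script may be unnecessary.

```php
<?php
// Sketch only: a made-up fragment containing the same kinds of entities
// that the script above strips by hand.
$fragment = '<p>Fran&ccedil;ais &#8220;quoted&#8221; it&#8217;s</p>';

// html_entity_decode() converts named (&ccedil;) and numeric (&#8217;)
// entities into their UTF-8 characters.
$decoded = html_entity_decode($fragment, ENT_QUOTES, 'UTF-8');

echo $decoded, PHP_EOL;
```

Note that the script replaces the typographic quotes with empty strings, so decoding them instead would keep those characters in the output document.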

In any case, the problem is probably in the HTML tree: it seems to contain a container nested inside another container. Can you provide an example of the HTML content?
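One way to check for a container inside a container before calling addHtml is sketched below. It uses only PHP's DOM extension. The function findNestedContainers is a hypothetical helper, not part of PHPWord, and the assumption that only <p> and <h1> to <h6> act as paragraph-like containers is inferred from the stack trace, not confirmed against the library.

```php
<?php
/**
 * Hypothetical helper (not part of PHPWord): find <p> or heading elements
 * that sit inside another <p> or heading.
 *
 * @return string[] serialized markup of each nested element
 */
function findNestedContainers(string $html): array
{
    $dom = new DOMDocument();

    // Wrap the fragment in <body> so several top-level tags form valid XML.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadXML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Fragment is not well-formed XML');
    }

    // Assumption: these tags are the paragraph-like containers.
    $containers = 'self::p or self::h1 or self::h2 or self::h3'
        . ' or self::h4 or self::h5 or self::h6';

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//*[$containers]//*[$containers]");

    $found = [];
    foreach ($nodes as $node) {
        $found[] = $dom->saveXML($node);
    }

    return $found;
}

// Made-up sample: one clean paragraph and two nested ones.
$sample = '<p>ok</p><p>outer <p>inner</p></p><h2>Title <p>oops</p></h2>';
print_r(findNestedContainers($sample));
```

If the helper returns a non-empty list for the scraped HTML, those elements are reasonable candidates for the "Cannot add TextRun in TextRun" error and could be flattened or unwrapped before conversion.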
