XML namespacing #24

Closed
paprikka opened this issue Apr 19, 2013 · 5 comments

Comments

@paprikka

Hi,

I've noticed another small but nasty bug:

Every cell whose string tag carries XML attributes (such as xml:space="preserve") gets ignored, because the regular expressions only match tags without attributes. Compare your current results with this quick fix:

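// file: xlsx.js :143
function parseStrs(data) {

    // XXX: improve, temporary fix
    // Strip xml:space="preserve" so the per-item regex below captures "t"
    // rather than 't xml:space="preserve"' as the tag name.
    data = data.replace(/ xml:space="preserve"/gi, '');

    var s = [];
    // sst[1]: attributes of the <sst> element, sst[2]: everything inside it
    var sst = data.match(new RegExp("<sst ([^>]*)>([\\s\\S]*)<\/sst>","m"));
    if(sst) {
        // Split the content into <si> items; each becomes { tagName: unescaped text }
        s = sst[2].replace(/<si>/g,"").split(/<\/si>/).map(function(x) {
            var z = {};
            var y = x.match(/<(.*)>([\s\S]*)<\/.*/);
            if(y) z[y[1]] = unescapexml(y[2]);
            return z;
        });

        // Copy the count attributes from the <sst> tag onto the result array
        sst = parsexmltag(sst[1]); s.count = sst.count; s.uniqueCount = sst.uniqueCount;
    }
    if(debug) s.rawdata = data;
    return s;
}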
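
To illustrate why the attribute matters, here is a small standalone sketch that applies the same per-item regex to two sample strings. It is not part of xlsx.js: the helper name `itemKey` and the sample XML are made up for illustration, and it only shows what the regex captures, not how the rest of the library consumes the result.

```javascript
// Hypothetical helper: returns the tag text that the per-item regex
// from parseStrs captures for a single <si> item body.
function itemKey(si) {
    var y = si.match(/<(.*)>([\s\S]*)<\/.*/);
    return y ? y[1] : null;
}

// Sample item bodies (assumed, not taken from a real workbook).
var plainItem = '<t>foo</t>';
var preservedItem = '<t xml:space="preserve"> foo </t>';

var plain = itemKey(plainItem);
var preserved = itemKey(preservedItem);
// Same item after the quick fix has stripped the attribute.
var stripped = itemKey(preservedItem.replace(/ xml:space="preserve"/gi, ''));

console.log(plain);     // t
console.log(preserved); // t xml:space="preserve"
console.log(stripped);  // t
```

With the attribute in place the captured key includes the attribute text, so code that looks the string up under `t` would presumably not find it.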

I had to fix this quickly because the bug seriously affected the app I was presenting today. Sorry for reporting an error without a comprehensive solution; I have no time to fix it properly right now, and I thought you would want to know in the meantime. I'll be able to post a better solution along with a pull request next week.

By the way, great job. The library still needs some improvements, but compared to others it is pretty darn fast :)

@redchair123

@paprikka every bug you find improves this a little bit :) Thanks for reporting. I will push a fix later today. If you could, please share a buggy file (with confidential information stripped out) by email (the address is in the commit log) or through a file-sharing service.

@redchair123

@paprikka I wonder if you are using the latest version. The version from March 21 2013 (commit hash 7ed5d70, commit message "xml preserve space regex") appeared to fix the issue. Can you send me that file?

7ed5d70#xlsx.js

 // matches <foo>...</foo> extracts content
-function matchtag(f,g) {return new RegExp('<' + f + '>([\\s\\S]*)</' + f + '>',g||"");}
+function matchtag(f,g) {return new RegExp('<'+f+'(?: xml:space="preserve")?>([^]*)</'+f+'>',(g||"")+"m");}
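
As a rough check of what the commit changes, the sketch below runs both versions of `matchtag` from the diff against a sample string. The names `matchtagOld` and `matchtagNew` and the sample XML are assumptions made for this illustration; only the two function bodies come from the diff.

```javascript
// Before the commit: only a bare <f> opening tag matches.
function matchtagOld(f,g) {return new RegExp('<' + f + '>([\\s\\S]*)</' + f + '>',g||"");}
// After the commit: an optional xml:space="preserve" attribute is allowed.
function matchtagNew(f,g) {return new RegExp('<'+f+'(?: xml:space="preserve")?>([^]*)</'+f+'>',(g||"")+"m");}

// Sample shared-string item (assumed, not from a real workbook).
var xml = '<si><t xml:space="preserve"> padded </t></si>';

var before = xml.match(matchtagOld('t'));
var after = xml.match(matchtagNew('t'));

console.log(before);            // null
console.log(after && after[1]); // " padded "
```

Note that the new pattern only allows this one attribute; an opening tag with any other attribute would still not match.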

@paprikka
Author

@redchair123 you're right. I made some major changes to your library a month ago and haven't updated it since. Everything works fine now. Sorry for the confusion.

@redchair123

@paprikka If you've made some changes that improve parsing or fix other bugs, might I suggest submitting some of them for inclusion?

@paprikka
Author

@redchair123 Yes, of course! I'll do that with pleasure once I'm back from the holidays, about 8 days from now. I return in a week, but I need to test the changes first, which is why I haven't sent any pull requests yet.

shenquan815 pushed a commit to shenquan815/SSSS that referenced this issue Mar 24, 2020
Addresses issue from LO Calc files generating invalid formats.

Link: SheetJS/sheetjs#24
KingTiger001 added a commit to KingTiger001/sheet-project that referenced this issue Dec 18, 2022
Addresses issue from LO Calc files generating invalid formats.

Link: SheetJS/sheetjs#24
@SheetJS SheetJS locked and limited conversation to collaborators Feb 22, 2023

2 participants