tried to load kanjidic2.xml, no errors, no data either #140

Closed
Pomax opened this issue Jun 7, 2014 · 3 comments

Pomax commented Jun 7, 2014

As per the README, I tried to run this:

var fs = require("fs");
var parser = require('xml2js');
fs.readFile('kanjidic2.xml', function(err, data) {
    parser.parseString(data, function (err, result) {
        console.dir(result);
        console.log('Done');
    });
});

on this: ftp://ftp.monash.edu.au/pub/nihongo/kanjidic2.xml.gz

The result was

undefined
Done

That doesn't seem right: no error was reported, but no data came back either.

Leonidas-from-XIV self-assigned this Jun 8, 2014

jcsahnwaldt (Contributor) commented Jun 9, 2018

@Pomax You should check err before you access result. In this case, console.log(err) probably would have printed this:

Error: Text data outside of root node.
Line: 327
Column: 1
Char: ]
    at error (/Users/jcsahnwaldt/git/digitalHub/node_modules/sax/lib/sax.js:651:10)
    at strictFail (/Users/jcsahnwaldt/git/digitalHub/node_modules/sax/lib/sax.js:677:7)
    at SAXParser.write (/Users/jcsahnwaldt/git/digitalHub/node_modules/sax/lib/sax.js:1035:15)
    at Parser.exports.Parser.Parser.parseString (/Users/jcsahnwaldt/git/digitalHub/node_modules/xml2js/lib/parser.js:322:31)
    at Parser.parseString (/Users/jcsahnwaldt/git/digitalHub/node_modules/xml2js/lib/parser.js:5:59)
    at Object.<anonymous> (/Users/jcsahnwaldt/git/digitalHub/foo.js:34:8)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
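
To make the advice above concrete, here is a minimal sketch of an error-first callback wrapper. `failLoudly` is a hypothetical helper name, not part of xml2js; it only assumes that the callback is invoked as `(err, result)`, which is how the snippet in this issue uses `parseString`.

```javascript
// Hypothetical helper (not part of xml2js): wraps a success handler in an
// error-first callback so that a parse error can no longer be ignored.
function failLoudly(onResult, onError = console.error) {
  return (err, result) => {
    if (err) {
      // Report the error and stop; result is not usable in this case.
      onError(err);
      return;
    }
    onResult(result);
  };
}

// Possible use with the snippet from this issue (assumes xml2js is installed):
//
//   fs.readFile('kanjidic2.xml', failLoudly((data) => {
//     parser.parseString(data, failLoudly((result) => {
//       console.dir(result);
//       console.log('Done');
//     }));
//   }));
```

With a wrapper like this, the original script would have printed the error instead of `undefined`.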

jcsahnwaldt (Contributor) commented

This is a bug in sax-js. See isaacs/sax-js#236

Your XML contains a DTD with comments that contain closing square brackets. For some reason, sax-js gets confused by these closing square brackets.

When I removed these closing square brackets from the comments, I got a different error:

Error: Max buffer length exceeded: doctype
Line: 535159
Column: 0
Char: 
    at error (.../sax.js:651:10)
    at checkBufferLength (.../sax.js:125:13)
    at SAXParser.write (.../sax.js:1505:7)
    ...

When I removed all the comments from the DTD (about 280 comment lines between <!DOCTYPE kanjidic2 [ and ]>), xml2js could parse the file and produced this result:

{ kanjidic2: 
   { header: 
      [ { file_version: [ '4' ],
          database_version: [ '2018-160' ],
          date_of_creation: [ '2018-06-09' ] } ],
     character: 
      [ { literal: [ '亜' ],
          codepoint: 
           [ { cp_value: 
                [ { _: '4e9c', '$': { cp_type: 'ucs' } },
                  { _: '16-01', '$': { cp_type: 'jis208' } } ] } ],
                  ...

(It goes on like this for thousands of lines...)
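
The manual workaround described above (deleting the comments between `<!DOCTYPE kanjidic2 [` and `]>`) could be automated along these lines. This is only a sketch: `stripDoctypeComments` is a hypothetical helper, not something xml2js or sax-js provides, and it assumes that no quoted literal inside the DTD contains `<!--` or `]>`.

```javascript
// Hypothetical helper: removes <!-- ... --> comments from the internal DTD
// subset (the part between "<!DOCTYPE name [" and "]>") and leaves the rest
// of the document untouched.
// Assumption: no quoted DTD literal contains "<!--" or "]>".
function stripDoctypeComments(xml) {
  const start = xml.indexOf('<!DOCTYPE');
  if (start === -1) return xml; // no DTD, nothing to do

  // Only continue if the DOCTYPE has an internal subset ("[" before ">").
  const open = xml.indexOf('[', start);
  const close = xml.indexOf('>', start);
  if (open === -1 || (close !== -1 && close < open)) return xml;

  let out = xml.slice(0, start);
  let pos = start;
  while (true) {
    const commentStart = xml.indexOf('<!--', pos);
    const subsetEnd = xml.indexOf(']>', pos);
    // Stop once the DTD closes before the next comment, or nothing is left.
    if (commentStart === -1 || subsetEnd === -1 || subsetEnd < commentStart) break;
    const commentEnd = xml.indexOf('-->', commentStart + 4);
    if (commentEnd === -1) break; // unterminated comment: keep the rest as-is
    out += xml.slice(pos, commentStart); // keep the text before the comment
    pos = commentEnd + 3;                // skip the comment itself
  }
  return out + xml.slice(pos);
}
```

The cleaned string can then be passed to `parseString` in place of the raw file contents. Comments outside the DTD are deliberately left alone, since only the ones inside the DOCTYPE were reported to cause trouble here.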

Pomax (Author) commented Jun 11, 2018

Probably worth mentioning in the README.md, in a gotcha section or the like. I haven't needed this for four years now, but maybe someone else has run into it since.

Pomax closed this as completed Jun 11, 2018