Use StringDecoder for Buffers in WritableStream #128
Could you add a test case that fails without this?
"use strict";
var ParserStream = require('htmlparser2').WritableStream;
var Buffer = require('buffer').Buffer;
var assert = require('assert');
var parser = new ParserStream({
  ontext: function(text){
    // The euro sign must arrive intact even though its bytes span two chunks.
    assert.equal(text, '€');
  }
});
// '€' is encoded in UTF-8 as 0xE2 0x82 0xAC; split it across two writes.
parser.write(Buffer.from([0xE2, 0x82]));
parser.write(Buffer.from([0xAC]));

Without the fix this should fail.
I get your point, but this problem is not inherent to text nodes. It could affect anything that contains non-ASCII characters (attributes, CDATA, comments, ...):

Buffer.from([0xE2, 0x82]).toString() + Buffer.from([0xAC]).toString() !== '€' // results in '���' instead

When working with Buffers in a streaming fashion, you have to use StringDecoder to get UTF-8 right.
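For illustration, here is a minimal sketch of the difference between decoding each chunk on its own and decoding through Node's `string_decoder` module. The helper `decodeChunks` is a hypothetical name used only for this example; it is not part of htmlparser2 and is not the code from this pull request.

```javascript
"use strict";
const { StringDecoder } = require("string_decoder");

// Hypothetical helper (not part of htmlparser2): decodes a sequence of
// Buffers as one UTF-8 stream, so a character split across chunks survives.
function decodeChunks(chunks) {
  const decoder = new StringDecoder("utf8");
  let text = "";
  for (const chunk of chunks) {
    // write() holds back incomplete trailing bytes until the next chunk.
    text += decoder.write(chunk);
  }
  // end() flushes whatever is still buffered.
  return text + decoder.end();
}

// '€' is 0xE2 0x82 0xAC in UTF-8, split here across two chunks.
const chunks = [Buffer.from([0xe2, 0x82]), Buffer.from([0xac])];

// Decoding each chunk separately turns the partial bytes into U+FFFD.
const naive = chunks.map((chunk) => chunk.toString()).join("");
const decoded = decodeChunks(chunks);

console.log(naive === "€");   // false
console.log(decoded === "€"); // true
```

The decoder keeps the incomplete bytes from the first chunk and only emits the character once the final byte arrives, which is why the chunk boundary no longer matters.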
I think @ajafff is right. I ran into trouble when parsing web pages, and the behaviour is quite unpredictable because a chunk boundary only rarely falls in the middle of a multi-byte character. I think it would be beneficial to add this information to /wiki/Parser-options. It would have saved me some trouble, at least. EDIT:
I forgot about this, sorry. This needs a test case in the |
@fb55 Changed single quotes to double quotes and added a test.
Awesome, thanks!
http://stackoverflow.com/questions/12121775/convert-streamed-buffers-to-utf8-string