-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encode::ONLY_PRAGMA_WARNINGS in $PerlIO::encoding::fallback #16059
Comments
From @paliHi! I would continue in discussion started in last year in p5p mailing list: Currently there is big mess with reporting warnings from Moreover mess is bigger, because Encode is used by PerlIO::encoding In perl you can enable some utf8 warnings by pragma warning. Next for This lead to couple of bug reports like Encode::encode() or decode() https://rt.cpan.org/Public/Bug/Display.html?id=120505 As stated in previous discussion I'm proposing new behaviour: * Introduce new Encode check flag Encode::ONLY_PRAGMA_WARNINGS which * Add Encode::ONLY_PRAGMA_WARNINGS by default to :encoding layer As this change affects both Perl & externally maintained Encode module In attachments are patches implementing above proposed behaviour. Encode |
From @pali0001-Encode-Add-new-check-flag-Encode-ONLY_PRAGMA_WARNING.patchFrom 62698e1c8c4f70082b0311fdb3cc0c881f6650c7 Mon Sep 17 00:00:00 2001
From: Pali <[email protected]>
Date: Sat, 1 Jul 2017 12:32:34 +0200
Subject: [PATCH 1/2] Encode: Add new check flag Encode::ONLY_PRAGMA_WARNINGS
When this new flag is set then only warnings configured and enabled by
pragma warnings are reported. It has no effect without setting check flag
Encode::ENCODE_WARN_ON_ERR.
---
cpan/Encode/Encode.pm | 131 +++++++++++-----------------------------
cpan/Encode/Encode.xs | 13 ++--
cpan/Encode/Encode/encode.h | 3 +
cpan/Encode/Unicode/Unicode.xs | 113 ++++++++++++++++++++++------------
cpan/Encode/t/decode.t | 9 ++-
cpan/Encode/t/enc_eucjp.t | 4 ++
cpan/Encode/t/utf32warnings.t | 114 ++++++++++++++++++++++++++++++++++
cpan/Encode/t/utf8messages.t | 33 ----------
cpan/Encode/t/utf8warnings.t | 109 ++++++++-------------------------
9 files changed, 273 insertions(+), 256 deletions(-)
create mode 100644 cpan/Encode/t/utf32warnings.t
delete mode 100644 cpan/Encode/t/utf8messages.t
diff --git a/cpan/Encode/Encode.pm b/cpan/Encode/Encode.pm
index 255f594..a99f70b 100644
--- a/cpan/Encode/Encode.pm
+++ b/cpan/Encode/Encode.pm
@@ -14,6 +14,7 @@ BEGIN {
use Exporter 5.57 'import';
+use Carp ();
our @CARP_NOT = qw(Encode::Encoder);
# Public, encouraged API is exported by default
@@ -162,109 +163,54 @@ sub clone_encoding($) {
}
sub encode($$;$) {
- my ( $name, $string, $check ) = @_;
- return undef unless defined $string;
- $string .= ''; # stringify;
- $check ||= 0;
- unless ( defined $name ) {
- require Carp;
- Carp::croak("Encoding name should not be undef");
- }
+ my $name = $_[0];
+ my $check = $_[2];
+ Carp::croak("Encoding name should not be undef") unless defined $name;
my $enc = find_encoding($name);
- unless ( defined $enc ) {
- require Carp;
- Carp::croak("Unknown encoding '$name'");
- }
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $octets;
- if ( ref($enc) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $octets = $enc->encode( $string, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $octets = $enc->encode( $string, $check );
+ Carp::croak("Unknown encoding '$name'") unless defined $enc;
+ my $encode = $enc->can('encode');
+ Carp::croak("No function 'encode' for encoding '$name'") unless defined $encode;
+ $check ||= 0;
+ splice(@_, 0, 1, $enc);
+ if (ref $check or !$check or ($check & LEAVE_SRC)) {
+ my $string = $_[1];
+ splice(@_, 1, 1, $string);
}
- $_[1] = $string if $check and !ref $check and !( $check & LEAVE_SRC );
- return $octets;
+ splice(@_, 2, 1, $check);
+ goto &$encode;
}
*str2bytes = \&encode;
sub decode($$;$) {
- my ( $name, $octets, $check ) = @_;
- return undef unless defined $octets;
- $octets .= '';
- $check ||= 0;
+ my $name = $_[0];
+ my $check = $_[2];
+ Carp::croak("Encoding name should not be undef") unless defined $name;
my $enc = find_encoding($name);
- unless ( defined $enc ) {
- require Carp;
- Carp::croak("Unknown encoding '$name'");
- }
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $string;
- if ( ref($enc) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $string = $enc->decode( $octets, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $string = $enc->decode( $octets, $check );
+ Carp::croak("Unknown encoding '$name'") unless defined $enc;
+ my $decode = $enc->can('decode');
+ Carp::croak("No function 'decode' for encoding '$name'") unless defined $decode;
+ $check ||= 0;
+ splice(@_, 0, 1, $enc);
+ if (ref $check or !$check or ($check & LEAVE_SRC)) {
+ my $octets = $_[1];
+ splice(@_, 1, 1, $octets);
}
- $_[1] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
- return $string;
+ splice(@_, 2, 1, $check);
+ goto &$decode;
}
*bytes2str = \&decode;
sub from_to($$$;$) {
my ( $string, $from, $to, $check ) = @_;
- return undef unless defined $string;
- $check ||= 0;
+ Carp::croak("Encoding name should not be undef") unless defined $from and defined $to;
my $f = find_encoding($from);
- unless ( defined $f ) {
- require Carp;
- Carp::croak("Unknown encoding '$from'");
- }
+ Carp::croak("Unknown encoding '$from'") unless defined $f;
my $t = find_encoding($to);
- unless ( defined $t ) {
- require Carp;
- Carp::croak("Unknown encoding '$to'");
- }
-
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $uni;
- if ( ref($f) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $uni = $f->decode($string);
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $uni = $f->decode($string);
- }
-
- if ( ref($t) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $_[0] = $string = $t->encode( $uni, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $_[0] = $string = $t->encode( $uni, $check );
- }
-
+ Carp::croak("Unknown encoding '$to'") unless defined $t;
+ return undef unless defined $string;
+ $check ||= 0;
+ my $uni = $f->decode($string);
+ $_[0] = $string = $t->encode( $uni, $check );
return undef if ( $check && length($uni) );
return defined( $_[0] ) ? length($string) : undef;
}
@@ -279,14 +225,9 @@ sub encode_utf8($) {
my $utf8enc;
sub decode_utf8($;$) {
- my ( $octets, $check ) = @_;
- return undef unless defined $octets;
- $octets .= '';
- $check ||= 0;
$utf8enc ||= find_encoding('utf8');
- my $string = $utf8enc->decode( $octets, $check );
- $_[0] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
- return $string;
+ unshift(@_, $utf8enc);
+ goto &{$utf8enc->can('decode')};
}
onBOOT;
diff --git a/cpan/Encode/Encode.xs b/cpan/Encode/Encode.xs
index 6c077be..c1222a1 100644
--- a/cpan/Encode/Encode.xs
+++ b/cpan/Encode/Encode.xs
@@ -250,7 +250,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
(UV)ch, enc->name[0]);
return &PL_sv_undef; /* never reaches but be safe */
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
Perl_warner(aTHX_ packWARN(WARN_UTF8),
ERR_ENCODE_NOMAP, (UV)ch, enc->name[0]);
}
@@ -289,7 +289,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
enc->name[0], (UV)s[slen]);
return &PL_sv_undef; /* never reaches but be safe */
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
Perl_warner(
aTHX_ packWARN(WARN_UTF8),
ERR_DECODE_NOMAP,
@@ -446,7 +446,11 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
char esc[UTF8_MAXLEN * 6 + 1];
STRLEN i;
- if (SvROK(check_sv)) {
+ if (!SvOK(check_sv)) {
+ fallback_cb = &PL_sv_undef;
+ check = 0;
+ }
+ else if (SvROK(check_sv)) {
/* croak("UTF-8 decoder doesn't support callback CHECK"); */
fallback_cb = check_sv;
check = ENCODE_PERLQQ|ENCODE_LEAVE_SRC; /* same as perlqq */
@@ -520,7 +524,7 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
else
Perl_croak(aTHX_ ERR_DECODE_STR_NOMAP, (strict ? "UTF-8" : "utf8"), esc);
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
if (encode)
Perl_warner(aTHX_ packWARN(WARN_UTF8),
ERR_ENCODE_NOMAP, uv, (strict ? "UTF-8" : "utf8"));
@@ -1064,6 +1068,7 @@ BOOT:
newCONSTSUB(stash, "WARN_ON_ERR", newSViv(ENCODE_WARN_ON_ERR));
newCONSTSUB(stash, "RETURN_ON_ERR", newSViv(ENCODE_RETURN_ON_ERR));
newCONSTSUB(stash, "LEAVE_SRC", newSViv(ENCODE_LEAVE_SRC));
+ newCONSTSUB(stash, "ONLY_PRAGMA_WARNINGS", newSViv(ENCODE_ONLY_PRAGMA_WARNINGS));
newCONSTSUB(stash, "PERLQQ", newSViv(ENCODE_PERLQQ));
newCONSTSUB(stash, "HTMLCREF", newSViv(ENCODE_HTMLCREF));
newCONSTSUB(stash, "XMLCREF", newSViv(ENCODE_XMLCREF));
diff --git a/cpan/Encode/Encode/encode.h b/cpan/Encode/Encode/encode.h
index 5fbcf76..9d56483 100644
--- a/cpan/Encode/Encode/encode.h
+++ b/cpan/Encode/Encode/encode.h
@@ -94,6 +94,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
#define ENCODE_WARN_ON_ERR 0x0002 /* warn on error; may proceed */
#define ENCODE_RETURN_ON_ERR 0x0004 /* immediately returns on NOREP */
#define ENCODE_LEAVE_SRC 0x0008 /* $src updated unless set */
+#define ENCODE_ONLY_PRAGMA_WARNINGS 0x0010 /* when enabled report only warnings configured by pragma warnings, otherwise report all warnings; no effect without ENCODE_WARN_ON_ERR */
#define ENCODE_PERLQQ 0x0100 /* perlqq fallback string */
#define ENCODE_HTMLCREF 0x0200 /* HTML character ref. fb mode */
#define ENCODE_XMLCREF 0x0400 /* XML character ref. fb mode */
@@ -107,4 +108,6 @@ extern void Encode_DefineEncoding(encode_t *enc);
#define ENCODE_FB_HTMLCREF (ENCODE_HTMLCREF|ENCODE_LEAVE_SRC)
#define ENCODE_FB_XMLCREF (ENCODE_XMLCREF|ENCODE_LEAVE_SRC)
+#define encode_ckWARN(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || ckWARN(w)))
+
#endif /* ENCODE_H */
diff --git a/cpan/Encode/Unicode/Unicode.xs b/cpan/Encode/Unicode/Unicode.xs
index b3b1d2f..500d065 100644
--- a/cpan/Encode/Unicode/Unicode.xs
+++ b/cpan/Encode/Unicode/Unicode.xs
@@ -20,11 +20,15 @@
/* For pre-5.14 source compatibility */
#ifndef UNICODE_WARN_ILLEGAL_INTERCHANGE
# define UNICODE_WARN_ILLEGAL_INTERCHANGE 0
-# define UTF8_DISALLOW_SURROGATE 0
+# define UNICODE_WARN_FE_FF 0
+# define UNICODE_WARN_NONCHAR 0
+# define UNICODE_WARN_SUPER 0
+#endif
+#ifndef UTF8_DISALLOW_ILLEGAL_INTERCHANGE
+# define UTF8_DISALLOW_ILLEGAL_INTERCHANGE 0
# define UTF8_WARN_SURROGATE 0
-# define UTF8_DISALLOW_FE_FF 0
-# define UTF8_WARN_FE_FF 0
# define UTF8_WARN_NONCHAR 0
+# define UTF8_WARN_SUPER 0
#endif
#define PERLIO_BUFSIZ 1024 /* XXX value comes from PerlIOEncode_get_base */
@@ -233,53 +237,69 @@ CODE:
ucs2 = SvTRUE(sv);
}
if (ucs2 || size == 4) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":no surrogates allowed %" UVxf,
*hv_fetch((HV *)SvRV(obj),"Name",4,0),
ord);
}
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":no surrogates allowed %" UVxf,
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+ ord);
+ }
ord = FBCHAR;
}
else {
UV lo;
if (!isHiSurrogate(ord)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Malformed HI surrogate %" UVxf,
*hv_fetch((HV *)SvRV(obj),"Name",4,0),
ord);
}
- else {
- ord = FBCHAR;
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed HI surrogate %" UVxf,
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+ ord);
}
+ ord = FBCHAR;
}
else if (s+size > e) {
- if (check) {
- if (check & ENCODE_STOP_AT_PARTIAL) {
- s -= size;
- break;
- }
- else {
- croak("%" SVf ":Malformed HI surrogate %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
- }
+ if (check & ENCODE_STOP_AT_PARTIAL) {
+ s -= size;
+ break;
}
- else {
- ord = FBCHAR;
+ if (check & ENCODE_DIE_ON_ERR) {
+ croak("%" SVf ":Malformed HI surrogate %" UVxf,
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+ ord);
+ }
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed HI surrogate %" UVxf,
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+ ord);
}
+ ord = FBCHAR;
}
else {
lo = enc_unpack(aTHX_ &s,e,size,endian);
if (!isLoSurrogate(lo)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Malformed LO surrogate %" UVxf,
*hv_fetch((HV *)SvRV(obj),"Name",4,0),
ord);
}
- else {
- s -= size;
- ord = FBCHAR;
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed LO surrogate %" UVxf,
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+ ord);
}
+ s -= size;
+ ord = FBCHAR;
}
else {
ord = 0x10000 + ((ord - 0xD800) << 10) + (lo - 0xDC00);
@@ -289,13 +309,18 @@ CODE:
}
if ((ord & 0xFFFE) == 0xFFFE || (ord >= 0xFDD0 && ord <= 0xFDEF)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Unicode character %" UVxf " is illegal",
*hv_fetch((HV *)SvRV(obj),"Name",4,0),
ord);
- } else {
- ord = FBCHAR;
}
+ if (encode_ckWARN(check, WARN_NONCHAR)) {
+ warner(packWARN(WARN_NONCHAR),
+ "%" SVf ":Unicode character %" UVxf " is illegal",
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),
+ ord);
+ }
+ ord = FBCHAR;
}
if (resultbuflen < SvCUR(result) + UTF8_MAXLEN + 1) {
@@ -314,15 +339,22 @@ CODE:
resultbuf = (U8 *) SvGROW(result, newlen);
resultbuflen = SvLEN(result);
}
+ U32 flags = 0;
+ if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UNICODE_WARN_SUPER;
+ if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UNICODE_WARN_SURROGATE;
+ if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UNICODE_WARN_NONCHAR;
+ d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), ord, flags);
+
+ if (!d) {
+ d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), FBCHAR, 0);
+ }
- d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), ord,
- UNICODE_WARN_ILLEGAL_INTERCHANGE);
SvCUR_set(result, d - (U8 *)SvPVX(result));
}
if (s < e) {
/* unlikely to happen because it's fixed-length -- dankogai */
- if (check & ENCODE_WARN_ON_ERR) {
+ if (encode_ckWARN(check, WARN_UTF8)) {
Perl_warner(aTHX_ packWARN(WARN_UTF8),"%" SVf ":Partial character",
*hv_fetch((HV *)SvRV(obj),"Name",4,0));
}
@@ -407,23 +439,29 @@ CODE:
}
while (s < e && s+UTF8SKIP(s) <= e) {
STRLEN len;
- UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
- |UTF8_WARN_SURROGATE
- |UTF8_DISALLOW_FE_FF
- |UTF8_WARN_FE_FF
- |UTF8_WARN_NONCHAR));
- s += len;
- if (size != 4 && invalid_ucs2(ord)) {
+ U32 flags = UTF8_DISALLOW_ILLEGAL_INTERCHANGE;
+ if (encode_ckWARN(check, WARN_NON_UNICODE)) flags |= UTF8_WARN_SUPER;
+ if (encode_ckWARN(check, WARN_SURROGATE)) flags |= UTF8_WARN_SURROGATE;
+ if (encode_ckWARN(check, WARN_NONCHAR)) flags |= UTF8_WARN_NONCHAR;
+ UV ord = utf8n_to_uvuni(s, e-s, &len, flags);
+ if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {
if (!issurrogate(ord)) {
if (ucs2 == -1) {
SV *sv = attr("ucs2", 4);
ucs2 = SvTRUE(sv);
}
if (ucs2 || ord > 0x10FFFF) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":code point \"\\x{%" UVxf "}\" too high",
*hv_fetch((HV *)SvRV(obj),"Name",4,0),ord);
}
+ if (encode_ckWARN(check, WARN_NON_UNICODE)) {
+ warner(packWARN(WARN_NON_UNICODE),
+ "%" SVf ":code point \"\\x{%" UVxf "}\" too high",
+ *hv_fetch((HV *)SvRV(obj),"Name",4,0),ord);
+ }
+ enc_pack(aTHX_ result,size,endian,FBCHAR);
+ } else if (ord == 0) {
enc_pack(aTHX_ result,size,endian,FBCHAR);
} else {
UV hi = ((ord - 0x10000) >> 10) + 0xD800;
@@ -440,6 +478,7 @@ CODE:
else {
enc_pack(aTHX_ result,size,endian,ord);
}
+ s += len;
}
if (s < e) {
/* UTF-8 partial char happens often on PerlIO.
diff --git a/cpan/Encode/t/decode.t b/cpan/Encode/t/decode.t
index 3995412..ce4caab 100644
--- a/cpan/Encode/t/decode.t
+++ b/cpan/Encode/t/decode.t
@@ -49,9 +49,12 @@ $orig = "\xC3\x80";
$orig =~ /(..)/;
is(Encode::decode_utf8($1), "\N{U+C0}", 'passing magic regex to Encode::decode_utf8');
-$orig = "\xC3\x80";
-*a = $orig;
-is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+SKIP: {
+ skip "Bug in Perl: https://rt.perl.org/Public/Bug/Display.html?id=131263", 1;
+ $orig = "\xC3\x80";
+ *a = $orig;
+ is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+}
$orig = "\N{U+C0}";
$orig =~ /(.)/;
diff --git a/cpan/Encode/t/enc_eucjp.t b/cpan/Encode/t/enc_eucjp.t
index fc0af3c..8f933b0 100644
--- a/cpan/Encode/t/enc_eucjp.t
+++ b/cpan/Encode/t/enc_eucjp.t
@@ -25,6 +25,10 @@ BEGIN {
}
}
+use Encode qw();
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+use warnings "utf8";
+
no warnings "deprecated";
use encoding 'euc-jp';
diff --git a/cpan/Encode/t/utf32warnings.t b/cpan/Encode/t/utf32warnings.t
new file mode 100644
index 0000000..cf3ab67
--- /dev/null
+++ b/cpan/Encode/t/utf32warnings.t
@@ -0,0 +1,114 @@
+use strict;
+use warnings;
+BEGIN {
+ if ($] < 5.014){
+ print "1..0 # Skip: Perl 5.14.0 or later required\n";
+ exit 0;
+ }
+}
+
+my $script = quotemeta $0;
+
+use Encode;
+use Test::More tests => 15;
+
+my $valid = "\x61\x00\x00\x00";
+my $invalid = "\x78\x56\x34\x12";
+
+our $warn;
+$SIG{__WARN__} = sub { $warn = $_[0] };
+
+my $enc = find_encoding("UTF32-LE");
+
+{
+ local $warn;
+ my $ret = $enc->decode( $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ ok(!defined $warn, "Calling decode on UTF32-LE encode object with valid string produces no warnings");
+ is($ret, "a", "Calling decode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ my $ret = $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/is not Unicode.* at $script line /, "Calling decode on UTF32-LE encode object with valid string warns");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ my $ret = $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ ok(!defined $warn, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ my $ret = $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ ok(!defined $warn, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ my $ret = Encode::decode( $enc, $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ ok(!defined $warn, "Calling Encode::decode for UTF32-LE with valid string produces no warnings");
+ is($ret, "a", "Calling Encode::decode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ my $ret = Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/is not Unicode.* at $script line /, "Calling Encode::decode for UTF32-LE with valid string warns");
+}
+
+
+{
+ local $warn;
+ no warnings 'utf8';
+ my $ret = Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ ok(!defined $warn, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ my $ret = Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ ok(!defined $warn, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings");
+}
+
+
+use PerlIO::encoding;
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+
+{
+ local $warn;
+ open my $fd, '<:encoding(UTF32-LE)', \$valid or die;
+ my $str = <$fd>;
+ ok(!defined $warn, "Calling PerlIO :encoding on valid string produces no warnings");
+ is($str, "a", "PerlIO decodes string correctly");
+}
+
+
+{
+ local $warn;
+ open my $fd, '<:encoding(UTF32-LE)', \$invalid or die;
+ my $str = <$fd>;
+ like($warn, qr/is not Unicode.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ open my $fd, '<:encoding(UTF32-LE)', \$invalid or die;
+ my $str = <$fd>;
+ ok(!defined $warn, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ open my $fd, '<:encoding(UTF32-LE)', \$invalid or die;
+ my $str = <$fd>;
+ ok(!defined $warn, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
diff --git a/cpan/Encode/t/utf8messages.t b/cpan/Encode/t/utf8messages.t
deleted file mode 100644
index 8b6b379..0000000
--- a/cpan/Encode/t/utf8messages.t
+++ /dev/null
@@ -1,33 +0,0 @@
-use strict;
-use warnings;
-BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-
-use Test::More;
-use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-
-plan tests => 12;
-
-my @invalid;
-
-ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
-like $@, qr/^"\\x\{d800\}" does not map to UTF-8 /, 'Error message contains strict UTF-8 name';
-@invalid = ();
-encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-
-ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
-like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
-decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
diff --git a/cpan/Encode/t/utf8warnings.t b/cpan/Encode/t/utf8warnings.t
index 0d1ac6d..e4e4304 100644
--- a/cpan/Encode/t/utf8warnings.t
+++ b/cpan/Encode/t/utf8warnings.t
@@ -1,94 +1,35 @@
use strict;
use warnings;
-BEGIN {
- if ($] < 5.014){
- print "1..0 # Skip: Perl 5.14.0 or later required\n";
- exit 0;
- }
-}
+BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-use Encode;
-use Test::More tests => 10;
+use Test::More;
+use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-my $valid = "\x61\x00\x00\x00";
-my $invalid = "\x78\x56\x34\x12";
+my $script = quotemeta $0;
-my @warnings;
-$SIG{__WARN__} = sub {push @warnings, "@_"};
+plan tests => 12;
-my $enc = find_encoding("UTF32-LE");
+my @invalid;
-{
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $valid );
- is("@warnings", "", "Calling decode in Encode::Unicode on valid string produces no warnings");
-}
+ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
+like $@, qr/^"\\x\{d800\}" does not map to UTF-8 at $script line /, 'Error message contains strict UTF-8 name';
+@invalid = ();
+encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
+ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
+ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-{
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- like("@warnings", qr/is not Unicode/, "Calling decode in Encode::Unicode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings 'utf8'");
-}
-
-{
- no warnings;
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings");
-}
-
-
-
-{
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- like("@warnings", qr/is not Unicode/, "Calling decode in Encode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
- no warnings;
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings");
-};
-
-
-
-{
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- like("@warnings", qr/is not Unicode/, "Calling from_to in Encode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
- no warnings;
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings");
-};
+ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
+like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
+decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
--
1.7.9.5
|
From @pali0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patchFrom e4db61d9ea0a5f5a58f07f4b5214b879f20e4cc2 Mon Sep 17 00:00:00 2001
From: Pali <[email protected]>
Date: Sat, 1 Jul 2017 12:34:25 +0200
Subject: [PATCH 2/2] PerlIO::encoding: Use Encode::ONLY_PRAGMA_WARNINGS in
fallback by default
This would enable to respect utf8 warnings enabled/disabled by pramga
warnings when processing filehandle with :encoding layer.
---
ext/PerlIO-encoding/encoding.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ext/PerlIO-encoding/encoding.pm b/ext/PerlIO-encoding/encoding.pm
index 08d2df4..9e59a3c 100644
--- a/ext/PerlIO-encoding/encoding.pm
+++ b/ext/PerlIO-encoding/encoding.pm
@@ -14,7 +14,7 @@ require XSLoader;
XSLoader::load();
our $fallback =
- Encode::PERLQQ()|Encode::WARN_ON_ERR()|Encode::STOP_AT_PARTIAL();
+ Encode::PERLQQ()|Encode::WARN_ON_ERR()|Encode::ONLY_PRAGMA_WARNINGS()|Encode::STOP_AT_PARTIAL();
1;
__END__
--
1.7.9.5
|
From @cpansproutOn Sat, 01 Jul 2017 04:46:15 -0700, pali@cpan.org wrote:
It think this is a great improvement. Thank you for all your work. -- Father Chrysostomos |
The RT System itself - Status changed from 'new' to 'open' |
From @paliOn Saturday 01 July 2017 06:08:08 Father Chrysostomos via RT wrote:
Hi! Any comments for this new proposed behavior or implementation in |
From @paliOn Saturday 01 July 2017 06:08:08 Father Chrysostomos via RT wrote:
@Father Chrysostomos: Now patches and there fore more then month. I would like to hear your opinion... or if it is OK without any problem |
From @LeontOn Sat, Jul 1, 2017 at 1:46 PM, via RT <perlbug-followup@perl.org> wrote:
IMHO $PerlIO::encoding::fallback is a liability, and instead of digging a local $PerlIO::encoding::fallback = FB_CROAK; May or may not DWIM depending on whether PerlIO::encoding was already Leon |
From @paliOn Thursday 10 August 2017 07:58:22 Leon Timmermans via RT wrote:
IMO this is different problem, not related to my proposed patches, which
I agree that above :encoding layer without STOP AT PARTIAL flag in |
From @paliOn Monday 14 August 2017 10:19:58 pali@cpan.org wrote:
And the goal of those changes is how Encode handle warnings. PerlIO is I would like to move forward and would like to hear if those Encode |
From @tonycozOn Mon, 21 Aug 2017 06:25:01 -0700, pali@cpan.org wrote:
Most of this could be fixed by PerlIO::encoding being a bit smarter with the check value - only setting WARN_ON_ERR when ckWARN(WARN_UTF8) is true. The only issue would be the utf8 subcategory warnings, like for surrogates, which your Encode patch goes to a lot of effort to pass through. But a lot of that effort is wasted, for example: @@ -407,23 +439,29 @@ CODE: utf8n_to_uvuni() will only warns if those warnings are lexically enabled, so here you're adding extra checks for each category that aren't needed. The same is true for the calls to uvuni_to_utf8_flags(). In another case you're adding a completely new warning: + if (encode_ckWARN(check, WARN_NONCHAR)) { which could probably just be made lexically scoped whether the new flag is set or not, since some of the others will be made so due to the changes to decode() and encode(). Of course, that change might be considered a backward incompatibility, since some warnings that were previously produced (because Encode does C<use warnings;> might no longer be (since the new scope might not.) Tony |
From @paliOn Sunday 27 August 2017 18:38:15 Tony Cook via RT wrote:
Seems yes.
Right, with my approach for Encode::ONLY_PRAGMA_WARNINGS, warnings would
All above warning should be sent when user calls Encode with FB_WARN
Fixing this bug could be mean as backward incompatible. But current Basically above code which you quote is implementation of UTF-16 Common operation is: I got binary data from 3rd module (or network) and |
From @tonycozFrom a perl build point of view there's a couple of issues: a) the MANIFEST is missing the new file and still contains the deleted file. b) there's two compilation errors in the build: Unicode.xs: In function ‘XS_Encode__Unicode_decode’: On Mon, 28 Aug 2017 00:50:49 -0700, pali@cpan.org wrote:
The problem is your patch makes these warnings lexically scoped with *only* FB_WARN: $ ./perl -Ilib -MEncode -le 'no warnings qw(surrogate); my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode::FB_WARN); print "no croak"' $ ./perl -Ilib -MEncode -le 'my $x = encode("utf-16", (my $y = chr(0xdc10)), &Encode::FB_WARN); print "no croak"' The utf8n_to_uvuni() function *already* checks that these warnings are lexically enabled. Also, it only checks them when it actually finds the problem being reported. Your code however calculates flags whether or not there's an error, and currently does it on every character encoded. When ONLY_PRAGMA_WARNINGS is set this results in three calls to Perl_ckwarn() inside the encoding loop. (This code should be using ckWARN_d() since these are default on warnings, but it should just set them all and let the API do the checking it does anyway.) An issue with changes to Encode::encode() as you might see above is the opaqueness of the warning: UTF-16 surrogate U+DC10 in goto at -e line 1. My code doesn't contain a goto. I think the only real fix to this would be to rewrite Encode::encode/decode in C. Tony |
From @paliOn Tuesday 29 August 2017 18:46:43 Tony Cook via RT wrote:
That is something which would be fixed when I start preparing pull
Ok, that can be fixed.
Ah :-( So we need utf8n_to_uvuni() function which report warnings even
Yes, this is something which needs to be fixed.
Do you have your implementation? At least I do not see any other patch I used goto in Encode::encode/decode dispatcher functions, so warnings It is possible to reimplement Encode::encode/decode function in C/XS in |
From @tonycozOn Wed, 30 Aug 2017 02:22:26 -0700, pali@cpan.org wrote:
The "My code" I was referring to is the one-liner above. I meant it would confuse the user whose code doesn't include a goto (just a call to encode/decode()).
I guessed that, but the existing encode/decode functions already has code that handles that: ...
The most complex part would probably be the call to find_encoding(), the rest I think is relatively simple. Tony |
From @paliOn Sunday 10 September 2017 17:01:40 Tony Cook via RT wrote:
So, you are referring to that unexpected warning with "goto" in its
Yes, but that code is wrong for more reasons. E.g. https://rt.cpan.org/Public/Bug/Display.html?id=120505 My approach try to use goto to run encode function in current context, And introduce Encode::ENCODE_WARN_ON_ERR, so above bug 120505 can be
Can you show me simple example how to write that "goto" part? |
From @tonycozOn Mon, 11 Sep 2017 00:24:32 -0700, pali@cpan.org wrote:
The warning would only be thrown if the code was called where that warning is enabled.
Yes, that would no longer be necessary if the functions were written as XS.
Did you mean ONLY_PRAGMA_WARNINGS here? One problem with implementing this as XS is the same problem as the goto solution - the call to utf8n_to_uvuni() would only produce warnings if the current context has utf8 warnings enabled, regardless of ONLY_PRAGMA_WARNINGS.
There's no goto involved, the XS simply doesn't have a new context to control the warnings, so the caller's warnings flags are what matter. Tony |
From @khwilliamsonOn 09/11/2017 06:13 PM, Tony Cook via RT wrote:
Note that there is a new API in 5.26 which allows you to get a bit field As it says in the docs: To do your own error handling, call this function with the UV utf8n_to_uvchr_error(const U8 *s, STRLEN curlen, If we get Devel::PPPort maintained, I had hoped to put it there. |
From @paliOn Monday 11 September 2017 17:13:17 Tony Cook via RT wrote:
Exactly.
Right, I did copy+paste error.
In XS, context warnings could be temporary changed.
Ok, then it should be just call_sv() instead of perl's goto, right? |
From @paliOn Monday 11 September 2017 18:24:43 karl williamson via RT wrote:
That would really help! Thanks for pointer.
But Devel::PPPort is unmaintained... What about taking it into p5p? Or |
From @paliIn attachment you can find patch with rewritten encode/decode/from_to With this patch there is no goto in warning messages... $ perl -Iblib/lib -Iblib/arch -MEncode -le 'my $x = encode("utf-16", (my $y = chr(0xdc10)), UTF-16 surrogate U+DC10 in subroutine entry at -e line 1. |
From @pali0003-Rewrite-encode-decode-encode_utf8-decode_utf8-and-fr.patchFrom 8e44ad64885c76f49e155ddb8cf6c5fcb5a2a011 Mon Sep 17 00:00:00 2001
From: Pali <[email protected]>
Date: Wed, 13 Sep 2017 00:30:29 +0200
Subject: [PATCH] Rewrite encode, decode, encode_utf8, decode_utf8 and from_to
functions to XS
---
Encode.pm | 68 ---------------------
Encode.xs | 196 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 196 insertions(+), 68 deletions(-)
diff --git a/Encode.pm b/Encode.pm
index ce30bd5..6ed4a77 100644
--- a/Encode.pm
+++ b/Encode.pm
@@ -171,74 +171,6 @@ sub clone_encoding($) {
return Storable::dclone($obj);
}
-sub encode($$;$) {
- my $name = $_[0];
- my $check = $_[2];
- Carp::croak("Encoding name should not be undef") unless defined $name;
- my $enc = find_encoding($name);
- Carp::croak("Unknown encoding '$name'") unless defined $enc;
- my $encode = $enc->can('encode');
- Carp::croak("No function 'encode' for encoding '$name'") unless defined $encode;
- $check ||= 0;
- splice(@_, 0, 1, $enc);
- if (ref $check or !$check or ($check & LEAVE_SRC)) {
- my $string = $_[1];
- splice(@_, 1, 1, $string);
- }
- splice(@_, 2, 1, $check);
- goto &$encode;
-}
-*str2bytes = \&encode;
-
-sub decode($$;$) {
- my $name = $_[0];
- my $check = $_[2];
- Carp::croak("Encoding name should not be undef") unless defined $name;
- my $enc = find_encoding($name);
- Carp::croak("Unknown encoding '$name'") unless defined $enc;
- my $decode = $enc->can('decode');
- Carp::croak("No function 'decode' for encoding '$name'") unless defined $decode;
- $check ||= 0;
- splice(@_, 0, 1, $enc);
- if (ref $check or !$check or ($check & LEAVE_SRC)) {
- my $octets = $_[1];
- splice(@_, 1, 1, $octets);
- }
- splice(@_, 2, 1, $check);
- goto &$decode;
-}
-*bytes2str = \&decode;
-
-sub from_to($$$;$) {
- my ( $string, $from, $to, $check ) = @_;
- Carp::croak("Encoding name should not be undef") unless defined $from and defined $to;
- my $f = find_encoding($from);
- Carp::croak("Unknown encoding '$from'") unless defined $f;
- my $t = find_encoding($to);
- Carp::croak("Unknown encoding '$to'") unless defined $t;
- return undef unless defined $string;
- $check ||= 0;
- my $uni = $f->decode($string);
- $_[0] = $string = $t->encode( $uni, $check );
- return undef if ( $check && length($uni) );
- return defined( $_[0] ) ? length($string) : undef;
-}
-
-sub encode_utf8($) {
- my ($str) = @_;
- return undef unless defined $str;
- utf8::encode($str);
- return $str;
-}
-
-my $utf8enc;
-
-sub decode_utf8($;$) {
- $utf8enc ||= find_encoding('utf8');
- unshift(@_, $utf8enc);
- goto &{$utf8enc->can('decode')};
-}
-
onBOOT;
if ($ON_EBCDIC) {
diff --git a/Encode.xs b/Encode.xs
index c1222a1..2cd6f4b 100644
--- a/Encode.xs
+++ b/Encode.xs
@@ -35,6 +35,14 @@ UNIMPLEMENTED(_encoded_bytes_to_utf8, I32)
#define SvIV_nomg SvIV
#endif
+#ifndef SvTRUE_nomg
+#define SvTRUE_nomg SvTRUE
+#endif
+
+#ifndef SVfARG
+#define SVfARG(p) ((void*)(p))
+#endif
+
static void
Encode_XSEncoding(pTHX_ encode_t * enc)
{
@@ -589,6 +597,83 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
return s;
}
+static SV *
+find_encoding(pTHX_ SV *enc)
+{
+ dSP;
+ I32 count;
+ SV *m_enc;
+ SV *obj = &PL_sv_undef;
+#ifndef SV_NOSTEAL
+ U32 tmp;
+#endif
+
+ ENTER;
+ SAVETMPS;
+ PUSHMARK(sp);
+
+ m_enc = sv_newmortal();
+#ifndef SV_NOSTEAL
+ tmp = SvFLAGS(enc) & SVs_TEMP;
+ SvTEMP_off(enc);
+ sv_setsv_flags(m_enc, enc, 0);
+ SvFLAGS(enc) |= tmp;
+#else
+ sv_setsv_flags(m_enc, enc, SV_NOSTEAL);
+#endif
+ XPUSHs(m_enc);
+
+ PUTBACK;
+
+ count = call_pv("Encode::find_encoding", G_SCALAR);
+
+ SPAGAIN;
+
+ if (count > 0) {
+ obj = POPs;
+ SvREFCNT_inc(obj);
+ }
+
+ PUTBACK;
+ FREETMPS;
+ LEAVE;
+ return sv_2mortal(obj);
+}
+
+static SV *
+call_encoding(pTHX_ const char *method, SV *obj, SV *src, SV *check)
+{
+ dSP;
+ I32 count;
+ SV *dst = &PL_sv_undef;
+
+ PUSHMARK(sp);
+
+ if (check)
+ check = sv_2mortal(newSVsv(check));
+
+ if (!check || SvROK(check) || !SvTRUE_nomg(check) || (SvIV_nomg(check) & ENCODE_LEAVE_SRC))
+ src = sv_2mortal(newSVsv(src));
+
+ XPUSHs(obj);
+ XPUSHs(src);
+ XPUSHs(check ? check : &PL_sv_no);
+
+ PUTBACK;
+
+ count = call_method(method, G_SCALAR);
+
+ SPAGAIN;
+
+ if (count > 0) {
+ dst = POPs;
+ SvREFCNT_inc(dst);
+ }
+
+ PUTBACK;
+ return dst;
+}
+
MODULE = Encode PACKAGE = Encode::utf8 PREFIX = Method_
@@ -1054,6 +1139,117 @@ CODE:
OUTPUT:
RETVAL
+SV *
+decode(encoding, octets, check = NULL)
+SV * encoding
+SV * octets
+SV * check
+ALIAS:
+ bytes2str = 0
+PREINIT:
+ SV *obj;
+INIT:
+ SvGETMAGIC(encoding);
+CODE:
+ if (!SvOK(encoding))
+ croak("Encoding name should not be undef");
+ obj = find_encoding(aTHX_ encoding);
+ if (!SvOK(obj))
+ croak("Unknown encoding '%" SVf "'", SVfARG(encoding));
+ RETVAL = call_encoding(aTHX_ "decode", obj, octets, check);
+OUTPUT:
+ RETVAL
+
+SV *
+encode(encoding, string, check = NULL)
+SV * encoding
+SV * string
+SV * check
+ALIAS:
+ str2bytes = 0
+PREINIT:
+ SV *obj;
+INIT:
+ SvGETMAGIC(encoding);
+CODE:
+ if (!SvOK(encoding))
+ croak("Encoding name should not be undef");
+ obj = find_encoding(aTHX_ encoding);
+ if (!SvOK(obj))
+ croak("Unknown encoding '%" SVf "'", SVfARG(encoding));
+ RETVAL = call_encoding(aTHX_ "encode", obj, string, check);
+OUTPUT:
+ RETVAL
+
+SV *
+decode_utf8(octets, check = NULL)
+SV * octets
+SV * check
+PREINIT:
+ HV *hv;
+ SV **sv;
+CODE:
+ hv = get_hv("Encode::Encoding", 0);
+ if (!hv)
+ croak("utf8 encoding was not found");
+ sv = hv_fetch(hv, "utf8", 4, 0);
+ if (!sv || !*sv || !SvOK(*sv))
+ croak("utf8 encoding was not found");
+ RETVAL = call_encoding(aTHX_ "decode", *sv, octets, check);
+OUTPUT:
+ RETVAL
+
+SV *
+encode_utf8(string)
+SV * string
+CODE:
+ RETVAL = newSVsv(string);
+ if (SvOK(RETVAL))
+ sv_utf8_encode(RETVAL);
+OUTPUT:
+ RETVAL
+
+SV *
+from_to(octets, from, to, check = NULL)
+SV * octets
+SV * from
+SV * to
+SV * check
+PREINIT:
+ SV *from_obj;
+ SV *to_obj;
+ SV *string;
+ SV *new_octets;
+ U8 *ptr;
+ STRLEN len;
+INIT:
+ SvGETMAGIC(from);
+ SvGETMAGIC(to);
+CODE:
+ if (!SvOK(from) || !SvOK(to))
+ croak("Encoding name should not be undef");
+ from_obj = find_encoding(aTHX_ from);
+ if (!SvOK(from_obj))
+ croak("Unknown encoding '%" SVf "'", SVfARG(from));
+ to_obj = find_encoding(aTHX_ to);
+ if (!SvOK(to_obj))
+ croak("Unknown encoding '%" SVf "'", SVfARG(to));
+ string = sv_2mortal(call_encoding(aTHX_ "decode", from_obj, octets, NULL));
+ new_octets = sv_2mortal(call_encoding(aTHX_ "encode", to_obj, string, check));
+ SvGETMAGIC(new_octets);
+ if (SvOK(new_octets) && (!check || SvROK(check) || !SvTRUE_nomg(check) || sv_len(string) == 0)) {
+ ptr = (U8 *)SvPV_nomg(new_octets, len);
+ if (SvUTF8(new_octets))
+ len = utf8_length(ptr, ptr+len);
+ RETVAL = newSVuv(len);
+ } else {
+ RETVAL = &PL_sv_undef;
+ }
+ sv_setsv_nomg(octets, new_octets);
+ SvSETMAGIC(octets);
+OUTPUT:
+ RETVAL
+
void
onBOOT()
CODE:
--
1.7.9.5
|
From @tonycozOn Tue, 12 Sep 2017 16:06:31 -0700, pali@cpan.org wrote:
That looks sane to me. Tony |
From @paliIn attachment is a v2 version of Encode patch which handle warnings of Hopefully now it should be complete, just there is missing support for |
From @pali0001-Automatically-compute-length-in-attr-macro.patchFrom 7b0bc1224d51a23a9475447e15d8085c7d0a5005 Mon Sep 17 00:00:00 2001
From: Pali <[email protected]>
Date: Wed, 7 Feb 2018 22:42:54 +0100
Subject: [PATCH] Automatically compute length in attr() macro
---
Unicode/Unicode.xs | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/Unicode/Unicode.xs b/Unicode/Unicode.xs
index b3b1d2f..d3d8223 100644
--- a/Unicode/Unicode.xs
+++ b/Unicode/Unicode.xs
@@ -123,8 +123,8 @@ MODULE = Encode::Unicode PACKAGE = Encode::Unicode
PROTOTYPES: DISABLE
-#define attr(k, l) (hv_exists((HV *)SvRV(obj),k,l) ? \
- *hv_fetch((HV *)SvRV(obj),k,l,0) : &PL_sv_undef)
+#define attr(k) (hv_exists((HV *)SvRV(obj),"" k "",sizeof(k)-1) ? \
+ *hv_fetch((HV *)SvRV(obj),"" k "",sizeof(k)-1,0) : &PL_sv_undef)
void
decode(obj, str, check = 0)
@@ -133,9 +133,9 @@ SV * str
IV check
CODE:
{
- SV *sve = attr("endian", 6);
+ SV *sve = attr("endian");
U8 endian = *((U8 *)SvPV_nolen(sve));
- SV *svs = attr("size", 4);
+ SV *svs = attr("size");
int size = SvIV(svs);
int ucs2 = -1; /* only needed in the event of surrogate pairs */
SV *result = newSVpvn("",0);
@@ -209,7 +209,7 @@ CODE:
}
#if 1
/* Update endian for next sequence */
- sv = attr("renewed", 7);
+ sv = attr("renewed");
if (SvTRUE(sv)) {
(void)hv_store((HV *)SvRV(obj),"endian",6,newSVpv((char *)&endian,1),0);
}
@@ -229,7 +229,7 @@ CODE:
U8 *d;
if (issurrogate(ord)) {
if (ucs2 == -1) {
- SV *sv = attr("ucs2", 4);
+ SV *sv = attr("ucs2");
ucs2 = SvTRUE(sv);
}
if (ucs2 || size == 4) {
@@ -351,9 +351,9 @@ SV * utf8
IV check
CODE:
{
- SV *sve = attr("endian", 6);
+ SV *sve = attr("endian");
U8 endian = *((U8 *)SvPV_nolen(sve));
- SV *svs = attr("size", 4);
+ SV *svs = attr("size");
const int size = SvIV(svs);
int ucs2 = -1; /* only needed if there is invalid_ucs2 input */
const STRLEN usize = (size > 0 ? size : 1);
@@ -399,7 +399,7 @@ CODE:
enc_pack(aTHX_ result,size,endian,BOM_BE);
#if 1
/* Update endian for next sequence */
- sv = attr("renewed", 7);
+ sv = attr("renewed");
if (SvTRUE(sv)) {
(void)hv_store((HV *)SvRV(obj),"endian",6,newSVpv((char *)&endian,1),0);
}
@@ -416,7 +416,7 @@ CODE:
if (size != 4 && invalid_ucs2(ord)) {
if (!issurrogate(ord)) {
if (ucs2 == -1) {
- SV *sv = attr("ucs2", 4);
+ SV *sv = attr("ucs2");
ucs2 = SvTRUE(sv);
}
if (ucs2 || ord > 0x10FFFF) {
--
2.11.0
|
From @paliv2-0001-Encode-Add-new-check-flag-Encode-ONLY_PRAGMA_WARNING.patchFrom c99c35366e0f95dd985b749d8ba1b0ea763b3f65 Mon Sep 17 00:00:00 2001
From: Pali <[email protected]>
Date: Sat, 1 Jul 2017 12:32:34 +0200
Subject: [PATCH v2] Encode: Add new check flag Encode::ONLY_PRAGMA_WARNINGS
When this new flag is set then only warnings configured and enabled by
pragma warnings are reported. It has no effect without setting check flag
Encode::ENCODE_WARN_ON_ERR.
---
Encode.pm | 131 +++++++------------------
Encode.xs | 13 ++-
Encode/encode.h | 4 +
Unicode/Unicode.xs | 148 +++++++++++++++++-----------
t/decode.t | 9 +-
t/enc_eucjp.t | 4 +
t/utf32warnings.t | 283 +++++++++++++++++++++++++++++++++++++++++++++++++++++
t/utf8messages.t | 33 -------
t/utf8warnings.t | 109 +++++----------------
9 files changed, 460 insertions(+), 274 deletions(-)
create mode 100644 t/utf32warnings.t
delete mode 100644 t/utf8messages.t
diff --git a/Encode.pm b/Encode.pm
index 249ac6b..7ee988f 100644
--- a/Encode.pm
+++ b/Encode.pm
@@ -14,6 +14,7 @@ BEGIN {
use Exporter 5.57 'import';
+use Carp ();
our @CARP_NOT = qw(Encode::Encoder);
# Public, encouraged API is exported by default
@@ -171,109 +172,54 @@ sub clone_encoding($) {
}
sub encode($$;$) {
- my ( $name, $string, $check ) = @_;
- return undef unless defined $string;
- $string .= ''; # stringify;
- $check ||= 0;
- unless ( defined $name ) {
- require Carp;
- Carp::croak("Encoding name should not be undef");
- }
+ my $name = $_[0];
+ my $check = $_[2];
+ Carp::croak("Encoding name should not be undef") unless defined $name;
my $enc = find_encoding($name);
- unless ( defined $enc ) {
- require Carp;
- Carp::croak("Unknown encoding '$name'");
- }
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $octets;
- if ( ref($enc) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $octets = $enc->encode( $string, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $octets = $enc->encode( $string, $check );
+ Carp::croak("Unknown encoding '$name'") unless defined $enc;
+ my $encode = $enc->can('encode');
+ Carp::croak("No function 'encode' for encoding '$name'") unless defined $encode;
+ $check ||= 0;
+ splice(@_, 0, 1, $enc);
+ if (ref $check or !$check or ($check & LEAVE_SRC)) {
+ my $string = $_[1];
+ splice(@_, 1, 1, $string);
}
- $_[1] = $string if $check and !ref $check and !( $check & LEAVE_SRC );
- return $octets;
+ splice(@_, 2, 1, $check);
+ goto &$encode;
}
*str2bytes = \&encode;
sub decode($$;$) {
- my ( $name, $octets, $check ) = @_;
- return undef unless defined $octets;
- $octets .= '';
- $check ||= 0;
+ my $name = $_[0];
+ my $check = $_[2];
+ Carp::croak("Encoding name should not be undef") unless defined $name;
my $enc = find_encoding($name);
- unless ( defined $enc ) {
- require Carp;
- Carp::croak("Unknown encoding '$name'");
- }
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $string;
- if ( ref($enc) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $string = $enc->decode( $octets, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $string = $enc->decode( $octets, $check );
+ Carp::croak("Unknown encoding '$name'") unless defined $enc;
+ my $decode = $enc->can('decode');
+ Carp::croak("No function 'decode' for encoding '$name'") unless defined $decode;
+ $check ||= 0;
+ splice(@_, 0, 1, $enc);
+ if (ref $check or !$check or ($check & LEAVE_SRC)) {
+ my $octets = $_[1];
+ splice(@_, 1, 1, $octets);
}
- $_[1] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
- return $string;
+ splice(@_, 2, 1, $check);
+ goto &$decode;
}
*bytes2str = \&decode;
sub from_to($$$;$) {
my ( $string, $from, $to, $check ) = @_;
- return undef unless defined $string;
- $check ||= 0;
+ Carp::croak("Encoding name should not be undef") unless defined $from and defined $to;
my $f = find_encoding($from);
- unless ( defined $f ) {
- require Carp;
- Carp::croak("Unknown encoding '$from'");
- }
+ Carp::croak("Unknown encoding '$from'") unless defined $f;
my $t = find_encoding($to);
- unless ( defined $t ) {
- require Carp;
- Carp::croak("Unknown encoding '$to'");
- }
-
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $uni;
- if ( ref($f) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $uni = $f->decode($string);
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $uni = $f->decode($string);
- }
-
- if ( ref($t) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $_[0] = $string = $t->encode( $uni, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $_[0] = $string = $t->encode( $uni, $check );
- }
-
+ Carp::croak("Unknown encoding '$to'") unless defined $t;
+ return undef unless defined $string;
+ $check ||= 0;
+ my $uni = $f->decode($string);
+ $_[0] = $string = $t->encode( $uni, $check );
return undef if ( $check && length($uni) );
return defined( $_[0] ) ? length($string) : undef;
}
@@ -288,14 +234,9 @@ sub encode_utf8($) {
my $utf8enc;
sub decode_utf8($;$) {
- my ( $octets, $check ) = @_;
- return undef unless defined $octets;
- $octets .= '';
- $check ||= 0;
$utf8enc ||= find_encoding('utf8');
- my $string = $utf8enc->decode( $octets, $check );
- $_[0] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
- return $string;
+ unshift(@_, $utf8enc);
+ goto &{$utf8enc->can('decode')};
}
onBOOT;
diff --git a/Encode.xs b/Encode.xs
index 6c077be..c1222a1 100644
--- a/Encode.xs
+++ b/Encode.xs
@@ -250,7 +250,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
(UV)ch, enc->name[0]);
return &PL_sv_undef; /* never reaches but be safe */
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
Perl_warner(aTHX_ packWARN(WARN_UTF8),
ERR_ENCODE_NOMAP, (UV)ch, enc->name[0]);
}
@@ -289,7 +289,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
enc->name[0], (UV)s[slen]);
return &PL_sv_undef; /* never reaches but be safe */
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
Perl_warner(
aTHX_ packWARN(WARN_UTF8),
ERR_DECODE_NOMAP,
@@ -446,7 +446,11 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
char esc[UTF8_MAXLEN * 6 + 1];
STRLEN i;
- if (SvROK(check_sv)) {
+ if (!SvOK(check_sv)) {
+ fallback_cb = &PL_sv_undef;
+ check = 0;
+ }
+ else if (SvROK(check_sv)) {
/* croak("UTF-8 decoder doesn't support callback CHECK"); */
fallback_cb = check_sv;
check = ENCODE_PERLQQ|ENCODE_LEAVE_SRC; /* same as perlqq */
@@ -520,7 +524,7 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
else
Perl_croak(aTHX_ ERR_DECODE_STR_NOMAP, (strict ? "UTF-8" : "utf8"), esc);
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
if (encode)
Perl_warner(aTHX_ packWARN(WARN_UTF8),
ERR_ENCODE_NOMAP, uv, (strict ? "UTF-8" : "utf8"));
@@ -1064,6 +1068,7 @@ BOOT:
newCONSTSUB(stash, "WARN_ON_ERR", newSViv(ENCODE_WARN_ON_ERR));
newCONSTSUB(stash, "RETURN_ON_ERR", newSViv(ENCODE_RETURN_ON_ERR));
newCONSTSUB(stash, "LEAVE_SRC", newSViv(ENCODE_LEAVE_SRC));
+ newCONSTSUB(stash, "ONLY_PRAGMA_WARNINGS", newSViv(ENCODE_ONLY_PRAGMA_WARNINGS));
newCONSTSUB(stash, "PERLQQ", newSViv(ENCODE_PERLQQ));
newCONSTSUB(stash, "HTMLCREF", newSViv(ENCODE_HTMLCREF));
newCONSTSUB(stash, "XMLCREF", newSViv(ENCODE_XMLCREF));
diff --git a/Encode/encode.h b/Encode/encode.h
index 5fbcf76..1c7c066 100644
--- a/Encode/encode.h
+++ b/Encode/encode.h
@@ -94,6 +94,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
#define ENCODE_WARN_ON_ERR 0x0002 /* warn on error; may proceed */
#define ENCODE_RETURN_ON_ERR 0x0004 /* immediately returns on NOREP */
#define ENCODE_LEAVE_SRC 0x0008 /* $src updated unless set */
+#define ENCODE_ONLY_PRAGMA_WARNINGS 0x0010 /* when enabled report only warnings configured by pragma warnings, otherwise report all warnings; no effect without ENCODE_WARN_ON_ERR */
#define ENCODE_PERLQQ 0x0100 /* perlqq fallback string */
#define ENCODE_HTMLCREF 0x0200 /* HTML character ref. fb mode */
#define ENCODE_XMLCREF 0x0400 /* XML character ref. fb mode */
@@ -107,4 +108,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
#define ENCODE_FB_HTMLCREF (ENCODE_HTMLCREF|ENCODE_LEAVE_SRC)
#define ENCODE_FB_XMLCREF (ENCODE_XMLCREF|ENCODE_LEAVE_SRC)
+#define encode_ckWARN(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || ckWARN(w)))
+#define encode_ckWARN_packed(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || Perl_ckwarn(aTHX_ w)))
+
#endif /* ENCODE_H */
diff --git a/Unicode/Unicode.xs b/Unicode/Unicode.xs
index d3d8223..0b3a413 100644
--- a/Unicode/Unicode.xs
+++ b/Unicode/Unicode.xs
@@ -17,14 +17,8 @@
#define isLoSurrogate(x) (0xDC00 <= (x) && (x) <= 0xDFFF )
#define invalid_ucs2(x) ( issurrogate(x) || 0xFFFF < (x) )
-/* For pre-5.14 source compatibility */
-#ifndef UNICODE_WARN_ILLEGAL_INTERCHANGE
-# define UNICODE_WARN_ILLEGAL_INTERCHANGE 0
-# define UTF8_DISALLOW_SURROGATE 0
-# define UTF8_WARN_SURROGATE 0
-# define UTF8_DISALLOW_FE_FF 0
-# define UTF8_WARN_FE_FF 0
-# define UTF8_WARN_NONCHAR 0
+#ifndef SVfARG
+#define SVfARG(p) ((void*)(p))
#endif
#define PERLIO_BUFSIZ 1024 /* XXX value comes from PerlIOEncode_get_base */
@@ -133,6 +127,7 @@ SV * str
IV check
CODE:
{
+ SV *name = attr("Name");
SV *sve = attr("endian");
U8 endian = *((U8 *)SvPV_nolen(sve));
SV *svs = attr("size");
@@ -233,53 +228,61 @@ CODE:
ucs2 = SvTRUE(sv);
}
if (ucs2 || size == 4) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":no surrogates allowed %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
+ SVfARG(name), ord);
+ }
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":no surrogates allowed %" UVxf,
+ SVfARG(name), ord);
}
ord = FBCHAR;
}
else {
UV lo;
if (!isHiSurrogate(ord)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Malformed HI surrogate %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
+ SVfARG(name), ord);
}
- else {
- ord = FBCHAR;
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed HI surrogate %" UVxf,
+ SVfARG(name), ord);
}
+ ord = FBCHAR;
}
else if (s+size > e) {
- if (check) {
- if (check & ENCODE_STOP_AT_PARTIAL) {
- s -= size;
- break;
- }
- else {
- croak("%" SVf ":Malformed HI surrogate %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
- }
+ if (check & ENCODE_STOP_AT_PARTIAL) {
+ s -= size;
+ break;
}
- else {
- ord = FBCHAR;
+ if (check & ENCODE_DIE_ON_ERR) {
+ croak("%" SVf ":Malformed HI surrogate %" UVxf,
+ SVfARG(name), ord);
+ }
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed HI surrogate %" UVxf,
+ SVfARG(name), ord);
}
+ ord = FBCHAR;
}
else {
lo = enc_unpack(aTHX_ &s,e,size,endian);
if (!isLoSurrogate(lo)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Malformed LO surrogate %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
+ SVfARG(name), ord);
}
- else {
- s -= size;
- ord = FBCHAR;
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed LO surrogate %" UVxf,
+ SVfARG(name), ord);
}
+ s -= size;
+ ord = FBCHAR;
}
else {
ord = 0x10000 + ((ord - 0xD800) << 10) + (lo - 0xDC00);
@@ -289,13 +292,16 @@ CODE:
}
if ((ord & 0xFFFE) == 0xFFFE || (ord >= 0xFDD0 && ord <= 0xFDEF)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Unicode character %" UVxf " is illegal",
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
- } else {
- ord = FBCHAR;
+ SVfARG(name), ord);
}
+ if (encode_ckWARN(check, WARN_NONCHAR)) {
+ warner(packWARN(WARN_NONCHAR),
+ "%" SVf ":Unicode character %" UVxf " is illegal",
+ SVfARG(name), ord);
+ }
+ ord = FBCHAR;
}
if (resultbuflen < SvCUR(result) + UTF8_MAXLEN + 1) {
@@ -315,16 +321,28 @@ CODE:
resultbuflen = SvLEN(result);
}
- d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), ord,
- UNICODE_WARN_ILLEGAL_INTERCHANGE);
+ HV *hv = NULL;
+ d = uvchr_to_utf8_flags_msgs(resultbuf+SvCUR(result), UNI_TO_NATIVE(ord), UNICODE_DISALLOW_ILLEGAL_INTERCHANGE | UNICODE_WARN_ILLEGAL_INTERCHANGE, &hv);
+ if (hv) {
+ sv_2mortal((SV *)hv);
+ SV *message = *hv_fetch(hv, "text", 4, 0);
+ U32 categories = SvUVx(*hv_fetch(hv, "warn_categories", 15, 0));
+ if (check & ENCODE_DIE_ON_ERR)
+ croak("%" SVf, SVfARG(message));
+ if (encode_ckWARN_packed(check, categories))
+ warner(categories, "%" SVf, SVfARG(message));
+ d = uvuni_to_utf8_flags(resultbuf+SvCUR(result), FBCHAR, 0);
+ }
+
SvCUR_set(result, d - (U8 *)SvPVX(result));
}
if (s < e) {
/* unlikely to happen because it's fixed-length -- dankogai */
- if (check & ENCODE_WARN_ON_ERR) {
- Perl_warner(aTHX_ packWARN(WARN_UTF8),"%" SVf ":Partial character",
- *hv_fetch((HV *)SvRV(obj),"Name",4,0));
+ if (check & ENCODE_DIE_ON_ERR)
+ croak("%" SVf ":Partial character", SVfARG(name));
+ if (encode_ckWARN(check, WARN_UTF8)) {
+ warner(packWARN(WARN_UTF8),"%" SVf ":Partial character", SVfARG(name));
}
}
if (check && !(check & ENCODE_LEAVE_SRC)) {
@@ -351,6 +369,7 @@ SV * utf8
IV check
CODE:
{
+ SV *name = attr("Name");
SV *sve = attr("endian");
U8 endian = *((U8 *)SvPV_nolen(sve));
SV *svs = attr("size");
@@ -406,25 +425,43 @@ CODE:
#endif
}
while (s < e && s+UTF8SKIP(s) <= e) {
- STRLEN len;
- UV ord = utf8n_to_uvuni(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
- |UTF8_WARN_SURROGATE
- |UTF8_DISALLOW_FE_FF
- |UTF8_WARN_FE_FF
- |UTF8_WARN_NONCHAR));
- s += len;
- if (size != 4 && invalid_ucs2(ord)) {
+ STRLEN len;
+ AV *msgs = NULL;
+ UV ord = NATIVE_TO_UNI(utf8n_to_uvchr_msgs(s, e-s, &len, UTF8_DISALLOW_ILLEGAL_INTERCHANGE | UTF8_WARN_ILLEGAL_INTERCHANGE, NULL, &msgs));
+ if (msgs) {
+ SSize_t i;
+ SSize_t len = av_len(msgs)+1;
+ sv_2mortal((SV *)msgs);
+ for (i = 0; i < len; ++i) {
+ SV *sv = *av_fetch(msgs, i, 0);
+ HV *hv = (HV *)SvRV(sv);
+ SV *message = *hv_fetch(hv, "text", 4, 0);
+ U32 categories = SvUVx(*hv_fetch(hv, "warn_categories", 15, 0));
+ if (check & ENCODE_DIE_ON_ERR)
+ croak("%" SVf, SVfARG(message));
+ if (encode_ckWARN_packed(check, categories))
+ warner(categories, "%" SVf, SVfARG(message));
+ }
+ }
+ if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {
if (!issurrogate(ord)) {
if (ucs2 == -1) {
SV *sv = attr("ucs2");
ucs2 = SvTRUE(sv);
}
if (ucs2 || ord > 0x10FFFF) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":code point \"\\x{%" UVxf "}\" too high",
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),ord);
+ SVfARG(name),ord);
+ }
+ if (encode_ckWARN(check, WARN_NON_UNICODE)) {
+ warner(packWARN(WARN_NON_UNICODE),
+ "%" SVf ":code point \"\\x{%" UVxf "}\" too high",
+ SVfARG(name),ord);
}
enc_pack(aTHX_ result,size,endian,FBCHAR);
+ } else if (ord == 0) {
+ enc_pack(aTHX_ result,size,endian,FBCHAR);
} else {
UV hi = ((ord - 0x10000) >> 10) + 0xD800;
UV lo = ((ord - 0x10000) & 0x3FF) + 0xDC00;
@@ -440,6 +477,7 @@ CODE:
else {
enc_pack(aTHX_ result,size,endian,ord);
}
+ s += len;
}
if (s < e) {
/* UTF-8 partial char happens often on PerlIO.
@@ -449,7 +487,7 @@ CODE:
if (check & (ENCODE_DIE_ON_ERR|ENCODE_LEAVE_SRC)) {
Perl_croak(aTHX_ "%" SVf ":partial character is not allowed "
"when CHECK = 0x%" UVuf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0), check);
+ SVfARG(name), check);
}
}
if (check && !(check & ENCODE_LEAVE_SRC)) {
diff --git a/t/decode.t b/t/decode.t
index 93c992c..0c3b669 100644
--- a/t/decode.t
+++ b/t/decode.t
@@ -51,9 +51,12 @@ $orig = "\xC3\x80";
$orig =~ /(..)/;
is(Encode::decode_utf8($1), "\N{U+C0}", 'passing magic regex to Encode::decode_utf8');
-$orig = "\xC3\x80";
-*a = $orig;
-is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+SKIP: {
+ skip "Perl Version ($]) is older than v5.27.1", 1 if $] < 5.027001;
+ $orig = "\xC3\x80";
+ *a = $orig;
+ is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+}
$orig = "\N{U+C0}";
$orig =~ /(.)/;
diff --git a/t/enc_eucjp.t b/t/enc_eucjp.t
index fc0af3c..8f933b0 100644
--- a/t/enc_eucjp.t
+++ b/t/enc_eucjp.t
@@ -25,6 +25,10 @@ BEGIN {
}
}
+use Encode qw();
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+use warnings "utf8";
+
no warnings "deprecated";
use encoding 'euc-jp';
diff --git a/t/utf32warnings.t b/t/utf32warnings.t
new file mode 100644
index 0000000..d84939f
--- /dev/null
+++ b/t/utf32warnings.t
@@ -0,0 +1,283 @@
+use strict;
+use warnings;
+BEGIN {
+ if ($] < 5.014){
+ print "1..0 # Skip: Perl 5.14.0 or later required\n";
+ exit 0;
+ }
+}
+
+my $script = quotemeta $0;
+
+use Encode;
+use Test::More tests => 38;
+
+my $valid = "\x61\x00\x00\x00";
+my $invalid = "\x78\x56\x34\x12";
+
+our $warn;
+$SIG{__WARN__} = sub { $warn = $_[0] };
+
+my $enc = find_encoding("UTF32-LE");
+
+{
+ local $warn;
+ my $ret = $enc->encode( "a", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Calling encode on UTF32-LE encode object with valid string produces no warnings");
+ is($ret, $valid, "Calling encode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling encode on UTF32-LE encode object with invalid string warns");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from encode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from encode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from encode method of UTF32-LE encode object cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+ local $warn;
+ no warnings;
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from encode method of UTF32-LE encode object cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+ local $warn;
+ my $ret = Encode::encode( $enc, "a", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Calling Encode::encode for UTF32-LE with valid string produces no warnings");
+ is($ret, $valid, "Calling Encode::encode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling Encode::encode for UTF32-LE with invalid string warns");
+}
+
+
+{
+ local $warn;
+ no warnings 'utf8';
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from Encode::encode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from Encode::encode for UTF32-LE can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from Encode::encode for UTF32-LE cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+ local $warn;
+ no warnings;
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from Encode::encode for UTF32-LE cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+ local $warn;
+ my $ret = $enc->decode( $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Calling decode on UTF32-LE encode object with valid string produces no warnings");
+ is($ret, "a", "Calling decode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Calling decode on UTF32-LE encode object with invalid string warns");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Warning from decode method of UTF32-LE encode object cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+ local $warn;
+ no warnings;
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Warning from decode method of UTF32-LE encode object cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+ local $warn;
+ my $ret = Encode::decode( $enc, $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Calling Encode::decode for UTF32-LE with valid string produces no warnings");
+ is($ret, "a", "Calling Encode::decode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Calling Encode::decode for UTF32-LE with invalid string warns");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Warning from Encode::decode for UTF32-LE cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+ local $warn;
+ no warnings;
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Warning from Encode::decode for UTF32-LE cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+use PerlIO::encoding;
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+
+{
+ local $warn;
+ my $tmp = $valid;
+ $tmp .= ''; # de-COW
+ open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+ my $str = <$fh>;
+ close $fh;
+ is($warn, undef, "Calling PerlIO :encoding on valid string produces no warnings");
+ is($str, "a", "PerlIO decodes string correctly");
+}
+
+
+{
+ local $warn;
+ my $tmp = $invalid;
+ use Devel::Peek;
+ $tmp .= ''; # de-COW
+ open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+ my $str = <$fh>;
+ close $fh;
+ like($warn, qr/may not be portable.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+ local $warn;
+ my $tmp = $invalid;
+ $tmp .= ''; # de-COW
+ no warnings 'utf8';
+ open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+ my $str = <$fh>;
+ close $fh;
+ is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ my $tmp = $invalid;
+ $tmp .= ''; # de-COW
+ no warnings;
+ open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+ my $str = <$fh>;
+ close $fh;
+ is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
+
+
+{
+ local $warn;
+ my $str;
+ open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+ print $fh "a";
+ close $fh;
+ is($warn, undef, "Calling PerlIO :encoding on valid string produces no warnings");
+ is($str, $valid, "PerlIO encodes string correctly");
+}
+
+
+{
+ local $warn;
+ my $str;
+ open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+ print $fh "\x{D800}";
+ close $fh;
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+ local $warn;
+ my $str;
+ no warnings 'utf8';
+ open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+ print $fh "\x{D800}";
+ close $fh;
+ is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ my $str;
+ no warnings;
+ open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+ print $fh "\x{D800}";
+ close $fh;
+ is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
diff --git a/t/utf8messages.t b/t/utf8messages.t
deleted file mode 100644
index 8b6b379..0000000
--- a/t/utf8messages.t
+++ /dev/null
@@ -1,33 +0,0 @@
-use strict;
-use warnings;
-BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-
-use Test::More;
-use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-
-plan tests => 12;
-
-my @invalid;
-
-ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
-like $@, qr/^"\\x\{d800\}" does not map to UTF-8 /, 'Error message contains strict UTF-8 name';
-@invalid = ();
-encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-
-ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
-like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
-decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
diff --git a/t/utf8warnings.t b/t/utf8warnings.t
index 0d1ac6d..e4e4304 100644
--- a/t/utf8warnings.t
+++ b/t/utf8warnings.t
@@ -1,94 +1,35 @@
use strict;
use warnings;
-BEGIN {
- if ($] < 5.014){
- print "1..0 # Skip: Perl 5.14.0 or later required\n";
- exit 0;
- }
-}
+BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-use Encode;
-use Test::More tests => 10;
+use Test::More;
+use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-my $valid = "\x61\x00\x00\x00";
-my $invalid = "\x78\x56\x34\x12";
+my $script = quotemeta $0;
-my @warnings;
-$SIG{__WARN__} = sub {push @warnings, "@_"};
+plan tests => 12;
-my $enc = find_encoding("UTF32-LE");
+my @invalid;
-{
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $valid );
- is("@warnings", "", "Calling decode in Encode::Unicode on valid string produces no warnings");
-}
+ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
+like $@, qr/^"\\x\{d800\}" does not map to UTF-8 at $script line /, 'Error message contains strict UTF-8 name';
+@invalid = ();
+encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
+ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
+ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-{
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- like("@warnings", qr/is not Unicode/, "Calling decode in Encode::Unicode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings 'utf8'");
-}
-
-{
- no warnings;
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings");
-}
-
-
-
-{
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- like("@warnings", qr/is not Unicode/, "Calling decode in Encode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
- no warnings;
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings");
-};
-
-
-
-{
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- like("@warnings", qr/is not Unicode/, "Calling from_to in Encode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
- no warnings;
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings");
-};
+ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
+like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
+decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
--
2.11.0
|
From @paliIn attachment is a v3 version of Encode patch. |
From @paliv3-0001-Encode-Add-new-check-flag-Encode-ONLY_PRAGMA_WARNING.patchFrom aa43c063128b889db7379435209219a850071d0f Mon Sep 17 00:00:00 2001
From: Pali <[email protected]>
Date: Sat, 1 Jul 2017 12:32:34 +0200
Subject: [PATCH v3] Encode: Add new check flag Encode::ONLY_PRAGMA_WARNINGS
When this new flag is set then only warnings configured and enabled by
pragma warnings are reported. It has no effect without setting check flag
Encode::ENCODE_WARN_ON_ERR.
---
Encode.pm | 131 +++++++------------------
Encode.xs | 13 ++-
Encode/encode.h | 4 +
Unicode/Unicode.xs | 148 +++++++++++++++++-----------
t/decode.t | 9 +-
t/enc_eucjp.t | 4 +
t/utf32warnings.t | 283 +++++++++++++++++++++++++++++++++++++++++++++++++++++
t/utf8messages.t | 33 -------
t/utf8warnings.t | 109 +++++----------------
9 files changed, 460 insertions(+), 274 deletions(-)
create mode 100644 t/utf32warnings.t
delete mode 100644 t/utf8messages.t
diff --git a/Encode.pm b/Encode.pm
index 9234ae0..ec0992d 100644
--- a/Encode.pm
+++ b/Encode.pm
@@ -14,6 +14,7 @@ BEGIN {
use Exporter 5.57 'import';
+use Carp ();
our @CARP_NOT = qw(Encode::Encoder);
# Public, encouraged API is exported by default
@@ -171,109 +172,54 @@ sub clone_encoding($) {
}
sub encode($$;$) {
- my ( $name, $string, $check ) = @_;
- return undef unless defined $string;
- $string .= ''; # stringify;
- $check ||= 0;
- unless ( defined $name ) {
- require Carp;
- Carp::croak("Encoding name should not be undef");
- }
+ my $name = $_[0];
+ my $check = $_[2];
+ Carp::croak("Encoding name should not be undef") unless defined $name;
my $enc = find_encoding($name);
- unless ( defined $enc ) {
- require Carp;
- Carp::croak("Unknown encoding '$name'");
- }
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $octets;
- if ( ref($enc) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $octets = $enc->encode( $string, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $octets = $enc->encode( $string, $check );
+ Carp::croak("Unknown encoding '$name'") unless defined $enc;
+ my $encode = $enc->can('encode');
+ Carp::croak("No function 'encode' for encoding '$name'") unless defined $encode;
+ $check ||= 0;
+ splice(@_, 0, 1, $enc);
+ if (ref $check or !$check or ($check & LEAVE_SRC)) {
+ my $string = $_[1];
+ splice(@_, 1, 1, $string);
}
- $_[1] = $string if $check and !ref $check and !( $check & LEAVE_SRC );
- return $octets;
+ splice(@_, 2, 1, $check);
+ goto &$encode;
}
*str2bytes = \&encode;
sub decode($$;$) {
- my ( $name, $octets, $check ) = @_;
- return undef unless defined $octets;
- $octets .= '';
- $check ||= 0;
+ my $name = $_[0];
+ my $check = $_[2];
+ Carp::croak("Encoding name should not be undef") unless defined $name;
my $enc = find_encoding($name);
- unless ( defined $enc ) {
- require Carp;
- Carp::croak("Unknown encoding '$name'");
- }
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $string;
- if ( ref($enc) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $string = $enc->decode( $octets, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $string = $enc->decode( $octets, $check );
+ Carp::croak("Unknown encoding '$name'") unless defined $enc;
+ my $decode = $enc->can('decode');
+ Carp::croak("No function 'decode' for encoding '$name'") unless defined $decode;
+ $check ||= 0;
+ splice(@_, 0, 1, $enc);
+ if (ref $check or !$check or ($check & LEAVE_SRC)) {
+ my $octets = $_[1];
+ splice(@_, 1, 1, $octets);
}
- $_[1] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
- return $string;
+ splice(@_, 2, 1, $check);
+ goto &$decode;
}
*bytes2str = \&decode;
sub from_to($$$;$) {
my ( $string, $from, $to, $check ) = @_;
- return undef unless defined $string;
- $check ||= 0;
+ Carp::croak("Encoding name should not be undef") unless defined $from and defined $to;
my $f = find_encoding($from);
- unless ( defined $f ) {
- require Carp;
- Carp::croak("Unknown encoding '$from'");
- }
+ Carp::croak("Unknown encoding '$from'") unless defined $f;
my $t = find_encoding($to);
- unless ( defined $t ) {
- require Carp;
- Carp::croak("Unknown encoding '$to'");
- }
-
- # For Unicode, warnings need to be caught and re-issued at this level
- # so that callers can disable utf8 warnings lexically.
- my $uni;
- if ( ref($f) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $uni = $f->decode($string);
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $uni = $f->decode($string);
- }
-
- if ( ref($t) eq 'Encode::Unicode' ) {
- my $warn = '';
- {
- local $SIG{__WARN__} = sub { $warn = shift };
- $_[0] = $string = $t->encode( $uni, $check );
- }
- warnings::warnif('utf8', $warn) if length $warn;
- }
- else {
- $_[0] = $string = $t->encode( $uni, $check );
- }
-
+ Carp::croak("Unknown encoding '$to'") unless defined $t;
+ return undef unless defined $string;
+ $check ||= 0;
+ my $uni = $f->decode($string);
+ $_[0] = $string = $t->encode( $uni, $check );
return undef if ( $check && length($uni) );
return defined( $_[0] ) ? length($string) : undef;
}
@@ -288,14 +234,9 @@ sub encode_utf8($) {
my $utf8enc;
sub decode_utf8($;$) {
- my ( $octets, $check ) = @_;
- return undef unless defined $octets;
- $octets .= '';
- $check ||= 0;
$utf8enc ||= find_encoding('utf8');
- my $string = $utf8enc->decode( $octets, $check );
- $_[0] = $octets if $check and !ref $check and !( $check & LEAVE_SRC );
- return $string;
+ unshift(@_, $utf8enc);
+ goto &{$utf8enc->can('decode')};
}
onBOOT;
diff --git a/Encode.xs b/Encode.xs
index 9796181..09dca71 100644
--- a/Encode.xs
+++ b/Encode.xs
@@ -237,7 +237,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
(UV)ch, enc->name[0]);
return &PL_sv_undef; /* never reaches but be safe */
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
Perl_warner(aTHX_ packWARN(WARN_UTF8),
ERR_ENCODE_NOMAP, (UV)ch, enc->name[0]);
}
@@ -276,7 +276,7 @@ encode_method(pTHX_ const encode_t * enc, const encpage_t * dir, SV * src, U8 *
enc->name[0], (UV)s[slen]);
return &PL_sv_undef; /* never reaches but be safe */
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
Perl_warner(
aTHX_ packWARN(WARN_UTF8),
ERR_DECODE_NOMAP,
@@ -460,7 +460,11 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
? UTF8_DISALLOW_ILLEGAL_INTERCHANGE
: UTF8_ALLOW_NON_STRICT;
- if (SvROK(check_sv)) {
+ if (!SvOK(check_sv)) {
+ fallback_cb = &PL_sv_undef;
+ check = 0;
+ }
+ else if (SvROK(check_sv)) {
/* croak("UTF-8 decoder doesn't support callback CHECK"); */
fallback_cb = check_sv;
check = ENCODE_PERLQQ|ENCODE_LEAVE_SRC; /* same as perlqq */
@@ -581,7 +585,7 @@ process_utf8(pTHX_ SV* dst, U8* s, U8* e, SV *check_sv,
else
Perl_croak(aTHX_ ERR_DECODE_STR_NOMAP, (strict ? "UTF-8" : "utf8"), esc);
}
- if (check & ENCODE_WARN_ON_ERR){
+ if (encode_ckWARN(check, WARN_UTF8)) {
if (encode)
Perl_warner(aTHX_ packWARN(WARN_UTF8),
ERR_ENCODE_NOMAP, uv, (strict ? "UTF-8" : "utf8"));
@@ -1029,6 +1033,7 @@ BOOT:
newCONSTSUB(stash, "WARN_ON_ERR", newSViv(ENCODE_WARN_ON_ERR));
newCONSTSUB(stash, "RETURN_ON_ERR", newSViv(ENCODE_RETURN_ON_ERR));
newCONSTSUB(stash, "LEAVE_SRC", newSViv(ENCODE_LEAVE_SRC));
+ newCONSTSUB(stash, "ONLY_PRAGMA_WARNINGS", newSViv(ENCODE_ONLY_PRAGMA_WARNINGS));
newCONSTSUB(stash, "PERLQQ", newSViv(ENCODE_PERLQQ));
newCONSTSUB(stash, "HTMLCREF", newSViv(ENCODE_HTMLCREF));
newCONSTSUB(stash, "XMLCREF", newSViv(ENCODE_XMLCREF));
diff --git a/Encode/encode.h b/Encode/encode.h
index df5554f..5189930 100644
--- a/Encode/encode.h
+++ b/Encode/encode.h
@@ -99,6 +99,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
#define ENCODE_WARN_ON_ERR 0x0002 /* warn on error; may proceed */
#define ENCODE_RETURN_ON_ERR 0x0004 /* immediately returns on NOREP */
#define ENCODE_LEAVE_SRC 0x0008 /* $src updated unless set */
+#define ENCODE_ONLY_PRAGMA_WARNINGS 0x0010 /* when enabled report only warnings configured by pragma warnings, otherwise report all warnings; no effect without ENCODE_WARN_ON_ERR */
#define ENCODE_PERLQQ 0x0100 /* perlqq fallback string */
#define ENCODE_HTMLCREF 0x0200 /* HTML character ref. fb mode */
#define ENCODE_XMLCREF 0x0400 /* XML character ref. fb mode */
@@ -112,4 +113,7 @@ extern void Encode_DefineEncoding(encode_t *enc);
#define ENCODE_FB_HTMLCREF (ENCODE_HTMLCREF|ENCODE_LEAVE_SRC)
#define ENCODE_FB_XMLCREF (ENCODE_XMLCREF|ENCODE_LEAVE_SRC)
+#define encode_ckWARN(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || ckWARN(w)))
+#define encode_ckWARN_packed(c, w) ((c & ENCODE_WARN_ON_ERR) && (!(c & ENCODE_ONLY_PRAGMA_WARNINGS) || Perl_ckwarn(aTHX_ w)))
+
#endif /* ENCODE_H */
diff --git a/Unicode/Unicode.xs b/Unicode/Unicode.xs
index ad88106..c9e4391 100644
--- a/Unicode/Unicode.xs
+++ b/Unicode/Unicode.xs
@@ -17,14 +17,8 @@
#define isLoSurrogate(x) (0xDC00 <= (x) && (x) <= 0xDFFF )
#define invalid_ucs2(x) ( issurrogate(x) || 0xFFFF < (x) )
-/* For pre-5.14 source compatibility */
-#ifndef UNICODE_WARN_ILLEGAL_INTERCHANGE
-# define UNICODE_WARN_ILLEGAL_INTERCHANGE 0
-# define UTF8_DISALLOW_SURROGATE 0
-# define UTF8_WARN_SURROGATE 0
-# define UTF8_DISALLOW_FE_FF 0
-# define UTF8_WARN_FE_FF 0
-# define UTF8_WARN_NONCHAR 0
+#ifndef SVfARG
+#define SVfARG(p) ((void*)(p))
#endif
#define PERLIO_BUFSIZ 1024 /* XXX value comes from PerlIOEncode_get_base */
@@ -133,6 +127,7 @@ SV * str
IV check
CODE:
{
+ SV *name = attr("Name");
SV *sve = attr("endian");
U8 endian = *((U8 *)SvPV_nolen(sve));
SV *svs = attr("size");
@@ -233,53 +228,61 @@ CODE:
ucs2 = SvTRUE(sv);
}
if (ucs2 || size == 4) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":no surrogates allowed %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
+ SVfARG(name), ord);
+ }
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":no surrogates allowed %" UVxf,
+ SVfARG(name), ord);
}
ord = FBCHAR;
}
else {
UV lo;
if (!isHiSurrogate(ord)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Malformed HI surrogate %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
+ SVfARG(name), ord);
}
- else {
- ord = FBCHAR;
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed HI surrogate %" UVxf,
+ SVfARG(name), ord);
}
+ ord = FBCHAR;
}
else if (s+size > e) {
- if (check) {
- if (check & ENCODE_STOP_AT_PARTIAL) {
- s -= size;
- break;
- }
- else {
- croak("%" SVf ":Malformed HI surrogate %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
- }
+ if (check & ENCODE_STOP_AT_PARTIAL) {
+ s -= size;
+ break;
}
- else {
- ord = FBCHAR;
+ if (check & ENCODE_DIE_ON_ERR) {
+ croak("%" SVf ":Malformed HI surrogate %" UVxf,
+ SVfARG(name), ord);
+ }
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed HI surrogate %" UVxf,
+ SVfARG(name), ord);
}
+ ord = FBCHAR;
}
else {
lo = enc_unpack(aTHX_ &s,e,size,endian);
if (!isLoSurrogate(lo)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Malformed LO surrogate %" UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
+ SVfARG(name), ord);
}
- else {
- s -= size;
- ord = FBCHAR;
+ if (encode_ckWARN(check, WARN_SURROGATE)) {
+ warner(packWARN(WARN_SURROGATE),
+ "%" SVf ":Malformed LO surrogate %" UVxf,
+ SVfARG(name), ord);
}
+ s -= size;
+ ord = FBCHAR;
}
else {
ord = 0x10000 + ((ord - 0xD800) << 10) + (lo - 0xDC00);
@@ -289,13 +292,16 @@ CODE:
}
if ((ord & 0xFFFE) == 0xFFFE || (ord >= 0xFDD0 && ord <= 0xFDEF)) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":Unicode character %" UVxf " is illegal",
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- ord);
- } else {
- ord = FBCHAR;
+ SVfARG(name), ord);
}
+ if (encode_ckWARN(check, WARN_NONCHAR)) {
+ warner(packWARN(WARN_NONCHAR),
+ "%" SVf ":Unicode character %" UVxf " is illegal",
+ SVfARG(name), ord);
+ }
+ ord = FBCHAR;
}
if (resultbuflen < SvCUR(result) + UTF8_MAXLEN + 1) {
@@ -315,16 +321,28 @@ CODE:
resultbuflen = SvLEN(result);
}
- d = uvchr_to_utf8_flags(resultbuf+SvCUR(result), ord,
- UNICODE_WARN_ILLEGAL_INTERCHANGE);
+ HV *hv = NULL;
+ d = uvchr_to_utf8_flags_msgs(resultbuf+SvCUR(result), ord, UNICODE_DISALLOW_ILLEGAL_INTERCHANGE | UNICODE_WARN_ILLEGAL_INTERCHANGE, &hv);
+ if (hv) {
+ sv_2mortal((SV *)hv);
+ SV *message = *hv_fetch(hv, "text", 4, 0);
+ U32 categories = SvUVx(*hv_fetch(hv, "warn_categories", 15, 0));
+ if (check & ENCODE_DIE_ON_ERR)
+ croak("%" SVf, SVfARG(message));
+ if (encode_ckWARN_packed(check, categories))
+ warner(categories, "%" SVf, SVfARG(message));
+ d = uvchr_to_utf8_flags(resultbuf+SvCUR(result), FBCHAR, 0);
+ }
+
SvCUR_set(result, d - (U8 *)SvPVX(result));
}
if (s < e) {
/* unlikely to happen because it's fixed-length -- dankogai */
- if (check & ENCODE_WARN_ON_ERR) {
- Perl_warner(aTHX_ packWARN(WARN_UTF8),"%" SVf ":Partial character",
- *hv_fetch((HV *)SvRV(obj),"Name",4,0));
+ if (check & ENCODE_DIE_ON_ERR)
+ croak("%" SVf ":Partial character", SVfARG(name));
+ if (encode_ckWARN(check, WARN_UTF8)) {
+ warner(packWARN(WARN_UTF8),"%" SVf ":Partial character", SVfARG(name));
}
}
if (check && !(check & ENCODE_LEAVE_SRC)) {
@@ -351,6 +369,7 @@ SV * utf8
IV check
CODE:
{
+ SV *name = attr("Name");
SV *sve = attr("endian");
U8 endian = *((U8 *)SvPV_nolen(sve));
SV *svs = attr("size");
@@ -406,25 +425,43 @@ CODE:
#endif
}
while (s < e && s+UTF8SKIP(s) <= e) {
- STRLEN len;
- UV ord = utf8n_to_uvchr(s, e-s, &len, (UTF8_DISALLOW_SURROGATE
- |UTF8_WARN_SURROGATE
- |UTF8_DISALLOW_FE_FF
- |UTF8_WARN_FE_FF
- |UTF8_WARN_NONCHAR));
- s += len;
- if (size != 4 && invalid_ucs2(ord)) {
+ STRLEN len;
+ AV *msgs = NULL;
+ UV ord = utf8n_to_uvchr_msgs(s, e-s, &len, UTF8_DISALLOW_ILLEGAL_INTERCHANGE | UTF8_WARN_ILLEGAL_INTERCHANGE, NULL, &msgs);
+ if (msgs) {
+ SSize_t i;
+ SSize_t len = av_len(msgs)+1;
+ sv_2mortal((SV *)msgs);
+ for (i = 0; i < len; ++i) {
+ SV *sv = *av_fetch(msgs, i, 0);
+ HV *hv = (HV *)SvRV(sv);
+ SV *message = *hv_fetch(hv, "text", 4, 0);
+ U32 categories = SvUVx(*hv_fetch(hv, "warn_categories", 15, 0));
+ if (check & ENCODE_DIE_ON_ERR)
+ croak("%" SVf, SVfARG(message));
+ if (encode_ckWARN_packed(check, categories))
+ warner(categories, "%" SVf, SVfARG(message));
+ }
+ }
+ if ((size != 4 && invalid_ucs2(ord)) || (ord == 0 && *s != 0)) {
if (!issurrogate(ord)) {
if (ucs2 == -1) {
SV *sv = attr("ucs2");
ucs2 = SvTRUE(sv);
}
if (ucs2 || ord > 0x10FFFF) {
- if (check) {
+ if (check & ENCODE_DIE_ON_ERR) {
croak("%" SVf ":code point \"\\x{%" UVxf "}\" too high",
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),ord);
+ SVfARG(name),ord);
+ }
+ if (encode_ckWARN(check, WARN_NON_UNICODE)) {
+ warner(packWARN(WARN_NON_UNICODE),
+ "%" SVf ":code point \"\\x{%" UVxf "}\" too high",
+ SVfARG(name),ord);
}
enc_pack(aTHX_ result,size,endian,FBCHAR);
+ } else if (ord == 0) {
+ enc_pack(aTHX_ result,size,endian,FBCHAR);
} else {
UV hi = ((ord - 0x10000) >> 10) + 0xD800;
UV lo = ((ord - 0x10000) & 0x3FF) + 0xDC00;
@@ -440,6 +477,7 @@ CODE:
else {
enc_pack(aTHX_ result,size,endian,ord);
}
+ s += len;
}
if (s < e) {
/* UTF-8 partial char happens often on PerlIO.
@@ -449,7 +487,7 @@ CODE:
if (check & (ENCODE_DIE_ON_ERR|ENCODE_LEAVE_SRC)) {
Perl_croak(aTHX_ "%" SVf ":partial character is not allowed "
"when CHECK = 0x%" UVuf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0), check);
+ SVfARG(name), check);
}
}
if (check && !(check & ENCODE_LEAVE_SRC)) {
diff --git a/t/decode.t b/t/decode.t
index 93c992c..0c3b669 100644
--- a/t/decode.t
+++ b/t/decode.t
@@ -51,9 +51,12 @@ $orig = "\xC3\x80";
$orig =~ /(..)/;
is(Encode::decode_utf8($1), "\N{U+C0}", 'passing magic regex to Encode::decode_utf8');
-$orig = "\xC3\x80";
-*a = $orig;
-is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+SKIP: {
+ skip "Perl Version ($]) is older than v5.27.1", 1 if $] < 5.027001;
+ $orig = "\xC3\x80";
+ *a = $orig;
+ is(Encode::decode_utf8(*a), "*main::\N{U+C0}", 'passing typeglob to Encode::decode_utf8');
+}
$orig = "\N{U+C0}";
$orig =~ /(.)/;
diff --git a/t/enc_eucjp.t b/t/enc_eucjp.t
index fc0af3c..8f933b0 100644
--- a/t/enc_eucjp.t
+++ b/t/enc_eucjp.t
@@ -25,6 +25,10 @@ BEGIN {
}
}
+use Encode qw();
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+use warnings "utf8";
+
no warnings "deprecated";
use encoding 'euc-jp';
diff --git a/t/utf32warnings.t b/t/utf32warnings.t
new file mode 100644
index 0000000..d84939f
--- /dev/null
+++ b/t/utf32warnings.t
@@ -0,0 +1,283 @@
+use strict;
+use warnings;
+BEGIN {
+ if ($] < 5.014){
+ print "1..0 # Skip: Perl 5.14.0 or later required\n";
+ exit 0;
+ }
+}
+
+my $script = quotemeta $0;
+
+use Encode;
+use Test::More tests => 38;
+
+my $valid = "\x61\x00\x00\x00";
+my $invalid = "\x78\x56\x34\x12";
+
+our $warn;
+$SIG{__WARN__} = sub { $warn = $_[0] };
+
+my $enc = find_encoding("UTF32-LE");
+
+{
+ local $warn;
+ my $ret = $enc->encode( "a", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Calling encode on UTF32-LE encode object with valid string produces no warnings");
+ is($ret, $valid, "Calling encode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling encode on UTF32-LE encode object with invalid string warns");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from encode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from encode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from encode method of UTF32-LE encode object cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+ local $warn;
+ no warnings;
+ $enc->encode( "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from encode method of UTF32-LE encode object cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+ local $warn;
+ my $ret = Encode::encode( $enc, "a", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Calling Encode::encode for UTF32-LE with valid string produces no warnings");
+ is($ret, $valid, "Calling Encode::encode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling Encode::encode for UTF32-LE with invalid string warns");
+}
+
+
+{
+ local $warn;
+ no warnings 'utf8';
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from Encode::encode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from Encode::encode for UTF32-LE can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from Encode::encode for UTF32-LE cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+ local $warn;
+ no warnings;
+ Encode::encode( $enc, "\x{D800}", Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Warning from Encode::encode for UTF32-LE cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+ local $warn;
+ my $ret = $enc->decode( $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Calling decode on UTF32-LE encode object with valid string produces no warnings");
+ is($ret, "a", "Calling decode on UTF32-LE encode object with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Calling decode on UTF32-LE encode object with invalid string warns");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from decode method of UTF32-LE encode object can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Warning from decode method of UTF32-LE encode object cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+ local $warn;
+ no warnings;
+ $enc->decode( $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Warning from decode method of UTF32-LE encode object cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+{
+ local $warn;
+ my $ret = Encode::decode( $enc, $valid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Calling Encode::decode for UTF32-LE with valid string produces no warnings");
+ is($ret, "a", "Calling Encode::decode for UTF32-LE with valid string returns correct output");
+}
+
+
+{
+ local $warn;
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Calling Encode::decode for UTF32-LE with invalid string warns");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ no warnings;
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::ONLY_PRAGMA_WARNINGS | Encode::LEAVE_SRC );
+ is($warn, undef, "Warning from Encode::decode for UTF32-LE can be silenced via no warnings");
+}
+
+{
+ local $warn;
+ no warnings 'utf8';
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Warning from Encode::decode for UTF32-LE cannot be silenced via no warnings 'utf8' when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+{
+ local $warn;
+ no warnings;
+ Encode::decode( $enc, $invalid, Encode::WARN_ON_ERR | Encode::LEAVE_SRC );
+ like($warn, qr/may not be portable.* at $script line /, "Warning from Encode::decode for UTF32-LE cannot be silenced via no warnings when ONLY_PRAGMA_WARNINGS is not used");
+}
+
+
+use PerlIO::encoding;
+$PerlIO::encoding::fallback |= Encode::ONLY_PRAGMA_WARNINGS;
+
+{
+ local $warn;
+ my $tmp = $valid;
+ $tmp .= ''; # de-COW
+ open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+ my $str = <$fh>;
+ close $fh;
+ is($warn, undef, "Calling PerlIO :encoding on valid string produces no warnings");
+ is($str, "a", "PerlIO decodes string correctly");
+}
+
+
+{
+ local $warn;
+ my $tmp = $invalid;
+ use Devel::Peek;
+ $tmp .= ''; # de-COW
+ open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+ my $str = <$fh>;
+ close $fh;
+ like($warn, qr/may not be portable.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+ local $warn;
+ my $tmp = $invalid;
+ $tmp .= ''; # de-COW
+ no warnings 'utf8';
+ open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+ my $str = <$fh>;
+ close $fh;
+ is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ my $tmp = $invalid;
+ $tmp .= ''; # de-COW
+ no warnings;
+ open my $fh, '<:encoding(UTF32-LE)', \$tmp or die;
+ my $str = <$fh>;
+ close $fh;
+ is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
+
+
+{
+ local $warn;
+ my $str;
+ open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+ print $fh "a";
+ close $fh;
+ is($warn, undef, "Calling PerlIO :encoding on valid string produces no warnings");
+ is($str, $valid, "PerlIO encodes string correctly");
+}
+
+
+{
+ local $warn;
+ my $str;
+ open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+ print $fh "\x{D800}";
+ close $fh;
+ like($warn, qr/UTF-16 surrogate.* at $script line /, "Calling PerlIO :encoding on invalid string warns");
+}
+
+{
+ local $warn;
+ my $str;
+ no warnings 'utf8';
+ open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+ print $fh "\x{D800}";
+ close $fh;
+ is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings 'utf8'");
+}
+
+{
+ local $warn;
+ my $str;
+ no warnings;
+ open my $fh, '>:encoding(UTF32-LE)', \$str or die;
+ print $fh "\x{D800}";
+ close $fh;
+ is($warn, undef, "Warning from PerlIO :encoding can be silenced via no warnings");
+}
diff --git a/t/utf8messages.t b/t/utf8messages.t
deleted file mode 100644
index 8b6b379..0000000
--- a/t/utf8messages.t
+++ /dev/null
@@ -1,33 +0,0 @@
-use strict;
-use warnings;
-BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-
-use Test::More;
-use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-
-plan tests => 12;
-
-my @invalid;
-
-ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
-like $@, qr/^"\\x\{d800\}" does not map to UTF-8 /, 'Error message contains strict UTF-8 name';
-@invalid = ();
-encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
-
-ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
-like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
-@invalid = ();
-decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-
-ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
-like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
-decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
-is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
diff --git a/t/utf8warnings.t b/t/utf8warnings.t
index 0d1ac6d..e4e4304 100644
--- a/t/utf8warnings.t
+++ b/t/utf8warnings.t
@@ -1,94 +1,35 @@
use strict;
use warnings;
-BEGIN {
- if ($] < 5.014){
- print "1..0 # Skip: Perl 5.14.0 or later required\n";
- exit 0;
- }
-}
+BEGIN { 'warnings'->unimport('utf8') if $] < 5.014 }; # turn off 'UTF-16 surrogate 0xd800' warnings
-use Encode;
-use Test::More tests => 10;
+use Test::More;
+use Encode qw(encode decode FB_CROAK LEAVE_SRC);
-my $valid = "\x61\x00\x00\x00";
-my $invalid = "\x78\x56\x34\x12";
+my $script = quotemeta $0;
-my @warnings;
-$SIG{__WARN__} = sub {push @warnings, "@_"};
+plan tests => 12;
-my $enc = find_encoding("UTF32-LE");
+my @invalid;
-{
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $valid );
- is("@warnings", "", "Calling decode in Encode::Unicode on valid string produces no warnings");
-}
+ok ! defined eval { encode('UTF-8', "\x{D800}", FB_CROAK | LEAVE_SRC) }, 'Surrogate codepoint \x{D800} is not encoded to strict UTF-8';
+like $@, qr/^"\\x\{d800\}" does not map to UTF-8 at $script line /, 'Error message contains strict UTF-8 name';
+@invalid = ();
+encode('UTF-8', "\x{D800}", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xD800 ], 'Fallback coderef contains invalid codepoint 0xD800';
+ok ! defined eval { decode('UTF-8', "\xed\xa0\x80", FB_CROAK | LEAVE_SRC) }, 'Surrogate UTF-8 byte sequence \xED\xA0\x80 is decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0\\x80" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0\x80", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0, 0x80 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0, 0x80';
+ok ! defined eval { decode('UTF-8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with strict UTF-8 decoder';
+like $@, qr/^UTF-8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains strict UTF-8 name and original (not decoded) invalid sequence';
+@invalid = ();
+decode('UTF-8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
-{
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- like("@warnings", qr/is not Unicode/, "Calling decode in Encode::Unicode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings 'utf8'");
-}
-
-{
- no warnings;
- @warnings = ();
- my $ret = Encode::Unicode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode::Unicode can be silenced via no warnings");
-}
-
-
-
-{
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- like("@warnings", qr/is not Unicode/, "Calling decode in Encode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
- no warnings;
- @warnings = ();
- my $ret = Encode::decode( $enc, $invalid );
- is("@warnings", "", "Warning from decode in Encode can be silenced via no warnings");
-};
-
-
-
-{
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- like("@warnings", qr/is not Unicode/, "Calling from_to in Encode on invalid string warns");
-}
-
-{
- no warnings 'utf8';
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings 'utf8'");
-};
-
-{
- no warnings;
- @warnings = ();
- my $inplace = $invalid;
- Encode::from_to( $inplace, "UTF32-LE", "UTF-8" );
- is("@warnings", "", "Warning from from_to in Encode can be silenced via no warnings");
-};
+ok ! defined eval { decode('utf8', "\xed\xa0", FB_CROAK | LEAVE_SRC) }, 'Invalid byte sequence \xED\xA0 is not decoded with non-strict utf8 decoder';
+like $@, qr/^utf8 "\\xED\\xA0" does not map to Unicode at $script line /, 'Error message contains non-strict utf8 name and original (not decoded) invalid sequence';
+decode('utf8', "\xed\xa0", sub { @invalid = @_; return ""; });
+is_deeply \@invalid, [ 0xED, 0xA0 ], 'Fallback coderef contains invalid byte sequence 0xED, 0xA0';
--
2.11.0
|
From @paliEncode part was sent and now merged into upstream Encode module: PerlIO::encoding is remaining part for core. |
From @paliOn Saturday 30 June 2018 11:23:40 pali@cpan.org wrote:
Changes in these pull requests were now released on CPAN in Encode
Can you update Encode module in Perl core and apply remaining |
From @jkeenanOn Mon, 21 Jan 2019 08:48:26 GMT, pali@cpan.org wrote:
Done.
Can you specify which of the patches attached to this RT is the one still under consideration? Thank you very much. -- |
From @paliOn Monday 21 January 2019 09:48:27 James E Keenan via RT wrote:
Thanks!
0002-PerlIO-encoding-Use-Encode-ONLY_PRAGMA_WARNINGS-in-f.patch All other patches are part of Encode, and therefore applied. |
From @LeontOn Mon, Jan 21, 2019 at 6:51 PM <pali@cpan.org> wrote:
Can you add a perldelta entry? Leon |
From @paliOn Monday 21 January 2019 15:03:02 Leon Timmermans via RT wrote:
I do not know where is correct place to put entries, nor what is correct ===
|
From @paliOn Tuesday 22 January 2019 09:38:38 pali@cpan.org wrote:
Leon, it is enough? |
From @paliOn Thursday 07 February 2019 16:05:40 pali@cpan.org wrote:
PING |
From @tonycozOn Tue, 12 Feb 2019 07:46:22 -0800, pali@cpan.org wrote:
Do you mean the patch: - Encode::PERLQQ()|Encode::WARN_ON_ERR()|Encode::STOP_AT_PARTIAL(); which fails a test: ../ext/PerlIO-encoding/t/fallback.t (Wstat: 256 Tests: 10 Failed: 1) Tony |
From @paliOn Tuesday 12 February 2019 20:54:11 Tony Cook via RT wrote:
Yes.
It started failing? Ah :-( IIRC it worked when I created it year ago. I will look at it... |
From @paliOn Wednesday 13 February 2019 09:36:31 pali@cpan.org wrote:
Hi! Following patch should fix this problem. There is missing Inline Patchdiff --git a/ext/PerlIO-encoding/t/fallback.t b/ext/PerlIO-encoding/t/fallback.t
index 3abdfd3f37..84ba097e71 100644
--- a/ext/PerlIO-encoding/t/fallback.t
+++ b/ext/PerlIO-encoding/t/fallback.t
@@ -16,11 +16,13 @@ BEGIN {
import Encode qw(:fallback_all);
}
+use warnings;
use Test::More tests => 10;
# $PerlIO::encoding = 0; # WARN_ON_ERR|PERLQQ;
my $file = "fallback$$.txt";
+my $line;
{
my $message = '';
@@ -34,7 +36,7 @@ my $file = "fallback$$.txt";
}
open($fh,'<',$file) || die "File cannot be re-opened";
-my $line = <$fh>;
+$line = <$fh>;
is($line,"\\x{20ac}0.02\n","perlqq escapes");
close($fh);
@@ -46,7 +48,7 @@ print $fh $str,"0.02\n";
close($fh);
open($fh,'<',$file) || die "File cannot be re-opened";
-my $line = <$fh>;
+$line = <$fh>;
is($line,"€0.02\n","HTML escapes");
close($fh);
@@ -59,7 +61,7 @@ close($fh);
}
ok(open($fh,"<encoding(US-ASCII)",$file),"Opened as ASCII");
-my $line = <$fh>;
+$line = <$fh>;
printf "# %x\n",ord($line);
is($line,"\\xA30.02\n","Escaped non-mapped char");
close($fh); |
From @tonycozOn Thu, 14 Feb 2019 01:08:33 -0800, pali@cpan.org wrote:
I just added use warnings to the block that does the warning test. Thanks, applied your change as 7d0a46b. Tony |
@tonycoz - Status changed from 'open' to 'pending release' |
From @khwilliamsonThank you for filing this report. You have helped make Perl better. With the release today of Perl 5.30.0, this and 160 other issues have been Perl 5.30.0 may be downloaded via: If you find that the problem persists, feel free to reopen this ticket. |
@khwilliamson - Status changed from 'pending release' to 'resolved' |
Migrated from rt.perl.org#131683 (status was 'resolved')
Searchable as RT131683$
The text was updated successfully, but these errors were encountered: