Michel Suignard
2004-09-13 23:43:35 UTC
By excluding table C.2.1 from the StringPrep profile used by IDNA, the
ToASCII operation allows all C0 control codes and 7F in its default mode
(UseSTD3ASCIIRules flag unset). This is rather troublesome, as these
control codes, especially the 00 value, may create all sorts of issues
for run-time libraries that use zero as a string terminator on input.
With such libraries it becomes quite cumbersome to tell whether a zero
character is a terminator or part of the character string itself.
I understand the value of allowing all ASCII non-control characters, but
allowing the control characters by default in a ToASCII function seems
to open the door to all sorts of abuse and security risks.
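To make the terminator concern concrete, here is a minimal Python sketch that uses the standard ctypes module to mimic a C-style buffer. The label value is made up for the illustration; it is not taken from any real input.

```python
import ctypes

# Hypothetical label with an embedded 00, which default-mode ToASCII
# would not reject.
label = b"exa\x00mple"

buf = ctypes.create_string_buffer(label)

# .raw exposes the whole buffer, including the trailing 00 ctypes appends.
full = buf.raw
# .value stops at the first 00, the way a C string routine would.
seen_by_c = buf.value

print(len(label), len(seen_by_c))
```

Code that reads the buffer as a zero-terminated string sees only the part before the embedded 00, while length-based code sees the whole label, so the two can disagree about which name is being processed.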
Would a library that by default only allows the range 20-7E still be
considered conformant? In all cases, it would still honor the
UseSTD3ASCIIRules flag. Allowing the C0 control codes and 7F doesn't
seem that useful. I would have preferred a default mode that excludes
them and, if the full 00-7F range is really required, a separate
optional flag to enable it.
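A rough sketch of the stricter default I have in mind, in Python. The function name is hypothetical and is not part of any IDNA library; it only looks at ASCII code points and leaves everything else to Nameprep.

```python
def has_only_printable_ascii(label: str) -> bool:
    """Return True if every ASCII code point in label lies in 20..7E.

    Non-ASCII code points are skipped; this sketch leaves them to Nameprep.
    """
    return all(0x20 <= ord(ch) <= 0x7E for ch in label if ord(ch) < 0x80)


print(has_only_printable_ascii("example"))
print(has_only_printable_ascii("exa\x00mple"))
```

A library could run such a check before ToASCII and fail early, independently of whether UseSTD3ASCIIRules is set.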
Was this intended? Or is it an error? Or maybe I am not reading the spec
correctly ;-)
Michel
----------
References:
RFC 3490 IDNA, section 2. Terminology says:
<<
An "internationalized label" is a label to which the ToASCII operation
(see section 4) can be applied without failing (with the
UseSTD3ASCIIRules flag unset).
>>
section 4.1 ToASCII says:
<<
2. Perform the steps specified in [NAMEPREP] and fail if there is an
error. The AllowUnassigned flag is used in [NAMEPREP].
3. If the UseSTD3ASCIIRules flag is set, then perform these checks:
(a) Verify the absence of non-LDH ASCII code points; that is, the
absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
>>
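For comparison, the step 3(a) check quoted above could be sketched as follows in Python. The names are hypothetical; the ranges are copied from the quoted text. It shows that the control codes are only caught when the flag is set.

```python
# Ranges quoted in step 3(a): 0..2C, 2E..2F, 3A..40, 5B..60, 7B..7F.
NON_LDH_RANGES = [
    (0x00, 0x2C),
    (0x2E, 0x2F),
    (0x3A, 0x40),
    (0x5B, 0x60),
    (0x7B, 0x7F),
]


def has_non_ldh_ascii(label: str) -> bool:
    """Return True if label contains an ASCII code point outside LDH."""
    return any(
        lo <= ord(ch) <= hi
        for ch in label
        for lo, hi in NON_LDH_RANGES
    )
```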
RFC 3491 NAMEPREP section 5. Prohibited output says:
<<
This profile specifies prohibiting using the following tables from
[STRINGPREP]:
Table C.1.2
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9
>>
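The gap can be checked against the table lookups in Python's standard stringprep module. This is only a sketch of the prohibited-output step using the tables listed above, not a full Nameprep implementation, and the helper name is hypothetical.

```python
import stringprep

# Lookups for the tables Nameprep prohibits, as listed above.
# Table C.2.1 (ASCII control characters) is deliberately absent.
NAMEPREP_PROHIBITED = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def prohibited_by_listed_tables(ch: str) -> bool:
    """Return True if ch falls in any of the tables Nameprep prohibits."""
    return any(check(ch) for check in NAMEPREP_PROHIBITED)


for ch in ("\x00", "\x1f", "\x7f"):
    # In C.2.1, yet not rejected by the prohibited-output step.
    print(hex(ord(ch)), stringprep.in_table_c21(ch), prohibited_by_listed_tables(ch))
```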