IDNA and Control codes (0-1F) and 7F

Discussion:

Michel Suignard

2004-09-13 23:43:35 UTC

By excluding table C.2.1 from the StringPrep profile used by IDNA, the
ToASCII operation allows all C0 control codes and 7F in its default mode
(UseSTD3ASCIIRules flag unset). This is rather troublesome as these
control codes, especially the 00 value, may create all sorts of issues
for run time libraries that use zero as string terminator on input. In
such cases it becomes quite cumbersome to tell whether the
zero character is a terminator or part of the character string itself.
I understand the value of allowing all ASCII non control characters but
allowing by default the control characters in a ToASCII function seems
to open the door for all sorts of abuse and security risks.

Would a library that by default only allows the range 20-7E still be
considered conformant? In all cases, it would still honor the
UseSTD3ASCIIRules. Allowing the C0 control codes and 7F doesn't seem
that useful. I would have preferred to have a default mode excluding
them and if the full 00-7F is really required make it another optional
flag.

Was this intended? Or is it an error? Or maybe I am not reading the spec
correctly ;-)

Michel
----------
References:
RFC 3490 IDNA, section 2. Terminology says:
<<
An "internationalized label" is a label to which the ToASCII operation
(see section 4) can be applied without failing (with the
UseSTD3ASCIIRules flag unset).
>>

section 4.1 ToASCII says:
<<
2. Perform the steps specified in [NAMEPREP] and fail if there is an
error. The AllowUnassigned flag is used in [NAMEPREP].
3. If the UseSTD3ASCIIRules flag is set, then perform these checks:

(a) Verify the absence of non-LDH ASCII code points; that is, the
absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
>>

RFC 3491 NAMEPREP section 5. Prohibited output says:
<<
This profile specifies prohibiting using the following tables from
[STRINGPREP]:

Table C.1.2
Table C.2.2
Table C.3
Table C.4
Table C.5
Table C.6
Table C.7
Table C.8
Table C.9
>>
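
As an illustration of step 3(a) quoted above, here is a minimal sketch of the non-LDH check. The ranges are copied from the RFC 3490 text; the names `NON_LDH_RANGES`, `non_ldh_code_points` and `passes_std3_ascii_check` are hypothetical and not taken from any IDNA library. It covers only the check in 3(a), and only when the caller chooses to apply it (that is, when UseSTD3ASCIIRules is set).

```python
# Non-LDH ASCII ranges as listed in RFC 3490 section 4.1, step 3(a):
# 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
NON_LDH_RANGES = [(0x00, 0x2C), (0x2E, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7F)]


def non_ldh_code_points(label: str) -> list:
    """Return the non-LDH ASCII code points found in label, in input order."""
    found = []
    for ch in label:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in NON_LDH_RANGES):
            found.append(cp)
    return found


def passes_std3_ascii_check(label: str) -> bool:
    """True when label contains no non-LDH ASCII code points."""
    return not non_ldh_code_points(label)
```

With this check skipped (flag unset), a label such as "a\x00b" is not rejected at this step, which is the behavior the question above is about. Non-ASCII code points are outside the listed ranges and are left to the other steps.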

Paul Hoffman / IMC

2004-09-14 00:33:52 UTC

At 4:43 PM -0700 9/13/04, Michel Suignard wrote:
>By excluding table C.2.1 from the StringPrep profile used by IDNA, the
>ToASCII operation allows all C0 control codes and 7F in its default mode
>(UseSTD3ASCIIRules flag unset).

There is no "default mode" for IDNA. This was debated, but in the end
we had to allow both methods of processing names. If anything, the
UseSTD3ASCIIRules flag being *set* is probably the default for most
systems processing domain names today, probably unconsciously.

>Would a library that by default only allow the range 20-7E be still
>considered conformant? In all cases, it would still honor the
>UseSTD3ASCIIRules. Allowing the C0 control codes and 7F doesn't seem
>that useful. I would have preferred to have a default mode excluding
>them and if the full 00-7F is really required make it another optional
>flag.

If such a library followed the IDNA spec with the UseSTD3ASCIIRules
being set, it would certainly be conformant.

--Paul Hoffman, Director
--Internet Mail Consortium

Adam M. Costello

2004-09-14 06:57:33 UTC

Michel Suignard <***@windows.microsoft.com> wrote:

> By excluding table C.2.1 from the StringPrep profile used by IDNA, the
> ToASCII operation allows all C0 control codes and 7F in its default
> mode (UseSTD3ASCIIRules flag unset).

As Paul said, there is no default mode. It is up to the application
to decide whether to set UseSTD3ASCIIRules. Of course a library could
make one mode or the other the default, and it is free to choose either
mode to be the default. A library could also, I suppose, implement only
one mode or the other, in which case I guess it would be an incomplete
conformant implementation of ToASCII.

> This is rather troublesome as these control codes, especially the 00
> value, may create all sorts of issues for run time libraries that use
> zero as string terminator on input.

If the programming environment customarily uses a string representation
that does not allow embedded NULs to be represented, then it will be a
moot point whether your ToASCII implementation handles NUL correctly,
because it cannot be tested anyway. You can reasonably claim that it's
not your IDN library that's incomplete, but the programming environment
that's incomplete.

Note that ToASCII and ToUnicode will never try to output an embedded NUL
character if they never receive an embedded NUL character as input.

For an example of a C library that handles embedded NULs, see GNU
libidn:

http://www.gnu.org/software/libidn/
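
To make the representation problem concrete, here is a small sketch using Python's standard `ctypes` module to mimic a C character buffer. It is only an illustration of NUL-terminated versus length-counted reading and does not use libidn or any IDN library; the variable names are made up for the example.

```python
import ctypes

label = b"ab\x00cd"  # five bytes with an embedded NUL in the middle

# create_string_buffer copies the bytes into a C char array and
# appends one terminating NUL, so the array holds six bytes.
buf = ctypes.create_string_buffer(label)

# .value reads the array as a NUL-terminated C string and stops at the
# first zero byte, so the bytes after the embedded NUL are lost.
as_c_string = buf.value

# .raw exposes the whole array; slicing with an explicit length keeps
# the embedded NUL as part of the data.
as_counted = buf.raw[:len(label)]
```

An interface that carries an explicit length alongside the buffer can distinguish the embedded zero from the terminator; one that relies on the terminator alone cannot.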

> I understand the value of allowing all ASCII non control characters
> but allowing by default the control characters in a ToASCII function
> seems to open the door for all sorts of abuse and security risks.

ToASCII and ToUnicode never introduce control characters; they output
only those control characters that were already present in the input.
If you consider control characters to be dangerous, then I would think
you'd want to reject them as early as possible, before you even get to
calling ToASCII or ToUnicode. Once you have control-code-free strings,
ToASCII and ToUnicode will preserve that property.
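
A minimal sketch of such an early check, assuming the application wants to refuse the C0 range (00-1F) and 7F before any IDNA processing. The function names `find_control_codes` and `require_control_free` are hypothetical.

```python
def find_control_codes(text: str) -> list:
    """Return (index, code point) pairs for C0 controls (00-1F) and DEL (7F)."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if ord(ch) <= 0x1F or ord(ch) == 0x7F
    ]


def require_control_free(text: str) -> str:
    """Return text unchanged, or raise ValueError if it holds a control code."""
    hits = find_control_codes(text)
    if hits:
        index, cp = hits[0]
        raise ValueError(f"control code U+{cp:04X} at index {index}")
    return text
```

A caller would run `require_control_free(label)` on input as it arrives and only then hand the label to whatever ToASCII implementation is in use.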

> Would a library that by default only allow the range 20-7E be still
> considered conformant?

If you want to claim to have a complete implementation of ToASCII, then
I think it needs to be possible to pass control characters through.
But it needn't be the default mode. If you want to add your own
AllowControlChars flag that is unset by default, I see no problem with
that. The spec standardizes the function (input --> output), but not
the interface.
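
One way such an interface could look, as a sketch only: a wrapper that refuses control codes unless the caller opts in. The names `make_to_ascii` and `allow_control_chars` are hypothetical, and `core_to_ascii` stands for whatever complete ToASCII function is available; none of this is defined by the spec.

```python
from typing import Callable


def make_to_ascii(core_to_ascii: Callable[[str], str]) -> Callable[..., str]:
    """Wrap a complete ToASCII function behind a stricter default interface."""

    def to_ascii(label: str, allow_control_chars: bool = False) -> str:
        if not allow_control_chars:
            for ch in label:
                if ord(ch) <= 0x1F or ord(ch) == 0x7F:
                    raise ValueError(f"control code U+{ord(ch):04X} not allowed")
        # With the flag set, the label reaches the underlying function
        # untouched, so the full input --> output mapping stays reachable.
        return core_to_ascii(label)

    return to_ascii
```

The underlying function is unchanged; only the default behavior of the interface is stricter, which keeps the complete mapping available to callers that ask for it.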

AMC

Krall, Gary

2004-09-14 20:30:24 UTC

All:

Just as an FYI, the C library in the Verisign SDK ("Xcode") has the
UseSTD3ASCIIRules flag set by default. It is a compile-time switch which
can be changed in the library's configuration file.

Also note that the library does not accept null characters embedded in input
strings regardless of how this flag is set. Control characters other than
null will pass through as expected without the std3 flag set.

Gary.

JFC (Jefsey) Morfin

2004-09-14 22:31:32 UTC

Dear Gary,
Where can this SDK be found? Or is it proprietary?
thanks
jfc morfin

Krall, Gary

2004-09-14 23:06:55 UTC

Jefsey:

Our SDK (which includes both a C and a Java implementation) is open
source and covered by BSD licensing provisions. For specific
information about our library you may go to:

http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/idn-registrars/page_001408.html

Additionally, for a list of current environments which support IDNA (both
applications and programming languages) you may go to:

http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/page_002201.html

Hope this helps.

Gary.
