Discussion:
Tilde
tedd
2004-08-09 14:14:35 UTC
To whomever:

A simple question:

For reasons beyond me, in IDNS the Tilde (code point 007E) is
prohibited, but the Tilde Operator (code point 223C) is not.

Considering that keyboard space is at a premium, why isn't code point
007E mapped to 223C in PUNYCODE?

Many thanks in advance for any enlightenment.

tedd
--
--------------------------------------------------------------------------------
http://sperling.com/
Paul Hoffman / IMC
2004-08-09 15:01:53 UTC
Post by tedd
For reasons beyond me, in IDNS the Tilde (code point 007E) is
prohibited, but the Tilde Operator (code point 223C) is not.
All ASCII characters that STD3 did not allow continue to be
disallowed, because many of them are used as special characters in
other protocol elements such as URIs.

--Paul Hoffman, Director
--Internet Mail Consortium
tedd
2004-08-09 20:29:00 UTC
Post by Paul Hoffman / IMC
Post by tedd
For reasons beyond me, in IDNS the Tilde (code point 007E) is
prohibited, but the Tilde Operator (code point 223C) is not.
All ASCII characters that STD3 did not allow continue to be
disallowed, because many of them are used as special characters in
other protocol elements such as URIs.
--Paul Hoffman, Director
--Internet Mail Consortium
Paul:

Not that it makes any functional difference, but is the Tilde (code
point 007E) actually used in other protocol elements, or is it just a
member of a range (i.e., all ASCII) that is reserved for possible
use, or do you know?

Thanks for your comment and reply.

tedd
--
--------------------------------------------------------------------------------
http://sperling.com/
Adam M. Costello
2004-08-10 07:09:43 UTC
in IDNA the Tilde (code point 007E) is prohibited, but the Tilde
Operator (code point 223C) is not.
IDNA inherits the prohibition of U+007E from RFC-1123 (STD-3), which by
reference to RFC-952 defined host names as ASCII strings containing only
A-Z, a-z, 0-9, hyphen-minus, and dot. Therefore some ASCII characters
were explicitly allowed, all other ASCII characters were explicitly
forbidden, and non-ASCII characters were not even in the realm of
possibility.

In order to extend the notion of host name to non-ASCII strings, we
needed to keep the existing prohibitions on ASCII characters in host
names (otherwise it wouldn't be a proper extension), but the rules
for non-ASCII characters were up to the working group to define. The
consensus was to allow all non-ASCII Unicode graphic characters (perhaps
because the group could never have reached agreement on any particular
non-empty set of prohibited graphic characters).
Considering that keyboard space is at a premium, why isn't code point
007E mapped to 223C in PUNYCODE?
Punycode accepts and supports all Unicode characters, including
non-graphic characters and all ASCII characters, including U+007E. It
does no mapping. All mapping and prohibition are done at higher layers.

I suppose you could instead ask why tilde isn't mapped to tilde
operator in Nameprep. The mapping step in Nameprep was designed to
avoid alternate representations of the same characters, and to erase
case distinctions, not to save typing. Tilde and tilde operator are
entirely distinct characters according to the Unicode spec (and if we
had decided not to accept the Unicode spec at face value, we'd still be
arguing about what maps to what). If tilde operator is too difficult to
type, then don't register domain names containing it.

We made one concession for ease of typing, for dot, only because all
domain names (except TLDs) are *required* to contain dots, and dots can
be cumbersome to type for the huge number of CJK users. The mapping
from ideographic full stop to dot is not done in Nameprep, which sees
only individual labels, not the separators between them, but at a
higher layer that divides the domain name into labels, converts them
independently, and glues them back together.
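
The separator handling described above can be sketched as follows. This
is only an illustration: the function names are hypothetical, and the
set of separators is the one listed in RFC 3490, section 3.1.

```python
import re

# Label separators recognised by IDNA (RFC 3490, section 3.1):
# full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop.
IDNA_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")

def split_labels(domain: str) -> list[str]:
    """Divide a domain name into labels before any per-label processing."""
    return IDNA_DOTS.split(domain)

def join_labels(labels: list[str]) -> str:
    """Glue the (converted) labels back together with the ASCII dot."""
    return ".".join(labels)

# Per-label conversion (Nameprep, Punycode) would happen between these two calls.
print(join_labels(split_labels("www\u3002example\uff0ecom")))
```

Because the split happens outside Nameprep, Nameprep never sees a
separator and needs no mapping rule for it.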

AMC
tedd
2004-08-10 14:29:10 UTC
Post by Adam M. Costello
I suppose you could instead ask why tilde isn't mapped to tilde
operator in Nameprep.
Yes, that was my question.
Post by Adam M. Costello
The mapping step in Nameprep was designed to
avoid alternate representations of the same characters, and to erase
case distinctions, not to save typing.
It's not a question of "saving typing" -- it's a question of keyboard
real estate for the end-user.

If, as you say, the mapping step is to avoid alternate
representations and to erase case distinctions, then it has failed
because it doesn't produce anything. Instead, the process simply
prohibits the character, and any replacement, which is not mentioned
in the aforementioned design.

Now, if the tilde character is currently used in some fashion behind
the scenes by Internet techs, as Paul suggested, then I can
understand why the tilde character is prohibited.

However, if the tilde character is not being used and if you want to
take the position that "keyboard real estate" is of no concern to
you, then that's your decision -- but please realize that you do so
at the expense of the end-user and you do so without any real reason.

Please tell me why mapping the tilde to the tilde operator wouldn't work.

Thank you.

tedd
--
--------------------------------------------------------------------------------
http://sperling.com/
Martin v. Löwis
2004-08-10 20:50:47 UTC
If, as you say, the mapping step is to avoid alternate representations
and to erase case distinctions, then it has failed because it doesn't
produce anything.
Why do you say that? The mapping clearly avoids alternate
representations and erases case distinctions. For example,
"www.LÖWIS.de" is treated as if it was "www.löwis.de".

So I fail to see that the mapping step has failed. It is very
successful.
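
The case-folding example can be reproduced with a short sketch. It
assumes the Nameprep implementation that ships with Python
(encodings.idna.nameprep) and Python's "idna" codec; neither is part of
this thread, they are just convenient RFC 3490/3491 implementations.

```python
from encodings.idna import nameprep  # stdlib implementation of RFC 3491

# Nameprep works on single labels: the mapping step folds case,
# so both spellings end up as the same string.
upper_label = nameprep("L\u00d6WIS")   # "LÖWIS"
lower_label = nameprep("l\u00f6wis")   # "löwis"
assert upper_label == lower_label

# The "idna" codec applies ToASCII to each label of a whole name,
# so both spellings yield the same ACE form on the wire.
upper_name = "www.L\u00d6WIS.de".encode("idna")
lower_name = "www.l\u00f6wis.de".encode("idna")
assert upper_name == lower_name
print(upper_name)
```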
Now, if the tilde character is currently used in some fashion behind the
scenes by Internet techs, as Paul suggested, then I can understand why
the tilde character is prohibited.
I'd like to point out that it was always the intention, and is the
existing practice, that the IDNA RFCs are augmented by policies of the
registrars, which further constrain the set of characters that you can
use within a particular zone.

To my knowledge, none of the TLD registrars currently allows
registration of names which contain TILDE OPERATOR. So for
one-below-toplevel, the entire issue is irrelevant.
Please tell me why mapping the tilde to the tilde operator wouldn't work.
Because it would not matter. Consider a domain label "foo~", and assume
we are applying the "ToASCII" function, trying to generate the IDNA
version of the label. Please follow me through section 4.1 of RFC 3490 now.
Further assume that UseSTD3ASCIIRules is true.

1. If the sequence contains any code points outside the ASCII range
(0..7F) then proceed to step 2, otherwise skip to step 3.

No, this label does not contain any code points outside the ASCII
range. So we proceed to step 3.

3. If the UseSTD3ASCIIRules flag is set, then perform these checks:
(a) Verify the absence of non-LDH ASCII code points; that is, the
absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

UseSTD3ASCIIRules is true, so we check. The label contains a
non-LDH code point, so ToASCII fails.

Now, your proposal is that TILDE be mapped to TILDE OPERATOR.
That would have happened in step 2. However, according to the
specification, we have *skipped* step 2. Therefore, your mapping
approach wouldn't work, and ToASCII would fail.
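
The walk-through above can be condensed into a simplified sketch of
ToASCII. It is not a complete implementation of RFC 3490: the function
names are hypothetical, error handling is reduced to ValueError, and
Nameprep is borrowed from Python's standard library as an assumption.

```python
from encodings.idna import nameprep  # stdlib implementation of RFC 3491

ACE_PREFIX = "xn--"

def is_ldh(ch: str) -> bool:
    """Letter, digit or hyphen: the only ASCII allowed under STD3 rules."""
    return ch.isascii() and (ch.isalnum() or ch == "-")

def to_ascii(label: str, use_std3_ascii_rules: bool = True) -> str:
    # Step 1: only labels containing non-ASCII code points reach step 2.
    if not label.isascii():
        # Step 2: Nameprep (mapping, normalisation, prohibition).
        label = nameprep(label)
    # Step 3: with UseSTD3ASCIIRules set, reject non-LDH ASCII code points
    # and labels that begin or end with a hyphen.
    if use_std3_ascii_rules:
        if any(ch.isascii() and not is_ldh(ch) for ch in label):
            raise ValueError("non-LDH ASCII code point in label")
        if label.startswith("-") or label.endswith("-"):
            raise ValueError("label begins or ends with a hyphen")
    # Steps 4-7: Punycode-encode labels that still contain non-ASCII.
    if not label.isascii():
        if label.lower().startswith(ACE_PREFIX):
            raise ValueError("label already begins with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    # Step 8: the result must be 1 to 63 code points long.
    if not 0 < len(label) < 64:
        raise ValueError("label empty or too long")
    return label

print(to_ascii("foo\u223c"))          # label containing TILDE OPERATOR
try:
    to_ascii("foo~")                  # label containing ASCII tilde
except ValueError as exc:
    print("rejected:", exc)
```

In this sketch the all-ASCII label never enters the Nameprep branch, so
no change to the Nameprep mapping tables could rescue it.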

Regards,
Martin
tedd
2004-08-11 00:42:41 UTC
Post by Martin v. Löwis
Post by tedd
If, as you say, the mapping step is to avoid
alternate representations and to erase case
distinctions, then it has failed because it
doesn't produce anything.
Why do you say that?
I say that with respect to the Tilde code point
only. Nameprep, in prohibiting the code point,
has neither avoided an alternate representation
nor erased a case distinction -- it just said
"No".
Post by Martin v. Löwis
The mapping clearly avoids alternate
representations and erases case distinctions. For example,
"www.LÖWIS.de" is treated as if it was "www.löwis.de".
I did not say that it didn't. I only said that it
failed to do anything with respect to the Tilde
except prohibit it -- and that statement is still
true. For sake of argument, what's the alternate
representation or case distinction problem
presented by the Tilde?
Post by Martin v. Löwis
So I fail to see that the mapping step has failed. It is very successful.
Mapping has proved to be useful for most code
points -- I'm not claiming otherwise (other than
with glyphs like the Omega). But the rules under
which nameprep currently operates simply prohibit
use of the Tilde. The reason for this is founded
neither in avoiding alternate representations nor
in erasing case distinctions -- on the contrary,
it appears arbitrary to me until someone provides
me with a reason otherwise.
Post by Martin v. Löwis
Post by tedd
Now, if the tilde character is currently used
in some fashion behind the scenes by Internet
techs, as Paul suggested, then I can understand
why the tilde character is prohibited.
I'd like to point out that it was always the intention, and is the
existing practice, that the IDNA RFCs are augmented by policies of the
registrars, which further constrain the set of characters that you can
use within a particular zone.
To my knowledge, none of the TLD registrars currently allows
registration of names which contain TILDE OPERATOR. So for
one-below-toplevel, the entire issue is irrelevant.
You are misinformed -- domain names that
include the TILDE OPERATOR can be registered in
both the ".com" and ".net" TLDs, and most likely
with other registrars as well.
Post by Martin v. Löwis
Post by tedd
Please tell me why mapping the tilde to the tilde operator wouldn't work.
Because it would not matter. Consider a domain label "foo~", and assume
we are applying the "ToASCII" function, trying to generate the IDNA
version of the label. Please follow me through section 4.1 of RFC 3490 now.
Further assume that UseSTD3ASCIIRules is true.
1. If the sequence contains any code points outside the ASCII range
(0..7F) then proceed to step 2, otherwise skip to step 3.
No, this label does not contain any code points outside the ASCII
range. So we proceed to step 3.
(a) Verify the absence of non-LDH ASCII code points; that is, the
absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.
UseSTD3ASCIIRules is true, so we check. The label contains a
non-LDH code point, so ToASCII fails.
Now, your proposal is that TILDE be mapped to TILDE OPERATOR.
That would have happened in step 2. However, according to the
specification, we have *skipped* step 2. Therefore, your mapping
approach wouldn't work, and ToASCII would fail.
Regards,
Martin
I don't see step #2.

If your argument is "It won't work, because it
doesn't", then I can't argue with that circular
logic -- other than to say, I don't see any
"valid" reason for its foundation.

Please realize that you are correct in your claim
*only* because the tilde (code point 07E) is
prohibited in step 3. So, by design, the process
prohibits the character, but it does so for no
specific purpose that I am aware of -- and that's my
point -- and has remained my unanswered question.
So, specifically why does the "process"
(nameprep, rule #3, "plan 9 from outer space", or
whatever) prohibit code point 07E?

Also FYI, the character string "foo~" (where "~"
is the TILDE OPERATOR) currently translates to
xn--foo-ch2a, which can be registered as
xn--foo-ch2a.com ("foo~.com"). This domain is
perfectly legal in both .com and .net TLDs -- in
fact, it's currently open.

In summary, my claim is that if you can map
uppercase "A" to lowercase "a", then you can map
the TILDE to the TILDE OPERATOR.

The point here is to save keyboard real estate
if: a) there is no reason for it to be
prohibited; b) and if a simple mapping function
(or whatever) can save a key without creating
problems elsewhere; then why not?

tedd

PS: Whenever I wade into this list, I feel like a
baby seal entering a pod of killer whales.
--
--------------------------------------------------------------------------------
http://sperling.com/
Adam M. Costello
2004-08-11 01:50:22 UTC
Nameprep, in prohibiting the code point, has neither avoided an
alternate representation nor erased a case distinction -- it just said
"No".
Nameprep does not prohibit tilde or any ASCII character. ASCII
characters are prohibited in ToASCII step 3, only if UseSTD3ASCIIRules
is set.

What I said about Nameprep was "The mapping step in Nameprep was
designed to avoid alternate representations of the same characters, and
to erase case distinctions, not to save typing." The other steps of
Nameprep are there for other reasons. For example, the prohibition step
of Nameprep is there to avoid names containing characters that you can't
see. I focused on the mapping step because you were proposing to add a
mapping.

I don't understand the distinction between "save keyboard real estate"
and "save typing". Tilde operator is allowed by IDNA, it's just
difficult to type (probably involves typing the Unicode number or
selecting from a menu). Adding a mapping from tilde to tilde operator
would make it easier to type because it would allow the use of existing
keyboard real estate. But it would not be compatible with the way ASCII
names have always been treated.
it appears arbitrary to me until someone provides me with a reason
otherwise.
The original restriction on host name syntax *was* arbitrary, but it was
made thirty years ago (RFC-608) and has been with us ever since. It was
not IDNA's place to change the basic rules of ASCII host names after all
this time.

It's quite possible that in thirty years some software has been created
that depends on the fact that ASCII host names don't contain tilde.
In summary, my claim is that if you can map uppercase "A" to lowercase
"a", then you can map the TILDE to the TILDE OPERATOR.
I don't see how that follows. The equivalence between A and a, and the
prohibition of tilde, were both established at the same time in the same
document thirty years ago.

AMC
Martin v. Löwis
2004-08-11 04:59:42 UTC
You are misinformed -- domain names that include the TILDE OPERATOR
can be registered in both the ".com" and ".net" TLDs, and most likely
with other registrars as well.
This is not true. Please take a look at

http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/page_001382.html

This is the list of scripts which are supported in the .com and .net
zones. Characters that don't belong to one of these scripts, or
labels that draw characters from multiple of these scripts, cannot
be registered. As TILDE OPERATOR is in none of the listed scripts,
no label containing it can be registered with Verisign when IDNA
leaves the testbed status in that zone.

For another example, please refer to DeNICs policies for the .de
zone:

http://www.denic.de/de/richtlinien.html

In the section "Anlage", they list all characters supported. So you
can only register labels with a few non-ASCII Latin characters, but
no other scripts - let alone TILDE OPERATOR.
I don't see step #2.
If your argument is "It won't work, because it doesn't", then I can't
argue with that circular logic -- other than to say, I don't see any
"valid" reason for its foundation.
No, that is not what I am arguing. I understood your proposal that you
suggest a modification to Nameprep, i.e. to change the mapping. I'm
saying that a change to Nameprep won't have any effect.
Please realize that you are correct in your claim *only* because the
tilde (code point 07E) is prohibited in step 3.
Please realise that this is not the case. In step 1, step 2 is skipped
if the character is in ASCII. So Nameprep is not even being invoked.
So, by design, the
process prohibits the character, but it does so for no specific purpose
that I am aware of
The prohibition in step 3 is for backwards compatibility. Existing
implementations are known to fail if confronted with a label that
contains the ASCII TILDE.
-- and that's my point -- and has remained my unanswered
question. So, specifically why does the "process" (nameprep, rule #3,
"plan 9 from outer space", or whatever) prohibit code point 07E?
Nameprep (RFC 3491) does not prohibit the character. Step 3 of ToASCII
prohibits it for backwards compatibility, in order to protect the
stability of the Domain Name System.
Also FYI, the character string "foo~" (where "~" is the TILDE OPERATOR)
currently translates to xn--foo-ch2a, which can be registered as
xn--foo-ch2a.com ("foo~.com"). This domain is perfectly legal in both
.com and .net TLDs -- in fact, it's currently open.
Did you try that? Even if it works, .com is still in testbed, and,
according to the IANA rules and the published Verisign policies,
the name will not be allowed in production use.

If you just used some Web interface to find out whether it has not
been registered yet, and could be, then you probably have used a
Web interface which does not implement the Verisign policy correctly.
In summary, my claim is that if you can map uppercase "A" to lowercase
"a", then you can map the TILDE to the TILDE OPERATOR.
Yes, but that would have no effect on IDNA, as I have explained.
Even though Nameprep also maps "A" to "a", that mapping has no effect
for a pure ASCII label. Nameprep simply does not affect pure ASCII
labels.

So if you have a name such as MICROsoft.com, the protocol allows
transmission of it as-is, and IDNA does not change that. Instead,
RFC 1035 specifies (section 2.3.3) that this is treated the same
way as, say, microsoft.com.
The point here is to save keyboard real estate if: a) there is no reason
for it to be prohibited;
There is: allowing it would break compatibility with RFC 1035.
b) and if a simple mapping function (or
whatever) can save a key without creating problems elsewhere; then why not?
A change to the mapping function would have no effect.

Please do read and understand the relevant internet standards first.

Regards,
Martin
tedd
2004-08-11 15:57:58 UTC
Post by Martin v. Löwis
Also FYI, the character string "foo~" (where "~" is the TILDE
OPERATOR) currently translates to xn--foo-ch2a, which can be
registered as xn--foo-ch2a.com ("foo~.com"). This domain is
perfectly legal in both .com and .net TLDs -- in fact, it's
currently open.
Did you try that? Even if it works, .com is still in testbed, and,
according to the IANA rules and the published Verisign policies,
the name will not be allowed in production use.
If you just used some Web interface to find out whether it has not
been registered yet, and could be, then you probably have used a
Web interface which does not implement the Verisign policy correctly.
Yes, I did try that and it did appear to work. From my experience,
what I said above still remains true.

Let me present you with my finding and you tell me where it's not
correct, Okay?

Please go to:

http://mct.verisign-grs.com/index.shtml

Clearly this is not "some Web interface", because it is a Verisign Web
interface, is it not? I assume that a Verisign Web interface would
conform to the Verisign policy correctly, wouldn't you think?

In any event, please enter the following character string (Punycode):

xn--0bh.com

Select the radio button "Punycode" and click "Convert".

The result will be "~.com" (TILDE OPERATOR DOT COM).
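
The same conversion can be reproduced without the web form. The
following is a minimal sketch, assuming Python's standard-library
"punycode" codec rather than the Verisign tool; the helper name is
hypothetical.

```python
import unicodedata

ACE_PREFIX = "xn--"

def ace_to_unicode(label: str) -> str:
    """Decode one ACE label; labels without the prefix are returned unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

# Decode each label of the name separately and show what it contains.
for label in "xn--0bh.com".split("."):
    decoded = ace_to_unicode(label)
    print(label, "->", [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in decoded])
```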

At the bottom of the page, please do a Whois Query for Domain. The
result will show that the domain name is currently registered with
TUCOWS and has been for several years.

Considering all, are you saying that this domain name will never be
allowed to be used (as you say, "in production use") like current
"standard" domains? As such, the individual who registered and paid
for this domain name, and is still paying renewals, will never be
allowed to use it?
Post by Martin v. Löwis
Please do read and understand the relevant internet standards first.
That's my desire -- but I'm not as involved as you, and your peers,
in the IDNS process and it's hard for me to figure out what's
relevant and what's not. For example, Verisign reported that they
notified registrars who have invalid IDN registrations on April 9,
2003. As such, if the above noted domain was invalid, as you claim,
then shouldn't the registered owner have been notified of such by
now? I have personal knowledge (knowing the owner) that this hasn't
happened. Why not?

And, if what you claim is common knowledge in your industry, then why
are Verisign and its accredited registrars still receiving money for
registrations and renewals for domain names that they know can never
be used? This doesn't sound right, does it?

Now, please, please enlighten me.

tedd

PS: And, thanks for your comments.
--
--------------------------------------------------------------------------------
http://sperling.com/
Martin v. Löwis
2004-08-11 17:36:20 UTC
Post by tedd
The result will be "~.com" (TILDE OPERATOR DOT COM).
At the bottom of the page, please do a Whois Query for Domain. The
result will show that the domain name is currently registered with
TUCOWS and has been for several years.
I see. This is a testbed registration. If it is old enough, Verisign
will support it for some time. However, before the testbed goes into
production, Verisign reserves the right to abandon registrations which
don't match the policy. See

http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/idn-standards/idn-character-variants/page_001485.html

for the phases they plan to implement. In phase III, applicants
will have to specify a language tag. If the label then does not
match the characters allowed, the registration will be denied.
Post by tedd
Considering all, are you saying that this domain name will never be
allowed to be used (as you say, "in production use") like current
"standard" domains?
Correct - at least not at one-under-toplevel. Local administrators
may, of course, establish other policies for assigning host names
and subdomains.
Post by tedd
As such, the individual who registered and paid for
this domain name, and is still paying renewals, will never be allowed
to use it?
I believe at some point, renewal will not be possible. See the
description of phase III.
Post by tedd
That's my desire -- but I'm not as involved as you, and your peers, in
the IDNS process and it's hard for me to figure out what's relevant and
what's not. For example, Verisign reported that they notified registrars
who have invalid IDN registrations on April 9, 2003. As such, if the
above noted domain was invalid, as you claim, then shouldn't the
registered owner have been notified of such by now?
I believe (without factually knowing) that there are two levels of
"incorrectness". One is failure to follow nameprep procedures, by
using characters that are forbidden in nameprep, or by not using the
proper normal form. I believe such registrations have been eliminated
by now. A registration for TILDE OPERATOR is allowed, according to
nameprep, so it wasn't eliminated.

I believe that such a registration still does not follow the policies
for .com or .net (or any other gTLD where IANA has approved IDN
operations). How Verisign plans to deal with the testbed
registrations, and in what time frame, I don't know.
Post by tedd
And, if what you claim is common knowledge in your industry, then why are
Verisign and its accredited registrars still receiving money for
registrations and renewals for domain names that they know can never be
used? This doesn't sound right, does it?
The registration is still available, and the name is still being
resolved. However, in .com and .net, the entire thing also is still
a testbed.

Why Verisign charges for testbed participation, I don't know. Probably
because users are willing to pay.

Regards,
Martin
tedd
2004-08-11 17:04:29 UTC
Post by Martin v. Löwis
-- and that's my point -- and has remained my unanswered question.
So, specifically why does the "process" (nameprep, rule #3, "plan 9
from outer space", or whatever) prohibit code point 07E?
Nameprep (RFC 3491) does not prohibit the character. Step 3 of ToASCII
prohibits it for backwards compatibility, in order to protect the
stability of the Domain Name System.
Okay, so rule #3 does prohibit the TILDE because of backward
compatibility issues. I can understand that, if there are backward
compatibility issues -- are there?
Post by Martin v. Löwis
In summary, my claim is that if you can map uppercase "A" to
lowercase "a", then you can map the TILDE to the TILDE OPERATOR.
Yes, but that would have no effect on IDNA, as I have explained.
Even though Nameprep also maps "A" to "a", that mapping has no effect
for a pure ASCII label. Nameprep simply does not affect pure ASCII
labels.
Okay, so Nameprep has nothing to do with it -- I picked the wrong
procedure, my fault.

While using the TILDE might break previous
protocols/procedures/whatever -- does that also mean: 1) the use of a
replacement character (such as the TILDE OPERATOR) would break
anything; 2) if the end-user entered a tilde character, is there no
way for the IDNA protocol to map it to the TILDE OPERATOR; 3) is this
just something that has not been, or will not be, considered; 4) or,
is my ignorance of these issues just so bad that I'm not making any
sense?

Respectfully,

tedd
--
--------------------------------------------------------------------------------
http://sperling.com/
Martin v. Löwis
2004-08-11 17:42:50 UTC
Post by tedd
Okay, so rule #3 does prohibit the TILDE because of backward
compatibility issues. I can understand that, if there are backward
compatibility issues -- are there?
I believe so, yes. This issue was studied extensively during the
design of IDNA, and a large number of contributors have indicated
that absolute, 100% backwards compatibility is an absolute,
sine-qua-non requirement. IDNA would not have passed the IETF if
there had been the slightest indication that it isn't
fully, 100% backwards compatible.
Post by tedd
3) is this just something that has not been, or will not
be, considered;
It hasn't been considered. Now that the RFCs have been published
and are already widely implemented, it is extremely unlikely that
it will be reconsidered.

Tell your friend he can unregister his domain - users will never
be able to type the domain name.

Regards,
Martin
tedd
2004-08-11 20:18:36 UTC
I've reviewed the above link and find no specific indication that
the TILDE OPERATOR is, or is not, within the listed scripts. Where
specifically do you see that?
Verisign hasn't published the precise procedures for checking
language tags yet. However, it appears obvious that the "scripts"
they support coincide with the Unicode code blocks, as shown on
http://www.unicode.org/charts/
TILDE OPERATOR is from the "Mathematical Operators" block, which
is not listed in the Verisign list of scripts.
Regards,
Martin
Martin:

Okay, I'm beginning to understand.

What you refer to as "scripts" are blocks as described by the
divisions shown at the above link (i.e., Basic Latin, Latin-1, and so
on...) AND which are selected and approved by Verisign. The only
difference is that within any "approved" specific block, there may
be additional code points that will not be permitted -- is that what
scripts are?

Also, I find it interesting to note that most of the code points
found in the "Mathematical Operators" block are universal with
respect to language (Math is universal). As such, considering that
Verisign has not listed these as a script, am I to understand
that this set of universal, language-independent code points will not
be allowed in the IDNS? That seems contrary to the purpose of
defining a universal standard, doesn't it?

Do you know if Verisign is considering adding other "scripts", or has
the consideration process for additional scripts passed?

tedd
--
--------------------------------------------------------------------------------
http://sperling.com/
Martin v. Löwis
2004-08-11 20:46:35 UTC
What you refer to as "scripts" are blocks as described by the divisions
shown at the above link (i.e., Basic Latin, Latin-1, and so on...) AND
which are selected and approved by Verisign. The only difference is that
within any "approved" specific block, there may be additional code
points that will not be permitted -- is that what scripts are?
Yes, this is my understanding. Of course, for some scripts, there are
additional issues, like character equivalences for CJK. See the Verisign
character equivalence discussions for details.
Also, I find it interesting to note that most of the code points found
in the "Mathematical Operators" block are universal with respect to
language (Math is universal). As such, considering that Verisign has not
listed these as a script, am I to understand that this set of
universal, language-independent code points will not be allowed in the
IDNS?
Certainly not. Verisign has no say in what is and what is not allowed
in an IDN. They control .com and .net, and set policies for these two
zones only.

If you meant to ask whether "this set of universal, language-independent
code points will not be allowed in the IDNs in the .net and .com zones",
then yes, that was my understanding. However, Pat Kane, who works for
Verisign, has just claimed the contrary.
That seems contrary to the purpose of defining a universal
standard, doesn't it?
Not at all. The universal standard defines a protocol, and that works
just fine. You type a Unicode domain name in your browser, and the
browser tries to resolve it. If the domain has been registered, the
browser will resolve it. The browser does not need to know the policy
of the domain registrar for that - a name that does not follow the
policy will get converted to Punycode, and the DNS server will tell
that it is not registered. So there is perfect interoperability.

There are good reasons for registrars to implement such policies.
Otherwise, somebody could register "miсrosoft.com", where the
letter "c" is actually "CYRILLIC SMALL LETTER ES" - and that
just happens to look similar to a Latin "c" in most fonts.

Therefore, registrars need policies to prevent that from happening.
One such policy is "if one letter is Cyrillic, they all have to be".
I don't actually know whether Verisign has a policy for valid
labels in the Cyrillic script, but if there should be a policy,
the registrar is the place to enforce it.
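
A rule like "if one letter is Cyrillic, they all have to be" could be
checked roughly as follows. This is only a sketch of the idea, not any
registry's actual procedure: the function names are hypothetical, and
the script of a letter is approximated by the first word of its Unicode
character name, which is a heuristic rather than the real Unicode
Script property.

```python
import unicodedata

def scripts_in_label(label: str) -> set[str]:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def is_single_script(label: str) -> bool:
    """True if all letters in the label appear to come from one script."""
    return len(scripts_in_label(label)) <= 1

genuine = "microsoft"
spoof = "mi\u0441rosoft"   # U+0441 CYRILLIC SMALL LETTER ES instead of Latin "c"

print(scripts_in_label(genuine), is_single_script(genuine))
print(scripts_in_label(spoof), is_single_script(spoof))
```

The two labels look alike on screen but differ in the scripts they
draw from, which is what such a registration policy would test.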
Do you know if Verisign is considering adding other "scripts", or has the
consideration process for additional scripts passed?
I guess the process of developing character tables for all the languages
is still underway. Beyond that, I have no idea - but Pat Kane can
probably comment in more detail.

Regards,
Martin
tedd
2004-08-12 16:31:55 UTC
Post by Martin v. Löwis
There are good reasons for registrars to implement such policies.
Otherwise, somebody could register "miсrosoft.com", where the
letter "c" is actually "CYRILLIC SMALL LETTER ES" - and that
just happens to look similar to a Latin "c" in most fonts.
Therefore, registrars need policies to prevent that from happening.
One such policy is "if one letter is Cyrillic, they all have to be".
I don't actually know whether Verisign has a policy for valid
labels in the Cyrillic script, but if there should be a policy,
the registrar is the place to enforce it.
Of that, I'm not so sure.

The process you are focusing on is prohibiting
abuse rather than providing more opportunity for
"law-abiding" users. In other words, you're
hurting all because of a few.

Clearly, if Microsoft has problems with someone
mimicking their name, then there are the courts
and ICANN and other avenues for recourse. The
Internet is not that much different than any
other publishing industry. If you want to
regulate it from the get-go, then I suspect that
you will face more than your share of problems.

For example, recently the Casinos in Las Vegas
were approached by Homeland Security with tapes of
suspected terrorists. However, the Casinos turned
down the offer. Why? Because by viewing the
tapes, they opened themselves to more liability
if anything happened. As Jay Leno said last night
on the Tonight Show, "They are more afraid of
Lawyers than of Terrorists."

Likewise, if the actions of the IETF IDN (or
whoever) are to limit certain code points in an
effort to prohibit the aforementioned abuse, then
they are also assuming liability if someone
out-thinks them.

Interesting, don't you think?

tedd
--
--------------------------------------------------------------------------------
http://sperling.com/
YangWoo Ko
2004-08-14 00:56:02 UTC
Permalink
I don't believe that a few technical measures can save us from all
possible legal pitfalls. However, if there is anything registries can
do to reduce the possibilities, they will (and should) do it, as long as
they do not break standards or other contracts.

One of the nice features of a domain name in a truly commercialized
Internet is that it can live in the offline world. We can use it in
advertisements on buses and TV, and we can put our homepage's URI on
business cards. In this regard, similar-looking characters should be
allowed only very cautiously. That's why ICANN suggested an
inclusion-based approach.
Post by Martin v. Löwis
There are good reasons for registrars to implement such policies.
Otherwise, somebody could register "miсrosoft.com", where the
letter "c" is actually "CYRILLIC SMALL LETTER ES" - and that
just happens to look similar to a latin "c" in most fonts.
Therefore, registrars need policies to prevent that from happening.
One such policy is "if one letter is cyrillic, they all have to be".
I don't actually know whether Verisign has a policy for valid
labels in the cyrillic script, but if there should be a policy,
the registrar is the place where to enforce it.
Of that, I'm not so sure.
The process you are focusing on is prohibiting abuse rather than providing more
opportunity for "law-abiding" users. In other words, you're hurting all,
because of a few.
Clearly, if Microsoft has problems with someone mimicking their name, then
there are the courts and ICANN and other avenues for recourse. The Internet is
not that much different than any other publishing industry. If you want to
regulate it from the get-go, then I suspect that you will face more than your
share of problems.
For example, recently the Casinos in Las Vegas were approached by Homeland
Security with tapes of suspected terrorists. However, the Casinos turned down
the offer. Why? Because by viewing the tapes, they opened themselves to more
liability if anything happened. As Jay Leno said last night on the Tonight Show,
"They are more afraid of Lawyers than of Terrorists."
Likewise, if the actions of the IETF IDN (or whoever) are to limit certain code
points in an effort to prohibit the aforementioned abuse, then they are also
assuming liability if someone out-thinks them.
Interesting, don't you think?
tedd
--
--------------------------------------------------------------------------------
http://sperling.com/
--
/*------------------------------------------------
The ones doing their job, doing what they were
meant to do, are invisible. -- Matrix Reloaded
Ko, YangWoo / a human / ***@mrko.pe.kr
------------------------------------------------*/
Adam M. Costello
2004-08-11 01:22:13 UTC
Permalink
Post by tedd
Please tell me why mapping the tilde to the tilde operator wouldn't work.
It wouldn't be backward compatible. A primary design goal of IDNA was
that it should not alter the way ASCII domain names are treated. When
an ASCII domain name contains a tilde, existing software might reject
the name because it expects a host name and RFC-1123 prohibits tilde in
host names, or it might pass the tilde straight through, either because
it is not taking responsibility for enforcing RFC-1123 or because it is
expecting a non-host-name domain name that permits tilde (DNS allows
all ASCII characters). But in any case, existing software does not map
tilde to something else.

IDNA supports both behaviors. When UseSTD3ASCIIRules is set, it
prohibits non-LDH ASCII characters, and when UseSTD3ASCIIRules is unset,
it permits all ASCII characters.
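
A rough sketch of the two behaviors, for illustration only. It covers just the ASCII check that the UseSTD3ASCIIRules flag switches on (letters, digits, hyphen, and no leading or trailing hyphen); the function name and structure are invented here and are not taken from any IDNA implementation.

```python
import string

# "LDH": ASCII letters, digits, and hyphen, the host-name repertoire of STD 3.
LDH = set(string.ascii_letters + string.digits + "-")

def ascii_chars_allowed(label, use_std3_ascii_rules):
    """Hypothetical helper: apply (or skip) the STD3 ASCII restriction."""
    if not use_std3_ascii_rules:
        # Flag unset: every ASCII character is permitted, tilde included.
        return True
    # Flag set: reject any ASCII character outside LDH ...
    if any(ord(ch) < 128 and ch not in LDH for ch in label):
        return False
    # ... and reject a leading or trailing hyphen.
    return not (label.startswith("-") or label.endswith("-"))

print(ascii_chars_allowed("~user", use_std3_ascii_rules=True))   # False
print(ascii_chars_allowed("~user", use_std3_ascii_rules=False))  # True
```

In neither case is the tilde mapped to anything else; it is either rejected or passed through.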

AMC

P.S. For examples of non-host-name domain names, see RFC-2782
(SRV records) and RFC-2317 (PTR records for classless in-addr.arpa
delegation).
Kane, Pat
2004-08-11 19:01:21 UTC
Permalink
-----Original Message-----
Sent: Wednesday, August 11, 2004 1:00 AM
To: tedd
Cc: IETF idn working group
Subject: Re: [idn] Tilde
You are misinformed -- domain names, which include the TILDE OPERATOR,
can be registered in both ".com" and ".net" TLD's and most likely other
registrars as well.
This is not true. Please take a look at
http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/page_001382.html
This is the list of scripts which are supported in the .com and .net
zones. Characters that don't belong to one of these scripts, or
labels that draw characters from multiple of these scripts, cannot
be registered. As TILDE OPERATOR is in none of the listed scripts,
no label containing it can be registered with Verisign when IDNA
leaves the testbed status in that zone.
Scripts that are not listed here can still be registered. Scripts
identified in Unicode such as Mathematical Operators were left off so that
there would be focus on the "... more than 350 languages ..." referenced at
the top of that same page.
For another example, please refer to DeNICs policies for the .de
...
Also FYI, the character string "foo~" (where "~" is the TILDE OPERATOR)
currently translates to xn--foo-ch2a, which can be registered as
xn--foo-ch2a.com ("foo~.com"). This domain is perfectly legal in both
.com and .net TLD's -- in fact, it's currently open.
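
The encoding in this example can be reproduced with Python's built-in "punycode" codec. This is a minimal sketch, assuming U+223C TILDE OPERATOR is the character meant; the "xn--" prefix is added by hand and the helper name is made up. It shows only the encoding and says nothing about whether a registry will accept the name.

```python
TILDE_OPERATOR = "\u223c"  # U+223C, not the ASCII tilde U+007E

def to_ace_label(label):
    """Hypothetical helper: Punycode-encode one label and add the ACE prefix."""
    return "xn--" + label.encode("punycode").decode("ascii")

ace = to_ace_label("foo" + TILDE_OPERATOR)
print(ace)  # xn--foo-ch2a

# Decoding the part after the prefix gives back the original label.
print(ace[4:].encode("ascii").decode("punycode") == "foo" + TILDE_OPERATOR)  # True
```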
Did you try that? Even if it works, .com is still in testbed, and,
according to the IANA rules, and the published Verisign policies,
will not be allowed in production use.
Com and net IDNs are in the production zones. VeriSign indicated that there
would be no IDNs in the production zones until standards were published and
could be deployed. VeriSign began the inclusion of standards compliant IDNs
in the com and net zones in December of 2003.
If you just used some Web interface to find out whether it has not
been registered yet, and could be, then you probably have used a
Web interface which does not implement the Verisign policy correctly.
I am unclear as to what VeriSign policy you are referring to here.
In summary, my claim is that if you can map uppercase "A" to lowercase
"a", then you can map the TILDE to the TILDE OPERATOR.
Yes, but that would have no effect on IDNA, as I have explained.
Even though Nameprep also maps "A" to "a", that mapping has no effect
for a pure ASCII label. Nameprep simply does not affect pure ASCII
labels.
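
The point about pure ASCII labels can be observed with Python's built-in "idna" codec, which implements the IDNA 2003 ToASCII operation. This is a sketch for illustration only; the helper name is invented, and the codec does not enforce the STD3 ASCII rules, which is why the ASCII tilde passes through here.

```python
def to_ascii(name):
    """Hypothetical helper: run a name through Python's IDNA 2003 codec."""
    return name.encode("idna").decode("ascii")

# Pure ASCII labels are returned untouched: no case folding, no mapping.
print(to_ascii("ABC"))        # ABC
print(to_ascii("~"))          # ~

# A label with a non-ASCII character does go through Nameprep,
# so "FOO" is folded to "foo" before Punycode encoding.
print(to_ascii("FOO\u223c"))  # xn--foo-ch2a
```

Any mapping added to Nameprep for U+007E would therefore never be reached for a label that consists of ASCII only.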
...
Regards,
Martin
Regards,

Pat Kane
VeriSign Naming and Directory Services
703.948.3349
Martin v. Löwis
2004-08-11 20:29:26 UTC
Permalink
Post by Kane, Pat
Com and net IDNs are in the production zones.
That means the Web pages are just out of date? They still refer to the
testbed. Specifically, on

http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/idn-standards/verisign-idn-testbed/index.html

it says "VeriSign’s IDN Testbed is in Phase 3.2."
Post by Kane, Pat
Post by Martin v. Löwis
If you just used some Web interface to find out whether it has not
been registered yet, and could be, then you probably have used a
Web interface which does not implement the Verisign policy correctly.
I am unclear as to what VeriSign policy you are referring to here.
There seem to be multiple. For example, on

http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/idn-standards/idn-character-variants/page_001485.html

Verisign refers to phases wrt. registration. For phase III (for which
I don't know whether it is in place yet), Verisign says

"all IDN registrations will require a valid language tag."

It then refers to language tables. Although it is not fully clear
what the purpose of the language tables is (potentially beyond
generating character variants), it appears that the intention
also is to constrain labels to only use a subset of the allowed
characters. For example, the table for Polish refers to
draft-bartosiewicz-idn-pltld-07.txt, which constrains the set
of allowed characters in .PL. So I would assume that a
registration which uses POL as the language tag can only use
the characters listed in that internet draft.
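
If that reading is right, the check might look roughly like the sketch below. The table contents are an assumption made up for illustration (ASCII letters, digits, hyphen, plus the Polish diacritic letters); they are not copied from the internet draft or from any Verisign table, and the names are hypothetical.

```python
import string

ASCII_LDH = set(string.ascii_lowercase + string.digits + "-")

# Hypothetical language tables, keyed by language tag.
LANGUAGE_TABLES = {
    "POL": ASCII_LDH | set("ąćęłńóśźż"),
}

def label_allowed(label, language_tag):
    """Accept a label only if every character is in the tag's table."""
    table = LANGUAGE_TABLES.get(language_tag)
    if table is None:
        return False  # unknown tag: nothing can be registered under it
    return all(ch in table for ch in label)

print(label_allowed("łódź", "POL"))       # True
print(label_allowed("foo\u223c", "POL"))  # False: U+223C is in no table here
```

Under such a scheme a character is registrable only if some language table lists it.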

What language tag should I use when I want to register a domain
that contains TILDE OPERATOR?

Regards,
Martin
tedd
2004-08-11 20:29:27 UTC
Permalink
Pat:

Great! Someone from Verisign.
Post by Kane, Pat
Scripts that are not listed here can still be registered. Scripts
identified in Unicode such as Mathematical Operators were left off so that
there would be focus on the "... more than 350 languages ..." referenced at
the top of that same page.
Will Verisign's focus, at some point in the near future (i.e., less
than 5 years), return to consider "other than language specific"
blocks/scripts like Mathematical Operators, Dingbats, and such?

Thanks.

tedd
--
--------------------------------------------------------------------------------
http://sperling.com/
Kane, Pat
2004-08-11 19:13:07 UTC
Permalink
-----Original Message-----
Sent: Wednesday, August 11, 2004 1:36 PM
To: tedd
Cc: IETF idn working group
Subject: Re: [idn] Tilde
Post by tedd
The result will be "~.com" (TILDE OPERATOR DOT COM).
At the bottom of the page, please do a Whois Query for Domain. The
result will show that the domain name is currently registered with
TUCOWS and has been for several years.
I see. This is a testbed registration. If it is old enough, Verisign
will support it for some time. However, before the testbed goes into
production, Verisign reserves the right to abandon registrations which
don't match the policy. See
VeriSign is currently publishing RFC compliant IDNs into the com and net
zones. The only registrations that were not migrated from the RACE and Name
Prep 03 versions were those that were not compliant with those RFCs. "~.com"
(TILDE OPERATOR DOT COM) is compliant.
http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/idn-standards/idn-character-variants/page_001485.html
for the phases they plan to implement. In phase III, applicants
will have to specify a language tag. If the label then does not
match the characters allowed, the registration will be denied.
VeriSign began requiring language tags in December of 2003, which began Phase
III. Phase III also calls for deploying IDNs into the com and net zones and
no longer publishing them to a third-level zone.
Post by tedd
Considering all, are you saying that this domain name will never be
allowed to be used (as you say, "in production use") like current
"standard" domains?
Correct - at least not at one-under-toplevel. Local administrators
may, of course, establish other policies for assigning host names
and subdomains.
Post by tedd
As such, the individual who registered and paid for
this domain name, and is still paying renewals, will never be allowed
to use it?
I believe at some point, renewal will not be possible. See the
description of phase III.
Renewal for any of the domains in the com and net base is possible.
Post by tedd
That's my desire -- but I'm not as involved as you, and your peers, in
the IDNS process and it's hard for me to figure out what's relevant and
what's not. For example, Verisign reported that they notified registrars
who have invalid IDN registrations on April 9, 2003. As such, if the
above noted domain was invalid, as you claim, then shouldn't the
registered owner have been notified of such by now?
I believe (without factually knowing) that there are two levels of
"incorrectness". One is failure to follow nameprep procedures, by
using characters that are forbidden in nameprep, or by not using the
proper normal form. I believe such registrations have been eliminated
by now. A registration for TILDE OPERATOR is allowed, according to
nameprep, so it wasn't eliminated.
I believe that such registration still does not follow the policies
for .com or .net (or any other gTLD where IANA has approved IDN
operations). How Verisign plans to deal with the testbed
registrations, and in what time frame, I don't know.
VeriSign follows the published RFCs and supports subsequent IDN guidelines.
All non-compliant registrations have been removed from the base of IDN
registrations.

If you have specific questions about the VeriSign deployment, feel free to
contact me directly.
Post by tedd
And, if what you claim is common knowledge in your industry, then why is
Verisign, and its accredited registrars, still receiving money for
registrations and renewals for domain names that they know can never be
used? This doesn't sound right, does it?
The registration is still available, and the name is still being
resolved. However, in .com and .net, the entire thing also is still
a testbed.
Why Verisign charges for testbed participation, I don't know. Probably
because users are willing to pay.
Regards,
Martin
Regards,

Pat Kane
VeriSign Naming and Directory Services
703.948.3349
Michel Suignard
2004-08-11 20:44:31 UTC
Permalink
Concerning 'script', nobody should use the Unicode (or ISO 10646) blocks as the definition. That approach fails for many writing systems, especially those that span several blocks. The Verisign site seems to be silent about the formal definition of script. It should refer to http://www.unicode.org/reports/tr24/tr24-5.html, which is the best way to define script.

Michel