Username Policy

Posted under General

It's apparent from recent events that we really need to hash out a better username policy.

Some rules I'd propose:

  • We create a list of keywords banned from usernames ("Anonymous" being an obvious example).
  • Usernames that differ from an existing username only by capitalization should be banned (Anonymous / anonymous).
  • Unicode names should be restricted in some way, so that non-characters and alternate scripts can't be used to spoof usernames. I like the fact that we can have Unicode names here, but maybe we need to restrict them to the Latin and CJK character sets. I'd include Cyrillic, but I think that's where the аlbert (user 322226) spoof came from. Non-characters were what allowed the previous spoofing.
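A rough sketch of how the first two rules might be checked at registration time. The keyword list and the function name are made up for illustration, and this is Python rather than whatever the site actually runs:

```python
from typing import Iterable

# Hypothetical banned-keyword list; the only keyword named so far is "Anonymous".
BANNED_KEYWORDS = {"anonymous"}


def violates_basic_rules(candidate: str, existing: Iterable[str]) -> bool:
    """Return True if the name contains a banned keyword or differs
    from an existing name only by capitalization."""
    folded = candidate.casefold()
    # Rule 1: reject names containing any banned keyword, ignoring case.
    if any(word in folded for word in BANNED_KEYWORDS):
        return True
    # Rule 2: reject names that collide with an existing name after case folding.
    return folded in {name.casefold() for name in existing}
```

Something like violates_basic_rules("ANONYMOUS", []) or violates_basic_rules("albert", ["Albert"]) would be rejected, while an unrelated name passes.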

Any other ideas?

Updated by a moderator

I see no reason why we should allow non-Asian Unicode scripts. Honestly, I'd prefer people not use names that I can't pronounce without running them through nihongo.j-talk.com, but I'll live with that. The only reason to allow them at all is that this site is concerned with Japanese material, so there is no reason to allow, say, Cyrillic.

And definitely try to ban certain words and names, and see if we can't come up with most of the plausible spoof variations and go ahead and block them from being created.

I don't think banning most Unicode outright would be a big deal, but here's an alternative:

Anyone attempting to register a Unicode name could be put into a queue. An admin or mod would look over the name and make sure that it doesn't spoof any admin's or mod's name (or do something else annoying).

Complaints about normal users getting spoofed could be reported by the users themselves, since a spoof of a normal user isn't as important.

The proper way is to blacklist script combinations that can result in spoofing, without banning any single script by itself.

The simplest possible implementation would be to forbid names with characters from more than one script. That'd also prevent many perfectly innocuous names from being registered, but it has the advantage of being dead simple -- it's just iterating over a string, collecting the Unicode script property for each character, and checking at the end whether you ended up with more than one. It will also let a tiny number of potential spoofs slip through, such as "Cappy" vs. "Сарру", but that's minor and unlikely to produce any serious spoofs, since I believe none of our admins or mods has a susceptible name.
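As a minimal sketch of that loop in Python: the standard library doesn't expose the Unicode script property directly, so this approximates it with the first word of each character's Unicode name (an assumption, not the real property). Among other things it treats hiragana, katakana, and kanji as three separate scripts, so ordinary mixed Japanese names would be rejected too.

```python
import unicodedata


def scripts_in(name: str) -> set:
    """Collect an approximate script label for every letter in the name."""
    scripts = set()
    for ch in name:
        if not ch.isalpha():
            continue  # digits, underscores, and punctuation count as script-neutral
        char_name = unicodedata.name(ch, "")
        if char_name:
            # e.g. "LATIN SMALL LETTER A" -> "LATIN", "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(char_name.split()[0])
    return scripts


def is_mixed_script(name: str) -> bool:
    """True if the letters of the name come from more than one (approximate) script."""
    return len(scripts_in(name)) > 1
```

A name that starts with a Cyrillic "а" followed by Latin "lbert" comes out as mixed; "Cappy" and the all-Cyrillic look-alike both come out as single-script, which is exactly the gap described above.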

A more involved method would entail building tables of potentially risky characters, and rejecting only the names composed of $script + spoofs of it, or of only spoofable characters (such as the "Сарру" above, which is fully Cyrillic, but looks like an ordinary Latin string). It's more work, but not as much as it sounds -- the whitelists and blacklists have already been built by browsers, so it would mostly be a matter of finding one such implementation and porting it over, or finding a suitable Ruby module (best case).
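To give an idea of the table-based approach, here's a sketch in Python. The table is a tiny hand-written sample covering a few Cyrillic look-alikes, purely for illustration; a real one would come from an existing implementation, and the function names are made up.

```python
# Illustrative sample only: a few Cyrillic letters mapped to the Latin letters they resemble.
CONFUSABLES = {
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
}


def skeleton(name: str) -> str:
    """Replace every character found in the table with its look-alike."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name)


def spoofs_existing(candidate: str, existing) -> bool:
    """True if the candidate looks like a different, already registered name."""
    target = skeleton(candidate)
    return any(name != candidate and skeleton(name) == target for name in existing)
```

With this, the all-Cyrillic string reduces to the same skeleton as "Cappy" and gets rejected, even though it contains only one script.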

葉月 said:
The proper way is to blacklist script combinations that can result in spoofing, without banning any single script by itself.

Who decided it was proper? I think a name with CJK+Latin combined is more likely than one using only Greek, for instance.

Whitelisting is also much easier to administer (you just have to know the ranges for ASCII/CJK/other non-lookalike scripts as appropriate) and can be changed if some other site has different requirements.
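For illustration, a range whitelist could be as small as the following Python sketch. Which ranges to allow is my own assumption (ASCII letters, digits, underscore, plus the Hiragana, Katakana, and CJK Unified Ideographs blocks); another site would just edit the list.

```python
# Assumed whitelist: inclusive code point ranges. Adjust per site.
ALLOWED_RANGES = [
    (0x0030, 0x0039),  # ASCII digits
    (0x0041, 0x005A),  # ASCII uppercase letters
    (0x005F, 0x005F),  # underscore
    (0x0061, 0x007A),  # ASCII lowercase letters
    (0x3040, 0x309F),  # Hiragana block
    (0x30A0, 0x30FF),  # Katakana block
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs block
]


def is_whitelisted(name: str) -> bool:
    """True if the name is non-empty and every character falls in an allowed range."""
    return bool(name) and all(
        any(lo <= ord(ch) <= hi for lo, hi in ALLOWED_RANGES) for ch in name
    )
```

Latin and Japanese names pass, and anything containing Cyrillic or Greek is rejected without needing a table of look-alikes.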

葉月 said:
The proper way is to blacklist script combinations that can result in spoofing, without banning any single script by itself.

There are still ways to spoof strings without mixing scripts. UTR #39 (Unicode Security Mechanisms) describes all this in detail, but basically, in addition to mixed-script spoofing, there's also whole-script spoofing and single-script spoofing. Your "Cappy" example is a case of whole-script spoofing. EB is another example: a janitor with a whole-script spoofable name.

Single-script spoofing involves swapping characters within the same script - for example, swapping the number '0' and the letter 'O', the letter 'l' and the number '1' (works best in a serif font like Courier), or the letter l (lowercase 'L') and the letter I (uppercase 'i', works best in a sans serif font at small sizes). For Latin, the danger is mitigated by the fact that confusable characters are few in number and most people are already familiar with them. The problem is that Japanese also contains confusable characters, and they are very difficult to recognize for anyone who doesn't know the language. It would be extremely easy for me to mistake 菓月 for 葉月, or ズラッシュ for スラッシュ, for instance. There's no reasonable way around this, since anyone who's unfamiliar with Japanese is liable to confuse almost any random moonrune for another one.

葉月 said:
A more involved method would entail building tables of potentially risky characters, and rejecting only the ones composed of $script + spoofs of it, or only spoofable characters (such as the "Сарру" above, which is fully Cyrillic, but looks like an ordinary Latin string).

The document I linked to describes doing just this, and it provides the necessary tables of spoofable characters. Interestingly, it looks like their table includes the l / 1 spoof, but not the l / I spoof. This just goes to show how error-prone a blacklisting approach is.

Shinjidude said:
Any other ideas?

Underscores in usernames shouldn't be displayed as spaces. This provides another opportunity for spoofing - compare _Anonymous_ & Anonymous.
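If underscores do keep being displayed as spaces, one option would be to compare names by what they look like once rendered. A small Python sketch, with a made-up function name, assuming the display step simply turns each underscore into a space:

```python
def display_key(name: str) -> str:
    """Approximate how a name looks on the page: underscores become spaces,
    leading/trailing/repeated spaces disappear visually, and case is ignored."""
    return " ".join(name.replace("_", " ").split()).casefold()
```

Two names with the same display_key, such as _Anonymous_ and Anonymous, would then be treated as the same name at registration.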
