Language design: case conventions

Objective arguments to solve case conventions and move on.

published 2020-Oct-16, updated 2023-Mar-17

TLDR: Identifiers in programming languages should use only snake_case, Title_snake_case, and UPPER_SNAKE_CASE, treat abbreviations as ordinary words (Json, not JSON), and be limited to ASCII alphanumerics with _.

This post will also touch on the structure of identifiers.

There was an earlier, more specialized post: Don't Abbreviate In Camel-Case. This one is more general.

Lower case

Objective arguments in favor of snake_case over camelCase:

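Conversion: a snake_case identifier maps to exactly one sequence of words and back, while a camelCase identifier containing digits can correspond to several.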

one_123_two <-> one 123 two

one123two   <-> one123two
one123two   <-> one123 two
one123two   <-> one 123 two

one123Two   <-> one123 two
one123Two   <-> one 123 two
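
The conversion argument can be sketched in code. This is a minimal Python illustration, not taken from any real library: the function names are hypothetical, and words are assumed to be lowercase ASCII alphanumerics.

```python
def words_to_snake(words):
    """Join words with '_'. The separator keeps every word boundary visible."""
    return "_".join(words)


def snake_to_words(ident):
    """Split on '_'. This exactly reverses words_to_snake."""
    return ident.split("_")


def words_to_camel(words):
    """Join words, uppercasing the first character of every word but the first.

    Uppercasing a digit does nothing, so a boundary next to digits leaves no trace.
    """
    head, *tail = words
    return head + "".join(w[:1].upper() + w[1:] for w in tail)


# snake_case round-trips without losing information.
assert snake_to_words(words_to_snake(["one", "123", "two"])) == ["one", "123", "two"]

# camelCase maps two different word sequences to the same identifier,
# so the original words cannot be recovered.
assert words_to_camel(["one", "123", "two"]) == "one123Two"
assert words_to_camel(["one123", "two"]) == "one123Two"
```

Because the camelCase mapping is many-to-one, any tool converting it back into words has to guess.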

Title case

Objective arguments in favor of Title_snake_case over TitleCamelCase:

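Conversion: a Title_snake_case identifier maps to exactly one sequence of words and back, while a TitleCamelCase identifier containing digits can correspond to several.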

One_123_two <-> one 123 two

One123two   <-> one123two
One123two   <-> one123 two
One123two   <-> one 123 two

One123Two   <-> one123 two
One123Two   <-> one 123 two
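
The same property holds for the title-case variants. A minimal sketch with hypothetical function names, again assuming lowercase ASCII alphanumeric words:

```python
def words_to_title_snake(words):
    """Join words with '_' and uppercase only the very first character."""
    joined = "_".join(words)
    return joined[:1].upper() + joined[1:]


def title_snake_to_words(ident):
    """Lowercase the first character, then split on '_'."""
    return (ident[:1].lower() + ident[1:]).split("_")


def words_to_title_camel(words):
    """Uppercase the first character of every word and concatenate."""
    return "".join(w[:1].upper() + w[1:] for w in words)


# Title_snake_case round-trips.
assert words_to_title_snake(["one", "123", "two"]) == "One_123_two"
assert title_snake_to_words("One_123_two") == ["one", "123", "two"]

# TitleCamelCase collapses distinct word sequences into one identifier.
assert words_to_title_camel(["one", "123", "two"]) == "One123Two"
assert words_to_title_camel(["one123", "two"]) == "One123Two"
```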

Abbreviations

Objective arguments in favor of treating abbreviations as ordinary words, for example Json_encoder over JSON_encoder, or JsonEncoder over JSONEncoder:

Example from work.

At some point I had contact with a code base involving generating Go code from Swagger. The generator had a variety of special cases for id, xml, and some other abbreviations. A field named xml_setting_id would become XMLSettingID. However, if you used an abbreviation unknown to the generator, for example XSD (XML Schema Definition), xsd_setting_id would become XsdSettingID.

The goal was noble: be consistent with the Go standard library, which stupidly uppercases abbreviations, for example MarshalXML. But unlike with the standard library, you couldn't just remember "abbreviations are uppercase"; your brain needed the database of the exact abbreviations special-cased in that generator. So don't. Don't give abbreviations special casing in identifiers, and don't special-case them in code generators or parsers.
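
To make the problem concrete, here is a hedged Python sketch of the two approaches. It is not the actual generator: the function names and the contents of KNOWN_ABBREVIATIONS are assumptions chosen to reproduce the examples above.

```python
# Hypothetical special-case table. The real generator's list is not known here.
KNOWN_ABBREVIATIONS = {"id", "xml"}


def to_go_name_special_cased(snake):
    """Uppercase known abbreviations entirely; title-case everything else."""
    return "".join(
        part.upper() if part in KNOWN_ABBREVIATIONS else part[:1].upper() + part[1:]
        for part in snake.split("_")
    )


def to_go_name_uniform(snake):
    """Treat every part as an ordinary word. No table to memorize."""
    return "".join(part[:1].upper() + part[1:] for part in snake.split("_"))


# The special-cased output depends on what happens to be in the table.
assert to_go_name_special_cased("xml_setting_id") == "XMLSettingID"
assert to_go_name_special_cased("xsd_setting_id") == "XsdSettingID"

# The uniform rule is predictable for any input.
assert to_go_name_uniform("xml_setting_id") == "XmlSettingId"
assert to_go_name_uniform("xsd_setting_id") == "XsdSettingId"
```

With the uniform rule, the output can be predicted from the input alone, without knowing the generator's internals.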

Characters

Objective arguments in favor of restricting identifiers to ASCII alphanumerics with _:

Example from work.

At some point, we at Purelab were using Clojure and Datomic to build apps. Clojure symbols (the Lisp equivalent of identifiers) use kebab-case and may contain operator characters such as - and ?. Booleans are expected to end with a question mark: hidden? instead of is_hidden.

Datomic has its own idiosyncrasy: attribute names (its equivalent of column names) are globally scoped and include the entity type. So, instead of this:

create table persons (is_email_verified bool not null default false);

...you use this:

{
  :db/ident       :person/email-verified?
  :db/valueType   :db.type/boolean
  :db/cardinality :db.cardinality/one
}

For simplicity, let's suppose we use Postgres and have a JS client. You have to either break the SQL and JS conventions by quoting the field name:

create table persons ("email-verified?" bool not null default false);
person['email-verified?']

...or break the Clojure convention by using the interoperable format:

:is_email_verified
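
The interoperability point can be expressed as a simple validity check. The Python sketch below is an assumption about how the "ASCII alphanumerics with _" rule might be enforced: the function name is hypothetical, and disallowing a leading digit is an extra assumption borrowed from most mainstream languages.

```python
import re

# ASCII letters, digits and '_' only; must not start with a digit (assumption).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_portable_identifier(name):
    """Return True if the whole name fits the restricted character set."""
    return IDENTIFIER.fullmatch(name) is not None


# Usable unquoted in SQL, JS and most other languages.
assert is_portable_identifier("is_email_verified")

# Needs quoting or renaming outside of Lisp.
assert not is_portable_identifier("email-verified?")
assert not is_portable_identifier("hidden?")
```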

Footnote on Lisp symbols

Lisps allow identifiers like email-verified? because they don't distinguish between identifiers and operators, or more generally, between alphanumerics and special characters. They just have "symbols". This has various problems, starting with the interoperability cost shown above.

Conclusion

When making a language, follow the conventions listed at the top. Let's solve this forever and move on.