I personally prefer person_id to be the primary key name (instead of id) in both the person table and any table which has it as a foreign key. One reason is for join syntax:
select * from person join team_member using (person_id)
The other reason is person_id now unambiguously refers to the same field regardless if we're looking at the PK or a FK. It's always person_id.
I'm with you but in my experience it's a lost fight. Most of the projects I come across these days follow the id convention.
The advantages of using person_id are even more obvious in multiple joins, such as a star schema where you can using(person_id) all the things, reducing both the typing and the cognitive load.
I suspect that if this convention was more pervasive, programmers would be a little bit less afraid of diving into SQL.
Well, at least we're settling with some convention, so it's not all bad.
> Yes, this is the same version as I found, but the closest thing I could find to addressing table names in the paper itself was an "Object Class name", something like an OOP Class or something you'd find in a UML diagram, but not really the same as a table name, and in any case all the examples were singular.
> Was actually kinda hoping Celko would deign to comment on this himself as he seems to be the chief proponent of the "collective identifiers as specified by ISO 11179" meme.
Also important is adapting to the existing naming conventions of the database, even if you don't like it. (Unless the existing naming conventions cause more trouble that its worth like requiring quoted identifiers or redundant prefixes/suffixes).
I agree, consistency is worth a lot more than using a slightly better convention. It would also create surprises in the sense of "principle of least surprise" which is in general a good guideline while designing APIs schemas and so on.
Probably the worst thing I've encountered is a junior engineer trying to encourage me to change a coding style by having two coding styles coexist.
I just kept repeating, over and over, that I expected the coding-style to be consistent. It was totally over his head, and he totally didn't even bother looking to find a code formatting utility to do a One-Shot style change.
Without a rigorous attempt at justifying each of these rules, I don't find this article particularly useful. For example, can someone link to or provide a formal explanation for why table names should be singular? I actually really wanted to read the full relational algebra rational for that one.
My own view is they should either be all plural or all singular. Just pick one.
But I do gravitate to singular because all nouns have naturally consistent singular words. The same cannot be said of plural. e.g. Moose, Cactus, and any other word that ends in an 's'.
That's a fair point, but keep in mind that the author isn't suggesting these are the "correct" ways to write SQL. They're just his ways.
I don't think one standard is necessarily better than another, but the important thing is to have rules. Over the years I've adopted similar rules for myself, and just internal consistency is so much better. I have old projects with table names including: "logs", "log_requests", "log_users", "game_logs", etc., not to mention mixing of plural and singular, to the point where I need to `show tables` before writing any query just to remember what I even called the table I need.
Relation definitions define types. Types have names. Tuple in a relation is an instantiation of the corresponding type. And we use singulars for class/types names in most programming systems (e.g. class Person vs class Persons).
I know a reasonable amount of SQL but am by no means a database guru. Naturally, this makes me the local database expert at my current place of employment :/
The other devs' eyes glaze over when I say things like 'stored procedures' and 'trigger functions.' Bah.
> Mixed case identifier names means that every usage of the identifier will need to be quoted in double quotes
I've used quite a few RDBMS engines, including most mentioned by the author, and I've never had to quote mixed-case identifier names. They work just the same as all lower-cased names or as any other case-sensitive language.
Most of the programming languages I use typically have the convention of using PascalCase for classes and public fields/properties so I prefer to use that convention for tables and columns (and then everything else for consistency). When doing operations between the application and the database, the name is exactly the same without the need for translation.
If you are not quoting your mixed-case identifiers, then they are not mixed-case. They are being implicitly converted to either lowercase or uppercase depending on your database, and your capitalization is lost. It normally doesn't matter, until you are introspecting your database schema, at which point your code generation generates a Organizationrole protobuf message instead of an OrganizationRole, or your ORM fails to find OrganizationRole because PostgreSQL stored it as organizationrole.
In Oracle this is kind of hidden away since it is case sensitive, but unquoted identifiers are silently converted to uppercase, quoted identifiers are used as-is. So, for queries against something like
create table foo (
Bar integer,
"Foobar" integer
);
Referring to Bar, BAR, bar, and "Foobar" will work, but foobar and Foobar will not.
Really? I have. In some circles (e.g., the IETF), i18n is an ancient acronym. People who've worked on operating systems (e.g., OS X, Solaris, RHEL, whatever) have to deal with L10N (localization). G11N (globalization) is I18N + L10N.
And then there's a11y: accessibility. This is all about making user interfaces accessible for people with low or no vision, low or no hearing, difficulty typing, and so on.
There are generally applicable laws requiring G11N and A11Y, and these fall heavily on OS vendors, which is why people who've worked on OSes tend to know these acronyms.
I18N -> dealing with Unicode in general, codeset conversions, font issues, ...
L10N -> dealing with translating system/application messages to the users' preferred languages (and how to even know they preferences) (think locales)
G11N -> I18N and L10N.
Localization is damned difficult. There's all sort of little bothersome things, like how to format numbers (which varies quite a lot) and dates (can't we all just use ISO-8601?!). And translating printf-like format strings is often non-trivial, especially when the coder doesn't stop to think about just how hard they might be to a translator as they write their code.
Lot of these is debatable. For example, I have preferred FirstName or even “[First Name]” instead of first_name in sql because lot of tooling uses these names to generate UX. Similarly using Person.PersonID instead of Person.ID gives consistency in diagrams and foreign key naming. I have used both approaches with its own pro and cons.
> For example, I have preferred FirstName or even “[First Name]” instead of first_name in sql because lot of tooling uses these names to generate UX.
those tools are wrong (and I know roughly which ones those are).
> Similarly using Person.PersonID instead of Person.ID gives consistency in diagrams and foreign key naming.
it would be: person.id and the foreign key column that refers to it person_thing.person_id. This is much preferable to person.person_id and person_thing.person_person_id.
> Similarly using Person.PersonID instead of Person.ID gives consistency in diagrams and foreign key naming.
I assume you mean you just use PersonID as the foreign key. This oftentimes introduces ambiguity into what the relationship actually is. I prefer names that describe the actual relationship (e.g. author, owner, approver, etc) rather than letting other people guess what it is.
if you need “[First Name]” so it shows up in an UI, you could always do first_name as 'First Name'. But I would say that's still general bad practice. In many cases, you shouldn't be exposing your column names through a UI, and most UI allow for alias
You should follow the conventions that make life easier in the rest of your tooling.
The fact is, you're probably going to be issuing more SQL via abstractions like ORMs or querying libraries than raw SQL. If you need to work against the grain of those libraries to map your model, what upside are you getting?
If most of your data is queried via ActiveRecord, for example, you should use plural table names.
It is true that most of my SQL goes through ORMs. However my most complicated SQL is always constructed by hand. Furthermore note the point about how often applications get rewritten against the same database. You should not assume that future code will use the same ORM that you are using now.
FWIW, naming conventions are like opinions. Everyone has them, and they usually differ from person to person. The best naming convention is a consistent naming convention. Also, naming conventions differ greatly by environment. A group of SQL Server engineers are going to have different standards than those of people working on mysql.
No. That's how you write sql within your organization. Also there are syntactic differences between SQL flavors ( postgres, mysql, mssql, oracle, etc ) that make a SQL standard unrealistic.
The only generic rule is "be consistent". Whatever convention/style you choose, it should be consistent.
meritt|7 years ago
rodelrod|7 years ago
The advantages of using person_id are even more obvious in multiple joins, such as a star schema where you can using(person_id) all the things, reducing both the typing and the cognitive load.
I suspect that if this convention was more pervasive, programmers would be a little bit less afraid of diving into SQL.
Well, at least we're settling with some convention, so it's not all bad.
EDIT: typo
jermaustin1|7 years ago
SELECT * FROM Person JOIN TeamMember on PersonId = Person.Id
unknown|7 years ago
[deleted]
elchief|7 years ago
As well, fewer keywords are plural, compared to singular, so there's less chance of accidentally using a keyword if you use plurals
Haven't yet seen an "octopus" table in production...
danso|7 years ago
https://social.msdn.microsoft.com/Forums/vstudio/en-US/d5f2f...
> Yes, this is the same version as I found, but the closest thing I could find to addressing table names in the paper itself was an "Object Class name", something like an OOP Class or something you'd find in a UML diagram, but not really the same as a table name, and in any case all the examples were singular.
> Was actually kinda hoping Celko would deign to comment on this himself as he seems to be the chief proponent of the "collective identifiers as specified by ISO 11179" meme.
gwbas1c|7 years ago
daigoba66|7 years ago
ysleepy|7 years ago
gwbas1c|7 years ago
I just kept repeating, over and over, that I expected the coding-style to be consistent. It was totally over his head, and he totally didn't even bother looking to find a code formatting utility to do a One-Shot style change.
daphneokeefe|7 years ago
0xffff2|7 years ago
cwbrandsma|7 years ago
chias|7 years ago
I don't think one standard is necessarily better than another, but the important thing is to have rules. Over the years I've adopted similar rules for myself, and just internal consistency is so much better. I have old projects with table names including: "logs", "log_requests", "log_users", "game_logs", etc., not to mention mixing of plural and singular, to the point where I need to `show tables` before writing any query just to remember what I even called the table I need.
zzzeek|7 years ago
qilo|7 years ago
davvolun|7 years ago
jcadam|7 years ago
The other devs' eyes glaze over when I say things like 'stored procedures' and 'trigger functions.' Bah.
wvenable|7 years ago
> Mixed case identifier names means that every usage of the identifier will need to be quoted in double quotes
I've used quite a few RDBMS engines, including most mentioned by the author, and I've never had to quote mixed-case identifier names. They work just the same as all lower-cased names or as any other case-sensitive language.
Most of the programming languages I use typically have the convention of using PascalCase for classes and public fields/properties so I prefer to use that convention for tables and columns (and then everything else for consistency). When doing operations between the application and the database, the name is exactly the same without the need for translation.
Otherwise, I think it's a good list.
stubish|7 years ago
wfriesen|7 years ago
walshemj|7 years ago
I though the canonical way of doing this was to write KEYWORDS in caps and use camel case for Variables.
Also never really brought into adding the type as part of a name - your type is already defined in your schema.
cryptonector|7 years ago
And then there's a11y: accessibility. This is all about making user interfaces accessible for people with low or no vision, low or no hearing, difficulty typing, and so on.
There are generally applicable laws requiring G11N and A11Y, and these fall heavily on OS vendors, which is why people who've worked on OSes tend to know these acronyms.
I18N -> dealing with Unicode in general, codeset conversions, font issues, ...
L10N -> dealing with translating system/application messages to the users' preferred languages (and how to even know they preferences) (think locales)
G11N -> I18N and L10N.
Localization is damned difficult. There's all sort of little bothersome things, like how to format numbers (which varies quite a lot) and dates (can't we all just use ISO-8601?!). And translating printf-like format strings is often non-trivial, especially when the coder doesn't stop to think about just how hard they might be to a translator as they write their code.
irrational|7 years ago
sytelus|7 years ago
zzzeek|7 years ago
those tools are wrong (and I know roughly which ones those are).
> Similarly using Person.PersonID instead of Person.ID gives consistency in diagrams and foreign key naming.
it would be: person.id and the foreign key column that refers to it person_thing.person_id. This is much preferable to person.person_id and person_thing.person_person_id.
sbov|7 years ago
I assume you mean you just use PersonID as the foreign key. This oftentimes introduces ambiguity into what the relationship actually is. I prefer names that describe the actual relationship (e.g. author, owner, approver, etc) rather than letting other people guess what it is.
matte_black|7 years ago
Do not do this in Postgres, it will be a pain in the ass since you will have to use quotes around everything.
autokad|7 years ago
arez|7 years ago
barrkel|7 years ago
The fact is, you're probably going to be issuing more SQL via abstractions like ORMs or querying libraries than raw SQL. If you need to work against the grain of those libraries to map your model, what upside are you getting?
If most of your data is queried via ActiveRecord, for example, you should use plural table names.
btilly|7 years ago
walshemj|7 years ago
iblaine|7 years ago
Glad you cleared this up for the rest of us.
FWIW, naming conventions are like opinions. Everyone has them, and they usually differ from person to person. The best naming convention is a consistent naming convention. Also, naming conventions differ greatly by environment. A group of SQL Server engineers are going to have different standards than those of people working on mysql.
youpassbutter|7 years ago
The only generic rule is "be consistent". Whatever convention/style you choose, it should be consistent.