I've heard that exposing database IDs (in URLs, for example) is a security risk, but I'm having trouble understanding why.
Any opinions or links on why it's a risk, or why it isn't?
EDIT: of course the access is scoped, e.g. if you aren't allowed to see resource foo?id=123, you'll get an error page. Otherwise the URL itself should be secret.
EDIT: if the URL is secret, it will probably contain a generated token that has a limited lifetime, e.g. valid for 1 hour and can only be used once.
EDIT (months later): my current preferred practice for this is to use UUIDs for IDs and expose them. If I'm using sequential numbers as IDs (usually for performance on some DBs), I like generating a UUID token for each entry as an alternate key and exposing that instead.
There are risks associated with exposing database identifiers. On the other hand, it would be extremely burdensome to design a web application without exposing them at all. Thus, it's important to understand the risks and take care to address them.
The first danger is what OWASP called "insecure direct object references." If someone discovers the id of an entity, and your application lacks sufficient authorization controls to prevent it, they can do things that you didn't intend.
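As a minimal sketch of the authorization control described above (the `Document` type, the in-memory `DOCUMENTS` store, and `get_document` are hypothetical names for illustration, not part of any framework), every lookup by ID checks the caller before returning anything:

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when the caller may not access the requested object."""


@dataclass(frozen=True)
class Document:
    id: int
    owner_id: int
    body: str


# Hypothetical in-memory store standing in for a database table.
DOCUMENTS = {
    123: Document(id=123, owner_id=1, body="alice's notes"),
    124: Document(id=124, owner_id=2, body="bob's notes"),
}


def get_document(current_user_id: int, document_id: int) -> Document:
    """Return the document only if the current user owns it."""
    doc = DOCUMENTS.get(document_id)
    # Treat "missing" and "not yours" the same way, so the response
    # does not reveal which IDs exist.
    if doc is None or doc.owner_id != current_user_id:
        raise Forbidden(f"document {document_id} is not available")
    return doc
```

With a check like this in place, knowing that `foo?id=124` exists gets user 1 nothing more than an error page.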
Here are some good rules to follow:
There are schemes to hide the real identifier from an end user (e.g., map between the real identifier and a temporary, user-specific identifier on the server), but I would argue that this is a form of security by obscurity. I want to focus on keeping real cryptographic secrets, not trying to conceal application data. In a web context, it also runs counter to widely used REST design, where identifiers commonly show up in URLs to address a resource, which is subject to access control.
Another challenge is prediction or discovery of the identifiers. The easiest way for an attacker to discover an unauthorized object is to guess it from a numbering sequence. The following guidelines can help mitigate that:
Expose only unpredictable identifiers. For the sake of performance, you might use sequence numbers in foreign key relationships inside the database, but any entity you want to reference from the web application should also have an unpredictable surrogate identifier. This is the only one that should ever be exposed to the client. Random UUIDs are a practical choice for these surrogate keys, even though they aren't cryptographically secure.
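One way this could look, as a rough sketch: the `Customer` class, `create_customer`, and `find_customer` are made-up names, and the dictionary stands in for a uniquely indexed `public_id` column. The sequential `id` stays internal, and only the UUID ever reaches the client. (In CPython, `uuid.uuid4()` happens to draw from `os.urandom`, but the UUID format itself doesn't promise a cryptographic source.)

```python
import itertools
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional

# Stand-in for a database auto-increment column (assumption for this sketch).
_sequence = itertools.count(1)


@dataclass
class Customer:
    name: str
    # Internal key: sequential, used for joins, never sent to the client.
    id: int = field(default_factory=lambda: next(_sequence))
    # Surrogate key: random UUID, the only identifier exposed in URLs.
    public_id: uuid.UUID = field(default_factory=uuid.uuid4)


# Stand-in for a unique index on the public_id column.
_BY_PUBLIC_ID: Dict[uuid.UUID, Customer] = {}


def create_customer(name: str) -> Customer:
    customer = Customer(name=name)
    _BY_PUBLIC_ID[customer.public_id] = customer
    return customer


def find_customer(public_id: str) -> Optional[Customer]:
    """Resolve the identifier taken from a URL; reject anything that is not a UUID."""
    try:
        key = uuid.UUID(public_id)
    except ValueError:
        return None
    return _BY_PUBLIC_ID.get(key)
```

Access control still applies after the lookup; the surrogate key only makes guessing a neighbouring record impractical.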
One place where cryptographically unpredictable identifiers are a necessity, however, is in session IDs or other authentication tokens, where the ID itself authenticates a request. These should be generated by a cryptographic RNG.
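In Python, for instance, the standard library's `secrets` module is backed by the operating system's cryptographic RNG. A small sketch (the function names `new_session_token` and `tokens_match` are my own, and the 32-byte length is an assumption rather than a requirement):

```python
import hmac
import secrets


def new_session_token(nbytes: int = 32) -> str:
    """Generate a URL-safe token from the OS cryptographic RNG."""
    return secrets.token_urlsafe(nbytes)


def tokens_match(presented: str, stored: str) -> bool:
    """Compare tokens in constant time to avoid leaking how many characters matched."""
    return hmac.compare_digest(presented.encode(), stored.encode())
```

The general-purpose `random` module is not suitable here, since its output is predictable once enough of it has been observed.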
IMO, adding unpredictable IDs is a "security through obscurity" approach and can lead to a false sense of security. It's better to focus on (1) and (2) and make sure your access control is solid.
Using a cryptographic RNG is definitely not "security through obscurity." The attacker is no closer to guessing object identifiers even when she knows how you generate them. Security through obscurity means that if the algorithm you are using is discovered, it can be exploited. It does not refer to keeping secrets, like keys or the internal state of an RNG.
@stucampbell Perhaps, but that doesn't mean that you shouldn't use unpredictable IDs at all. Bugs happen, so unpredictable IDs are an extra safety mechanism. Besides, access control is not the only reason to use them: predictable IDs can reveal sensitive information such as the number of new customers within a certain timeframe. You really don't want to expose such information.
@Stijn you can’t really say that I “really don’t want” to expose how many customers I have. I mean, McDonald’s has a huge sign that says they’ve served 10 billion hamburgers. It’s not a security risk at all, it’s a preference. Furthermore, in most applications where we would worry about this, you have to log in before you see any URLs anyway, so we would know who was scraping data.
One thing that I didn't see mentioned in this conversation is that, from a troubleshooting and ease-of-use standpoint, it can be very handy to have IDs exposed in a URL to help direct users to a specific resource, or to let them tell you exactly which resource they're viewing. You can mostly avoid business intelligence concerns by starting the auto-increment at a higher value, just to offer one idea.