I am reading about sharding and I understood it upto some context. But most of the material I read says that sharding (horizontally scaling) RDBMS is a challenging task. But I don't see why NO-SQL is easy to shard and RDBMS would be tough to shard?
My understanding is: some NO-SQL provides inbuilt sharding support which makes it easy to shard. But if the NO-SQL does not provide inbuilt sharding support, then sharding overhead in SQL/NO-SQL is same thing as it has to be implemented in application layer.
Is my understanding correct or did I miss anything?
I don't think sharding is particularly "harder" in a SQL versus a NO-SQL database from the user perspective. After all, the complicated stuff is all done "under the hood", so the interface for users is pretty similar.
Sharding means that rows of a given table are stored separately -- often in local storage on different nodes. The issue is keeping them up-to-date.
One key difference is that SQL enforces ACID properties on the data, in particular "consistency". This means that queries see the database only after transactions have been completed entirely or not at all.
NO-SQL databases typically implement eventual consistency. That is, a given transaction may take some time (typically measured in seconds up to a minute) before the transaction completes across all shards.
Consider the situation where a query is deleting one row in each shard. A SQL database will either see all rows deleted or none. A NO-SQL database might return intermediate results.
The advantage of NO-SQL is that large databases are often append-only and transactions only affect one shard -- so eventual consistency is quite good-enough.
The advantage of SQL databases is that consistency is guaranteed (well, in some databases you can fiddle with settings to weaken it). However, there is a higher cost of waiting for all shards to agree that a transaction has completed.
I will note that in some situations SQL databases have a tremendous application advantage -- because the applications do not need to deal with potentially inconsistent data.
but in delete example, no-sql will delete row from each shard and then return.The deletion of each shard may take some time to reflect if different zones of same shard.Am it correct?
@rahulsharma . . . But other queries may see partially deleted data -- that is, data where rows are deleted in some shard but not in others. That is what "eventual consistency" means. It is not about the
delete
operation itself, but what the data looks like.