If you wanted to generate a pseudorandom alphanumeric string using T-SQL, how would you do it? How would you exclude characters like dollar signs, dashes, and slashes from it?
When generating random data, specially for test, it is very useful to make the data random, but reproducible. The secret is to use explicit seeds for the random function, so that when the test is run again with the same seed, it produces again exactly the same strings. Here is a simplified example of a function that generates object names in a reproducible manner:
alter procedure usp_generateIdentifier
@minLen int = 1
, @maxLen int = 256
, @seed int output
, @string varchar(8000) output
as
begin
set nocount on;
declare @length int;
declare @alpha varchar(8000)
, @digit varchar(8000)
, @specials varchar(8000)
, @first varchar(8000)
declare @step bigint = rand(@seed) * 2147483647;
select @alpha = 'qwertyuiopasdfghjklzxcvbnm'
, @digit = '1234567890'
, @specials = '_@# '
select @first = @alpha + '_@';
set @seed = (rand((@seed+@step)%2147483647)*2147483647);
select @length = @minLen + rand(@seed) * (@maxLen-@minLen)
, @seed = (rand((@seed+@step)%2147483647)*2147483647);
declare @dice int;
select @dice = rand(@seed) * len(@first),
@seed = (rand((@seed+@step)%2147483647)*2147483647);
select @string = substring(@first, @dice, 1);
while 0 < @length
begin
select @dice = rand(@seed) * 100
, @seed = (rand((@seed+@step)%2147483647)*2147483647);
if (@dice < 10) -- 10% special chars
begin
select @dice = rand(@seed) * len(@specials)+1
, @seed = (rand((@seed+@step)%2147483647)*2147483647);
select @string = @string + substring(@specials, @dice, 1);
end
else if (@dice < 10+10) -- 10% digits
begin
select @dice = rand(@seed) * len(@digit)+1
, @seed = (rand((@seed+@step)%2147483647)*2147483647);
select @string = @string + substring(@digit, @dice, 1);
end
else -- rest 80% alpha
begin
declare @preseed int = @seed;
select @dice = rand(@seed) * len(@alpha)+1
, @seed = (rand((@seed+@step)%2147483647)*2147483647);
select @string = @string + substring(@alpha, @dice, 1);
end
select @length = @length - 1;
end
end
go
When running the tests the caller generates a random seed it associates with the test run (saves it in the results table), then passed along the seed, similar to this:
declare @seed int;
declare @string varchar(256);
select @seed = 1234; -- saved start seed
exec usp_generateIdentifier
@seed = @seed output
, @string = @string output;
print @string;
exec usp_generateIdentifier
@seed = @seed output
, @string = @string output;
print @string;
exec usp_generateIdentifier
@seed = @seed output
, @string = @string output;
print @string;
Update 2016-02-17: See the comments bellow, the original procedure had an issue in the way it advanced the random seed. I updated the code, and also fixed the mentioned off-by-one issue.
Note that the re-seeding in my example is mostly to ilustrate the point. In practice its enough to seed the RNG once per session, as long as the call sequence is deterministic afterward.
I know this is old thread, but the code returns the same string for seed 192804 and 529126
@RemusRusanu i would also be interested in a response to davey's comment
The seed advances with the formula
@seed = rand(@seed+1)*2147483647
. For the value 529126 the next value is 1230039262 and then the next value is 192804. the sequence continues identically after this. The output should differ by first character, but it does not because of an off-by-one error:SUBSTRING(..., 0, ...)
returns empty string for index 0, and for 529126 this 'hides' the first character generated. Fix is to compute@dice = rand(@seed) * len(@specials)+1
to make the indexes 1 based.This problem arises from the fact that the random series, once they hit a common value, they progress identically. If needed, it can be avoided by making the
+1
in therand(@seed+1)
be itself random and determined from the initial seed only. This way, even if the series hit the same value, they diverge immediately.