Saturday, February 25, 2012

returning random records

I need to select a list of random records from a table.
The table has a numeric ID column that is a primary key (identity
property). I would like to return a set of 1000 records where the only
criteria is a random ID number.
I can come up with a cursor solution that does what I need but is there a
set-based solution to the problem?
What is the most efficient way to return say 1000 random records from a
table?Try:
select top 1000
*
from
MyTable
order by
newid()
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
"Dave" <Dave@.discussions.microsoft.com> wrote in message
news:25157515-DC71-4A34-B4C5-EE30FBD1E234@.microsoft.com...
>I need to select a list of random records from a table.
> The table has a numeric ID column that is a primary key (identity
> property). I would like to return a set of 1000 records where the only
> criteria is a random ID number.
> I can come up with a cursor solution that does what I need but is there a
> set-based solution to the problem?
> What is the most efficient way to return say 1000 random records from a
> table?
>
>|||Dave
You need to perfrom
SELECT TOP 1000 *
FROM table
ORDER BY NEWID()
as per the example below:
CREATE TABLE foo
(
i INT
)
SET NOCOUNT OFF
DECLARE @.maxVal BIGINT
DECLARE @.i BIGINT
SET @.maxVal = 10000
SET @.i = 1
BEGIN TRAN
INSERT INTO foo VALUES(1)
WHILE @.i * 2 <= @.maxVal
BEGIN
INSERT INTO foo
SELECT i + @.i FROM foo
SET @.i = @.i * 2
END
INSERT INTO foo
SELECT i + @.i FROM foo
WHERE i + @.i <= @.maxVal
COMMIT TRAN
SELECT TOP 1000 *
FROM foo
ORDER BY NEWID()
- Peter Ward
WARDY IT Solutions
"Dave" wrote:

> I need to select a list of random records from a table.
> The table has a numeric ID column that is a primary key (identity
> property). I would like to return a set of 1000 records where the only
> criteria is a random ID number.
> I can come up with a cursor solution that does what I need but is there a
> set-based solution to the problem?
> What is the most efficient way to return say 1000 random records from a
> table?
>
>|||Hey that will not give randon records but same kind of records everytime as
top uses the same logic( it may be same order)
so you have some thing called RAND function. see books online for that. You
have to do little bit of work to achive the results using rand function. see
books online for that.
Regards
R.D
--Knowledge gets doubled when shared
"P. Ward" wrote:
> Dave
> You need to perfrom
> SELECT TOP 1000 *
> FROM table
> ORDER BY NEWID()
> as per the example below:
>
> CREATE TABLE foo
> (
> i INT
> )
>
> SET NOCOUNT OFF
> DECLARE @.maxVal BIGINT
> DECLARE @.i BIGINT
> SET @.maxVal = 10000
> SET @.i = 1
> BEGIN TRAN
> INSERT INTO foo VALUES(1)
> WHILE @.i * 2 <= @.maxVal
> BEGIN
> INSERT INTO foo
> SELECT i + @.i FROM foo
> SET @.i = @.i * 2
> END
> INSERT INTO foo
> SELECT i + @.i FROM foo
> WHERE i + @.i <= @.maxVal
> COMMIT TRAN
>
> SELECT TOP 1000 *
> FROM foo
> ORDER BY NEWID()
>
> - Peter Ward
> WARDY IT Solutions
>
> "Dave" wrote:
>|||Dave
This should solve your problem: Read this article
http://msdn.microsoft.com/library/d...r />
p04c1.asp
--
Regards
R.D
--Knowledge gets doubled when shared
"Dave" wrote:

> I need to select a list of random records from a table.
> The table has a numeric ID column that is a primary key (identity
> property). I would like to return a set of 1000 records where the only
> criteria is a random ID number.
> I can come up with a cursor solution that does what I need but is there a
> set-based solution to the problem?
> What is the most efficient way to return say 1000 random records from a
> table?
>
>|||On Mon, 7 Nov 2005 21:19:15 -0800, R.D wrote:

>Hey that will not give randon records but same kind of records everytime as
>top uses the same logic( it may be same order)
Hi R.D.,
You're incorrect. This WILL give random rows.
The TOP is executed in conjunction with the ORDER BY. So before applying
the TOP, SQL Server will first order all rows. NEWID() is called for
each row; this results in a semi-random (*) value. The rows are then
ordered by this semi-random value, resulting in a semi-random order.
Then, the TOP 1000 of those semi-randoomly ordered rows are returned.
Try executing the following in Nirthwind:
SELECT TOP 5 * FROM Customers
ORDER BY NEWID()
SELECT TOP 5 * FROM Customers
ORDER BY NEWID()
SELECT TOP 5 * FROM Customers
ORDER BY NEWID()
(*) The generator for NEWID's is not designed to be a good random number
generator, but it's close enoguh for most practical purposes. I wouldn't
use it for serious gambling-strategy analysis or for the 15 million
dollar draw, but for getting random samples out of a table, it's
certainly good enough.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||The first problem is that there are two kinds of random selection from
a set:
1) With replacement = you can get multiple copies of the same value.
This is shooting dice.
This one is easy if you have a random function in your SQL product.
Most of the pseudo-random generators return a floating point fraction
value between 0.00 and 0.9999... at whatever precision your SQL engine
has. The choice of a seed to start the generator can be the system
clock or some other constantly changing value.
SELECT S1.key_col
FROM SomeTable AS S1, SomeTable AS S2
WHERE S1.key_col <= S2.key_col
GROUP BY S1.key_col
HAVING COUNT(S2.key_col)
= (SELECT COUNT(*)
FROM SomeTable AS S3) * RANDOM(seed) + 1.0;
Or you can add a column for this in SQL Server (but not Oracle).
CREATE TABLE RandNbrs2
(seq_nbr INTEGER PRIMARY KEY,
randomizer FLOAT -- warning !! not standard SQL
DEFAULT (
(CASE (CAST(RAND() + 0.5 AS INTEGER) * -1)
WHEN 0.0 THEN 1.0 ELSE -1.0 END)
* (CAST(RAND() * 100000 AS INTEGER) % 10000)
* RAND())
NOT NULL);
INSERT INTO RandNbrs2 VALUES (1, DEFAULT);
INSERT INTO RandNbrs2 VALUES (2, DEFAULT);
INSERT INTO RandNbrs2 VALUES (3, DEFAULT);
INSERT INTO RandNbrs2 VALUES (4, DEFAULT);
INSERT INTO RandNbrs2 VALUES (5, DEFAULT);
INSERT INTO RandNbrs2 VALUES (6, DEFAULT);
INSERT INTO RandNbrs2 VALUES (7, DEFAULT);
INSERT INTO RandNbrs2 VALUES (8, DEFAULT);
INSERT INTO RandNbrs2 VALUES (9, DEFAULT);
INSERT INTO RandNbrs2 VALUES (10, DEFAULT);
2) Without replacement = you can each value only once. This is dealing
playing cards.
This is trickier. I would start with a table that has the keys and a
sequentially numbered column in it:
CREATE TABLE CardDeck
(keycol <datatype> NOT NULL PRIMARY KEY,
seq INTEGER NOT NULL);
INSERT INTO CardDeck (keycol, seq)
SELECT S1.keycol, COUNT(S2.keycol)
FROM SomeTable AS S1, Sometable AS S2
WHERE S1.key_col <= S2.key_col
GROUP BY S1.key_col;
Now shuffle the deck by determing a random swap pair for all the rows.
Somethign like this in SQL/PSM
BEGIN
DECLARE i INTEGER, j INTEGER;
SET i = (SELECT COUNT(*) FROM CardDeck);
WHILE i < 0
LOOP
SET j = (SELECT COUNT(*) FROM CardDeck) * RANDOM(seed) + 1.0;
UPDATE CardDeck
SET seq = CASE WHEN seq = i THEN j
WHEN seq = j THEN i
ELSE seq END;
WHERE seq IN (i, j);
SET i = i - 1;
LOOP END;
END;
You don't really need j, but it makes the code easier to read.
Biography:
Marsaglia, G and Zaman, A. 1990. Toward a Univesal Random Number
Generator.
Statistics & Probability Letters 8 (1990) 35-39.
Marsaglia, G, B. Narasimhan, and A. Zaman. 1990. A Random Number
Generator for
PC's. Computer Physics Communications 60 (1990) 345-349.
Leva, Joseph L. 1992. A Fast Normal Random Number Generator. ACM
Transactions
on Mathematical Software. Dec 01 1992 v 18 n 4. p 449
Leva, Joseph L. 1992. Algorithm 712: A Normal Random Number Generator.
ACM
Transactions on Mathematical Software. Dec 01 1992 v 18 n 4. p 454
Bays, Carter and W.E. Sharp. 1992. Improved Random Numbers for Your
Personal
Computer or Workstation. Geobyte. Apr 01 1992 v7 n2. p 25
Hulquist, Paul F. 1991. A Good Random Number Generator for
Microcomputers.
Simulation. Oct 01 1991 v57 n 4. p 258
Komo, John J. 1991. Decimal Pseudo-random Number Generator. Simulation.
Oct 01
1991 v57 n4. p 228
Chambers, W.G. and Z.D. Dai. 1991. Simple but Effective Modification to
a
Multiplicative Congruential Random-number Generator. IEEE Proceedings.
Computers and Digital Technology. May 01 1991 v 138 n3. p 121
Maier, W.L. 1991.. A Fast Pseudo Random Number Generator. Dr. Dobb's
Journal.
May 01 1991 v17 n 5. p 152
Sezgin, Fatin. 1990. On a Fast and Portable Uniform Quasi-random Number
Generator. Simulation Digest. Wint 1990 v 21 n 2. p 30
Macomber, James H. and Charles S. White. 1990. An n-Dimensional Uniform
Random
Number Generator Suitible for IBM-Compatible Microcomputers.
Interfaces. May 01
1990 v 20 n 3. p 49
Carta, David G. 1990. Two Fast Implementations of the "Minimal
Standard"
Random Number Generator. Communications of the ACM. Jan 01 1990 v 33 n
1. p
87
Elkins, T.A. 1989. A Highly Random-number Generator. Computer
Language. Dec
01 1989 v 6 n 12 p 59
Kao, Chiang. A Random Number Generator for Microcomputers. OR: The
Journal of
the Operational Research Society. Jul 01 1989 v 40 n 7. p 687
Chassing, P. 1989. An Optimal Random Number Generator Zp. Statistics &
Probability Letters. Feb 01 1989 v 7 n 4. p 307
Also, you can contact Kenneth G. Hamilton 72727,177 who has done some
work
with RNG's. He has implemented one (at least one) of the best.
"A Digital Dissolve for Bit-Mapped Graphics Screens" by Mike Morton in
Dr.
Dobb's Journal, November 1986, page 48.
CMOS Cookbook by Don Lancaster; Sams 1977, page 318.
Art of Computer Programming, Volume 2: Seminumeral Algorithms, 2nd
edition by
Donald Knuth; Addison-Wesley 1981; page 29.
Numerical Recipes in Pascal: The Art of Scientific Computing by Press
et al.;
Cambridge 1989; page 233.

No comments:

Post a Comment