Welcome to MSDN Blogs Sign in | Join | Help

February 2008 - Posts

Information Theory (1) - The Science of Communication

IT is a beautiful sub-field of CS with applications across the gamut of scientific fields: coding theory and communications (under unreliable channels), cryptography, physics, biomedical engineering, computer graphics, machine learning, statistics, and

Random Sampling over Joins

Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, Sigmod 1999. What? Random sampling as a primitive relational operator: SAMPLE(R, f) where R is the relation and f the sample fraction. SAMPLE(Q, f) is a tougher

Converting Between Random Sampling Methods

  Sampling f fraction out of n records: Sampling with replacement Sample is a multi-set of fn records. Any record could be samples multiple times. Sampling without replacement Each successive sample is uniformly at random from the remaining records

Reservoir Sampling

A simple random sampling strategy to produce a sample without replacement from a stream of data - that is, in one pass: O(N) Want to sample s instances - uniformly at random without replacement - from a population size of n records, where n is not known.
 
Page view tracker