Taming Uncertainty

Sahil's Notepad

  • Blog Post: Locality Sensitive Hashing (LSH) and Min Hash

    [Indyk-Motwani’98] Many distance-related questions (nearest neighbor, closest x, ...) can be answered more efficiently using locality sensitive hashing, whose main idea is that similar objects hash to the same bucket. LSH function: the probability of collision is higher for similar objects...
  • Blog Post: Set Similarity and Min Hash

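    Given two sets S1, S2, find similarity(S1, S2) based on Hamming distance (not Euclidean). Jaccard measure: view the sets as bit arrays, with indexes representing each possible element and 1/0 representing presence/absence of the element in the set. Then Jaccard measure = $|S_1 \cap S_2| / |S_1 \cup S_2|$, i.e. the number of positions where both arrays hold 1 divided by the number of positions where at least one does. What happens when: n elements in each...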
  • Blog Post: Converting Between Random Sampling Methods

    Sampling a fraction f out of n records. Sampling with replacement: the sample is a multiset of fn records; any record could be sampled multiple times. Sampling without replacement: each successive sample is drawn uniformly at random from the remaining records. Independent coin flips: choose a record with probability...
  • Blog Post: Reservoir Sampling

    A simple random sampling strategy that produces a sample without replacement from a stream of data in one pass, i.e. O(n) time. We want to sample s instances, uniformly at random without replacement, from a population of n records, where n is not known in advance. Figuring out n first would require two passes. Reservoir...
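The LSH excerpt states the property an LSH function needs: collisions are more likely for similar objects. Below is a minimal Python sketch of one common construction, MinHash signatures grouped into bands. The banding scheme, the `bands`/`rows` defaults, the function names, and the BLAKE2b-based hash family are assumptions for illustration, not necessarily what the post uses. Because a single MinHash value agrees on two sets with probability equal to their Jaccard similarity J, a pair becomes a candidate with probability 1 - (1 - J^r)^b for b bands of r rows.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def h(i, x):
    """The i-th hash function. Assumption: BLAKE2b over "i:x", truncated to 64 bits,
    stands in for a family of independent random hash functions."""
    digest = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes):
    """MinHash signature: the minimum of each hash function over a non-empty set."""
    return [min(h(i, x) for x in items) for i in range(num_hashes)]


def lsh_candidate_pairs(sets, bands=8, rows=4):
    """Banding: split each signature into `bands` bands of `rows` values. Sets that agree
    on every value of at least one band share a bucket and become a candidate pair."""
    buckets = defaultdict(set)
    for name, items in sets.items():
        sig = minhash_signature(items, bands * rows)
        for b in range(bands):
            # Key on (band index, band values) so equal values in different bands don't mix.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


if __name__ == "__main__":
    docs = {
        "A": set(range(0, 100)),
        "B": set(range(5, 105)),    # Jaccard(A, B) = 95/105, about 0.90
        "C": set(range(500, 600)),  # disjoint from A and B
    }
    print(lsh_candidate_pairs(docs))
```

Raising `rows` makes each band stricter (fewer false positives), while raising `bands` gives similar pairs more chances to share a bucket (fewer false negatives).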
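For the Set Similarity and Min Hash post, the Jaccard measure and its MinHash estimate fit in a few lines of Python. The BLAKE2b-based hash family, the function names, and the default of 500 hash functions are assumptions for illustration, not the post's code. The fact MinHash relies on: for a random hash function, the element of S1 ∪ S2 with the smallest hash is equally likely to be any element of the union, so the minima over S1 and S2 agree exactly when that element lies in S1 ∩ S2, which happens with probability equal to the Jaccard measure.

```python
import hashlib


def jaccard(s1, s2):
    """Exact Jaccard measure |S1 ∩ S2| / |S1 ∪ S2| (two empty sets are treated as identical)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def jaccard_bits(b1, b2):
    """The same measure on the bit-array view: positions where both arrays hold 1,
    divided by positions where at least one does."""
    both = sum(x & y for x, y in zip(b1, b2))
    either = sum(x | y for x, y in zip(b1, b2))
    return both / either if either else 1.0


def h(i, x):
    """The i-th hash function. Assumption: 64-bit BLAKE2b over "i:x" stands in for a random hash."""
    digest = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_estimate(s1, s2, k=500):
    """Fraction of k hash functions whose minimum over S1 equals their minimum over S2
    (both sets non-empty). For an ideal random hash, each agreement occurs with
    probability Jaccard(S1, S2), so the fraction estimates that measure."""
    agree = sum(min(h(i, x) for x in s1) == min(h(i, x) for x in s2) for i in range(k))
    return agree / k


if __name__ == "__main__":
    s1, s2 = {1, 2, 3, 4}, {3, 4, 5, 6}
    print(jaccard(s1, s2))  # 0.333..., i.e. 2/6
    # Elements 1..6 as bit positions; gives the same value as jaccard(s1, s2).
    print(jaccard_bits([1, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 1]))
    print(minhash_estimate(s1, s2))  # an estimate near 1/3
```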
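The Converting Between Random Sampling Methods excerpt names three ways to sample a fraction f of n records but is cut off before the conversions themselves. The Python sketch below implements the three schemes plus one conversion that is easy to justify: deduplicating a with-replacement sample. Rounding fn to an integer and all function names are assumptions for illustration; the post may convert between the schemes differently.

```python
import random


def with_replacement(records, f, rng):
    """round(f * n) independent uniform draws; the result is a multiset (a list that may repeat)."""
    return rng.choices(records, k=round(f * len(records)))


def without_replacement(records, f, rng):
    """round(f * n) distinct records; each successive draw is uniform over the records left."""
    return rng.sample(records, round(f * len(records)))


def coin_flips(records, f, rng):
    """Keep each record independently with probability f; the sample size is random, with mean f * n."""
    return [r for r in records if rng.random() < f]


def dedupe(sample):
    """Drop repeats, keeping first occurrences. Given that m distinct records remain, every
    m-subset is equally likely, so this is a without-replacement sample of random size m."""
    return list(dict.fromkeys(sample))


if __name__ == "__main__":
    rng = random.Random(7)
    records = list(range(20))
    print(with_replacement(records, 0.25, rng))
    print(without_replacement(records, 0.25, rng))
    print(coin_flips(records, 0.25, rng))
    print(dedupe(with_replacement(records, 0.25, rng)))
```

The sketch makes the trade-offs visible: `with_replacement` fixes the size but allows repeats, `coin_flips` guarantees distinct records but has a random size, and `without_replacement` gives both a fixed size and distinct records.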
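The reservoir idea from the last post can be sketched as the classic "Algorithm R" variant; this is an assumption, since the post's own version is cut off in the excerpt, and the function name `reservoir_sample` and the explicit `random.Random` argument are illustrative choices. Keep the first s records; when the (i+1)-th record arrives, keep it with probability s/(i+1) by overwriting a uniformly chosen slot. By induction, every record seen so far is in the reservoir with probability s/(i+1), so once the stream ends each of the n records is included with probability s/n, without ever knowing n in advance.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], s: int, rng: random.Random) -> List[T]:
    """One pass over a stream of unknown length n; returns min(s, n) items chosen
    uniformly at random without replacement."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):  # item is the (i+1)-th record seen so far
        if i < s:
            reservoir.append(item)  # the first s records fill the reservoir
        else:
            j = rng.randrange(i + 1)  # uniform over 0..i
            if j < s:
                reservoir[j] = item  # kept with probability s/(i+1), evicting a random slot
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 5, random.Random(42)))
```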