Taming Uncertainty

Sahil's Notepad

  • Taming Uncertainty

    Locality Sensitive Hashing (LSH) and Min Hash

    • 3 Comments
    [Indyk-Motwani’98] Many distance related questions (nearest neighbor, closest x, ..) can be answered more efficiently by using locality sensitive hashing, where the main idea is that similar objects hash to the same bucket.   LSH function:...
  • Taming Uncertainty

    Set Similarity and Min Hash

    • 4 Comments
    Given two sets S1, S2, find similarity(S1, S2) - based not hamming distance (not Euclidean). Jaccard Measure View sets at a bit-array. Indexes representing each possible element, and 1/0 representing presence/absence of the element in the set. Then Jaccard...
  • Taming Uncertainty

    Information Retrieval & Search - Basic IR Models

    • 1 Comments
    Our focus in the database world has primarily been on retrieving information from a structured source, through a structured query. Relational DBs --> structured data, structured query XML DBs --> semi-structured data, structured query search -->...
  • Taming Uncertainty

    Information Theory (1) - The Science of Communication

    • 1 Comments
    IT is a beautiful sub-field of CS with applications across the gamut of scientific fields: coding theory and communications (under unreliable channels), cryptography, physics, biomedical engineering, computer graphics, machine learning, statistics, and...
  • Taming Uncertainty

    Random Sampling over Joins

    • 1 Comments
    Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, Sigmod 1999. What? Random sampling as a primitive relational operator: SAMPLE(R, f) where R is the relation and f the sample fraction. SAMPLE(Q, f)...
  • Taming Uncertainty

    Converting Between Random Sampling Methods

    • 0 Comments
      Sampling f fraction out of n records: Sampling with replacement Sample is a multi-set of fn records. Any record could be samples multiple times. Sampling without replacement Each successive sample is uniformly at random from the remaining records...
  • Taming Uncertainty

    Reservoir Sampling

    • 3 Comments
    A simple random sampling strategy to produce a sample without replacement from a stream of data - that is, in one pass: O(N) Want to sample s instances - uniformly at random without replacement - from a population size of n records, where n is not...
Page 1 of 1 (7 items)