Sign in
Taming Uncertainty
Sahil's Notepad
Translate This Page
Translate this page
Powered by
Microsoft® Translator
Tags
CS Fundamentals
Databases
Information Retrieval
Information Theory
Probability
Random Algs.
Browse by Tags
MSDN Blogs
>
Taming Uncertainty
>
All Tags
>
cs fundamentals
Tagged Content List
Blog Post:
Locality Sensitive Hashing (LSH) and Min Hash
sahilthaker
[Indyk-Motwani’98] Many distance related questions (nearest neighbor, closest x, ..) can be answered more efficiently by using locality sensitive hashing, where the main idea is that similar objects hash to the same bucket. LSH function: Probability of collision higher for similar objects...
on
11 Jun 2008
Blog Post:
Set Similarity and Min Hash
sahilthaker
Given two sets S1, S2, find similarity(S1, S2) - based not hamming distance (not Euclidean). Jaccard Measure View sets at a bit-array. Indexes representing each possible element, and 1/0 representing presence/absence of the element in the set. Then Jaccard measure = What happens when: n element in each...
on
10 Jun 2008
Blog Post:
Information Theory (1) - The Science of Communication
sahilthaker
IT is a beautiful sub-field of CS with applications across the gamut of scientific fields: coding theory and communications (under unreliable channels), cryptography, physics, biomedical engineering, computer graphics, machine learning, statistics, and even gambling and stock trading. It is truly a marvelous...
on
21 Feb 2008
Blog Post:
Random Sampling over Joins
sahilthaker
Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, Sigmod 1999. What? Random sampling as a primitive relational operator: SAMPLE(R, f) where R is the relation and f the sample fraction. SAMPLE(Q, f) is a tougher problem, where Q is a relation produced...
on
11 Feb 2008
Blog Post:
Converting Between Random Sampling Methods
sahilthaker
Sampling f fraction out of n records: Sampling with replacement Sample is a multi-set of fn records. Any record could be samples multiple times. Sampling without replacement Each successive sample is uniformly at random from the remaining records Independent Coin flips: choose a record with probability...
on
5 Feb 2008
Blog Post:
Reservoir Sampling
sahilthaker
A simple random sampling strategy to produce a sample without replacement from a stream of data - that is, in one pass: O(N) Want to sample s instances - uniformly at random without replacement - from a population size of n records, where n is not known. Figuring out n would require 2 passes. Reservoir...
on
5 Feb 2008
Page 1 of 1 (6 items)