Taming Uncertainty

Sahil's Notepad

Locality Sensitive Hashing (LSH) and Min Hash

[Indyk-Motwani’98] Many distance-related questions (nearest neighbor, closest x, ...) can be...

Author: sahilthaker Date: 06/11/2008

Set Similarity and Min Hash

Given two sets S1, S2, find similarity(S1, S2) - based on Hamming distance (not Euclidean). Jaccard...

Author: sahilthaker Date: 06/10/2008
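
To make the teaser above concrete, here is a minimal sketch of exact Jaccard similarity next to a MinHash estimate of it. This is not taken from the post itself: the function names, the choice of salted BLAKE2b as the hash family, and the default of 256 hash functions are all assumptions made for illustration.

```python
import hashlib


def jaccard(s1, s2):
    """Exact Jaccard similarity |S1 ∩ S2| / |S1 ∪ S2|."""
    s1, s2 = set(s1), set(s2)
    if not s1 and not s2:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(s1 & s2) / len(s1 | s2)


def _hash(item, seed):
    """One member of an illustrative hash family: BLAKE2b salted with `seed`."""
    data = repr(item).encode("utf-8")
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=256):
    """For each hash function, keep the minimum hash value over the set."""
    items = list(items)
    if not items:
        raise ValueError("cannot build a MinHash signature of an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig1, sig2):
    """Fraction of positions where the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig1, sig2))
    return matches / len(sig1)


if __name__ == "__main__":
    s1, s2 = set(range(0, 150)), set(range(50, 200))
    print("exact:   ", jaccard(s1, s2))
    print("estimate:", estimate_jaccard(minhash_signature(s1), minhash_signature(s2)))
```

The estimate works because, for a well-mixed hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity. More hash functions reduce the variance of the estimate at the cost of longer signatures.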

Information Retrieval & Search - Basic IR Models

Our focus in the database world has primarily been on retrieving information from a structured...

Author: sahilthaker Date: 03/05/2008

Information Theory (1) - The Science of Communication

IT is a beautiful sub-field of CS with applications across the gamut of scientific fields: coding...

Author: sahilthaker Date: 02/21/2008

Random Sampling over Joins

Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, SIGMOD...

Author: sahilthaker Date: 02/11/2008

Converting Between Random Sampling Methods

Sampling an f fraction out of n records. Sampling with replacement: the sample is a multi-set of fn...

Author: sahilthaker Date: 02/05/2008

Reservoir Sampling

A simple random sampling strategy to produce a sample without replacement from a stream of data -...

Author: sahilthaker Date: 02/05/2008
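
As an illustration of the reservoir sampling entry above, here is a minimal sketch of the classic single-pass scheme (often called Algorithm R). The post's own formulation is not shown in the teaser, so the function name, the parameters, and the optional `rng` argument are assumptions for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly without replacement from `stream`.

    The stream is read once and its length need not be known in advance.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 5, rng=random.Random(42)))
```

Each item ends up in the final sample with probability k / n, where n is the total number of items seen, and only O(k) memory is used.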