Sign In
Taming Uncertainty
Sahil's Notepad
Translate This Page
Translate this page
Powered by
Microsoft® Translator
Options
Email Blog Author
RSS for posts
Atom
RSS for comments
OK
Search
Advanced search options...
Search In:
Everything
Blogs
Forums
People
Groups
Places
Pages
Date range:
All Time
Last Year
Last 6 Months
Last 3 Months
Last Month
Last Week
Last Two Days
Tags
CS Fundamentals
Databases
Information Retrieval
Information Theory
Probability
Random Algs.
Archive
Archives
June 2008
(2)
March 2008
(1)
February 2008
(4)
MSDN Blogs
>
Taming Uncertainty
Posts
Subscribe via RSS
Sort by:
Most Recent
|
Most Views
|
Most Comments
Excerpt View
|
Full Post View
Taming Uncertainty
Locality Sensitive Hashing (LSH) and Min Hash
Posted
over 4 years ago
by
sahilthaker
1
Comments
[Indyk-Motwani’98] Many distance related questions (nearest neighbor, closest x, ..) can be answered more efficiently by using locality sensitive hashing, where the main idea is that similar objects hash to the same bucket. LSH function:...
Taming Uncertainty
Set Similarity and Min Hash
Posted
over 4 years ago
by
sahilthaker
1
Comments
Given two sets S1, S2, find similarity(S1, S2) - based not hamming distance (not Euclidean). Jaccard Measure View sets at a bit-array. Indexes representing each possible element, and 1/0 representing presence/absence of the element in the set. Then Jaccard...
Taming Uncertainty
Information Retrieval & Search - Basic IR Models
Posted
over 4 years ago
by
sahilthaker
1
Comments
Our focus in the database world has primarily been on retrieving information from a structured source, through a structured query. Relational DBs --> structured data, structured query XML DBs --> semi-structured data, structured query search -->...
Taming Uncertainty
Information Theory (1) - The Science of Communication
Posted
over 4 years ago
by
sahilthaker
1
Comments
IT is a beautiful sub-field of CS with applications across the gamut of scientific fields: coding theory and communications (under unreliable channels), cryptography, physics, biomedical engineering, computer graphics, machine learning, statistics, and...
Taming Uncertainty
Random Sampling over Joins
Posted
over 4 years ago
by
sahilthaker
1
Comments
Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, Sigmod 1999. What? Random sampling as a primitive relational operator: SAMPLE(R, f) where R is the relation and f the sample fraction. SAMPLE(Q, f)...
Taming Uncertainty
Converting Between Random Sampling Methods
Posted
over 4 years ago
by
sahilthaker
0
Comments
Sampling f fraction out of n records: Sampling with replacement Sample is a multi-set of fn records. Any record could be samples multiple times. Sampling without replacement Each successive sample is uniformly at random from the remaining records...
Taming Uncertainty
Reservoir Sampling
Posted
over 4 years ago
by
sahilthaker
2
Comments
A simple random sampling strategy to produce a sample without replacement from a stream of data - that is, in one pass: O(N) Want to sample s instances - uniformly at random without replacement - from a population size of n records, where n is not...
Page 1 of 1 (7 items)