Sign In
SQL Server Engine Tips
Guidelines, Best Practices, TSQL and SQL Programming Tips & Tricks.
Translate This Page
Translate this page
Powered by
Microsoft® Translator
Options
Blog Home
About
Email Blog Author
Share this
RSS for posts
Atom
RSS for comments
Search
Advanced search options...
Search In:
Everything
Blogs
Forums
People
Groups
Places
Pages
Date range:
All Time
Last Year
Last 6 Months
Last 3 Months
Last Month
Last Week
Last Two Days
Tags
Announcements
Architecture
Best Practices
Catalog Views
Dynamic Management View
General
Oracle SQL Migration
Pages
Performance - Engine
Performance - SQL/TSQL
Performance - Tools
Performance Troubleshooting
Programming
SQL Server 2005
SQL Server 2008
Archive
Archives
July 2008
(1)
June 2008
(2)
August 2007
(1)
April 2007
(1)
March 2007
(3)
February 2007
(3)
December 2006
(1)
November 2006
(2)
September 2006
(3)
July 2006
(1)
June 2006
(3)
May 2006
(1)
February 2006
(3)
October 2005
(7)
September 2005
(3)
August 2005
(2)
July 2005
(3)
June 2005
(3)
May 2005
(2)
Oracle's PERCENTILE_CONT implementation using SQL Server 2005 analytic functions...
MSDN Blogs
>
SQL Server Engine Tips
>
Oracle's PERCENTILE_CONT implementation using SQL Server 2005 analytic functions...
Oracle's PERCENTILE_CONT implementation using SQL Server 2005 analytic functions...
SQL Server Engine Team
15 May 2006 6:03 PM
Comments
7
I saw an interesting question today in the MSDN Transact-SQL forum about implementing PERCENTILE_CONT analytic function that is available in Oracle in SQL Server. The function description as noted in the Oracle docs is below:
The
PERCENTILE_CONT
function is an inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification, and returns an interpolated value that would fall into that percentile value with respect to the sort specification. Nulls are ignored in the calculation.
The details of the function can be obtained from
PERCENTILE_CONT
topic in the Oracle docs. Implemention such an analytic function in a query using SQL Server 2000 is tough due to the absence of windowed aggregates. SQL Server 2005 however implements some of the basic analytic functions like ROW_NUMBER, NTILE, OVER clause for standard aggregate functions which makes the port easier to do.
We will use the sample EMP table from Oracle for our examples. I have the equivalent TSQL script that creates the sample schema in SQL Server. Let us take a look at the data in the EMP table for relevant columns first:
EMPNO DEPTNO SAL
----------- ------ ---------------------------------------
7839 10 5000.00
7782 10 2450.00
7934 10 1300.00
7902 20 3000.00
7788 20 3000.00
7566 20 2975.00
7876 20 1100.00
7369 20 800.00
7698 30 2850.00
7499 30 1600.00
7844 30 1500.00
7521 30 1250.00
7654 30 1250.00
7900 30 950.00
The Oracle query using PERCENTILE_CONT to get the median salary is:
SELECT deptno,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sal DESC) as percentile_sal
FROM emp
GROUP BY deptno;
The result of the query in Oracle is:
deptno percentile_sal
------ ---------------------------------------
10 2450.000
20 2975.000
30 1375.000
The same query in Oracle can be written using some of the primitive analytic functions as below:
select deptno, sum(case grp
when 1 then (case PRN when RN then sal end)
else (case PRN
when FRN then (CRN - RN) * sal
when CRN then (RN - FRN) * sal end)
end) as percentile_sal
from (
select deptno, sal, N, PRN, RN, CRN, FRN
, case when e3.CRN = e3.FRN and e3.CRN = e3.RN then 1 else 0 end as grp
from (
select deptno, sal, N, PRN, RN, CEIL(RN) as CRN, FLOOR(RN) as FRN
from (
select deptno, sal, PRN, N, 1 + (0.5* (N-1)) as RN
from (
select deptno, sal
, row_number() over(partition by deptno order by sal desc) as PRN
, count(*) over(partition by deptno) as N
from emp
) e1
) e2
) e3
) er
group by deptno;
Now, it is easy to write the same query in SQL Server 2005 also. I started of actually by writing a simpler but slower version of the query first based on the Oracle implementation details and easier to understand. This particular query is shown below:
declare @p float;
set @p = .5;
with e1 as (
select deptno, count(*) as N
from emp
group by deptno
), e2 as (
select deptno, N, 1 + (@p * (N-1)) as RN
from e1
), e3 as (
select deptno, N, RN, CEILING(RN) as CRN, FLOOR(RN) as FRN from e2
), er as (
select deptno, sal, row_number() over(partition by deptno order by sal desc) as RN
from emp
)
select deptno,
case
when e3.CRN = e3.FRN and e3.CRN = e3.RN
then (select er.sal from er where er.deptno = e3.deptno and er.RN = e3.RN)
else ((e3.CRN - e3.RN) *
(select er.sal from er where er.deptno = e3.deptno and er.RN = e3.FRN)) +
((e3.RN - e3.FRN) *
(select er.sal from er where er.deptno = e3.deptno and er.RN = e3.CRN))
end as percentile_sal
from e3;
go
This query can however be simplified to scan the data only once and optimized further. The simplified and better performing version of the query is shown below:
declare @p float;
set @p = .5;
with e1 as (
select deptno, sal, row_number() over(partition by deptno order by sal desc) as PRN, count(*) over(partition by deptno) as N
from emp
), e2 as (
select deptno, sal, PRN, N, 1 + (@p * (N-1)) as RN
from e1
), e3 as (
select deptno, sal, N, PRN, RN, CEILING(RN) as CRN, FLOOR(RN) as FRN
from e2
), er as (
select deptno, sal, N, PRN, RN, CRN, FRN, case when e3.CRN = e3.FRN and e3.CRN = e3.RN then 1 else 0 end as grp
from e3
)
select deptno, sum(case grp
when 1 then (case PRN when RN then sal end)
else (case PRN
when FRN then (CRN - RN) * sal
when CRN then (RN - FRN) * sal end)
end) as percentile_sal
from er
group by deptno;
go
The ORDER BY clause in the PERCENTILE_CONT function goes in the ROW_NUMBER OVER clause in TSQL query and the GROUP BY clause becomes the PARTITION BY clause in the TSQL query. And the percentile value is specified in a variable.
Attachment:
Oracle PERCENT_CONT Implementation.sql
7 Comments
SQL Server 2005
,
Oracle SQL Migration
Blog - Comment List MSDN TechNet
Comments
Loading...
Leave a Comment
Name
Comment
Please add 2 and 2 and type the answer here:
Post