How to Calculate Percentiles in SQL Server

Percentiles divide an ordered data set into 100 equal parts: the 90th percentile, for example, is the value at or below which 90% of the rows fall. These measures describe how data is distributed and are useful for tasks such as setting thresholds or identifying outliers. This tutorial walks through the two functions SQL Server provides for calculating percentiles, PERCENTILE_CONT and PERCENTILE_DISC, both available since SQL Server 2012.

Using the PERCENTILE_CONT Function

The PERCENTILE_CONT function calculates the value of a specified percentile by interpolating between adjacent values in the ordered result set. It assumes a continuous distribution, so the result is a floating-point number that may not match any value actually present in the data.

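SELECT DISTINCT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER () AS MedianSalary
FROM employees;

In this example, PERCENTILE_CONT(0.5) calculates the median salary from the employees table. The WITHIN GROUP clause defines how to order the data for the percentile calculation. In SQL Server, PERCENTILE_CONT is an analytic (window) function, so the OVER clause is required and the result is repeated on every row of the input. DISTINCT collapses those repeated rows into a single value; an empty OVER () treats the whole table as one group.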
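To make the interpolation concrete, here is a minimal sketch that runs PERCENTILE_CONT against four inline values instead of a real table. The sample salaries and the alias v are made up for illustration; the arithmetic in the comments follows the usual linear-interpolation rule, where the target row position is 1 + p * (N - 1).

```sql
-- Hypothetical four-row sample built with a table value constructor.
SELECT DISTINCT
    -- p = 0.25: position 1 + 0.25 * 3 = 1.75, i.e. 75% of the way
    -- from 40000 to 50000, which gives 47500.
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY v.salary) OVER () AS Percentile25,
    -- p = 0.5: position 1 + 0.5 * 3 = 2.5, halfway between
    -- 50000 and 60000, which gives 55000.
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v.salary) OVER () AS MedianSalary
FROM (VALUES (40000), (50000), (60000), (90000)) AS v(salary);
```

Neither 47500 nor 55000 appears in the sample, which is what "continuous" means here.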

Using the PERCENTILE_DISC Function

On the other hand, PERCENTILE_DISC is a discrete percentile function. Instead of interpolating, it returns the actual value that corresponds to the given percentile in the ordered dataset.

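SELECT DISTINCT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) OVER () AS MedianSalary
FROM employees;

Here, PERCENTILE_DISC(0.5) returns the smallest salary whose cumulative distribution is at least 0.5, so the result is always a value that exists in the table. With an even number of rows, this is the lower of the two middle values rather than their average. As with PERCENTILE_CONT, the OVER clause is required and DISTINCT reduces the repeated per-row result to a single value.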
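The difference between the two functions is easiest to see side by side. The sketch below is not taken from a real schema: it uses four made-up salaries supplied inline, and the aliases MedianCont and MedianDisc are chosen for illustration.

```sql
-- Hypothetical sample with an even number of rows, where the two
-- functions disagree about the median.
SELECT DISTINCT
    -- Interpolates between the two middle values (50000 and 60000): 55000.
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v.salary) OVER () AS MedianCont,
    -- Picks the first value whose cumulative distribution reaches 0.5: 50000.
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY v.salary) OVER () AS MedianDisc
FROM (VALUES (40000), (50000), (60000), (90000)) AS v(salary);
```

With an odd number of rows both functions would return the same middle value, so the choice mainly matters for even-sized or small groups and for percentiles that fall between rows.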

Example Scenario: Calculating Percentiles for a Data Set

Let’s say we have a table named sales that stores sales amounts for a company. We can calculate several percentiles in one query to find thresholds such as the bottom 10%, the median, and the top 10%.

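SELECT DISTINCT
    PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY sales_amount) OVER () AS Bottom10Percent,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sales_amount) OVER () AS MedianSales,
    PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY sales_amount) OVER () AS Top10Percent
FROM sales;

In this case, we’re calculating the 10th percentile, the median, and the 90th percentile of sales_amount from the sales table. Sales at or below Bottom10Percent fall in the bottom 10%, and sales at or above Top10Percent fall in the top 10%. DISTINCT returns the three thresholds as a single row instead of repeating them for every row in sales.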
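Once a threshold is known, a common next step is to list the rows that clear it. Window functions cannot appear directly in a WHERE clause, so one possible approach is to compute the percentile in a common table expression and filter in the outer query. This sketch assumes the sales table above with a numeric sales_amount column; sale_id is a hypothetical key column, and the names top_sales and p90 are illustrative.

```sql
-- Assumes sales(sale_id, sales_amount); sale_id is a hypothetical column.
WITH top_sales AS (
    SELECT
        s.sale_id,
        s.sales_amount,
        -- Attach the 90th percentile of the whole table to every row.
        PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY s.sales_amount) OVER () AS p90
    FROM sales AS s
)
SELECT sale_id, sales_amount
FROM top_sales
WHERE sales_amount >= p90      -- keep rows at or above the threshold
ORDER BY sales_amount DESC;
```

Swapping `>= p90` for a comparison against the 10th percentile would list the bottom 10% in the same way.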

Conclusion

Calculating percentiles in SQL Server is a practical way to understand how your data is distributed. PERCENTILE_CONT gives an interpolated value, while PERCENTILE_DISC gives a value that actually occurs in the data, so you can choose whichever fits your needs. Both functions make it easier to set thresholds, spot outliers, and make informed decisions.