Basics
- How to Insert
- How to Update
- How to Delete
- How to Trim Strings
- How to Use substring()
- How to Use substring() with RegEx to Extract a String
- How to Replace Substrings
- How to Modify Arrays
- How to Compare Arrays
- How to Concatenate Strings
- How to Convert the Case of a String
- How to Create an Array
- How to Insert Data Into an Array
- How to Query Arrays
- How to Use string_agg()
Database Management
- How to Create a Table
- How to Drop a Table
- How to Rename a Table
- How to Truncate a Table
- How to Duplicate a Table
- How to Add a Column
- How to Change a Column Name
- How to Add a Default Value
- How to Remove a Default Value From a Column
- How to Add a Not Null Constraint
- How to Remove a Not Null Constraint
- How to Add an Index
- How to Drop an Index
- How to Create a View
- How to Drop a View
- How to Reset Sequence
- How to Drop a Column
Dates and Times
Analysis
- How to Use nullif()
- How to Use Lateral Joins
- How to Calculate Percentiles
- How to Get the First Row per Group
- How to Use generate_series to Avoid Gaps in Data
- How to Do Type Casting
- How to Write a Common Table Expression
- How to Import a CSV using Copy
- How to Compare Two Values When One Is Null
- How to Use Coalesce
- How to Write a Case Statement
- How to Use Filter to Have Multiple Counts
- How to Calculate Cumulative Sum-Running Total
- How to Query a JSON Column
How to Calculate Percentiles in PostgreSQL
Percentiles are crucial in data analysis to understand the distribution of values in a dataset. PostgreSQL provides several built-in ways to calculate percentiles efficiently.
What is a Percentile?
A percentile indicates the value below which a given percentage of observations fall. For example, the 90th percentile means 90% of the data is below this value.
Using percentile_cont and percentile_disc
PostgreSQL offers two primary functions:
percentile_cont: Returns a continuous percentile, interpolating between values if necessary.percentile_disc: Returns a discrete percentile, selecting the closest actual value.
Example: Continuous Percentile
SELECT percentile_cont(0.9) WITHIN GROUP (ORDER BY salary) AS p90 FROM employees;This returns the 90th percentile salary, interpolating between salaries if needed.
Example: Discrete Percentile
SELECT percentile_disc(0.9) WITHIN GROUP (ORDER BY salary) AS p90 FROM employees;This returns the first salary value that reaches the 90th percentile.
Calculating Multiple Percentiles at Once
SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY salary) AS quartiles FROM employees;This returns the 25th, 50th (median), and 75th percentiles in a single query.
Using Percentiles in Groups
SELECT department,
percentile_cont(0.9) WITHIN GROUP (ORDER BY salary) AS p90
FROM employees
GROUP BY department;This calculates the 90th percentile salary per department.
Things to Keep in Mind
- Percentile values range from 0 to 1 (not 0 to 100).
- For performance, ensure you have indexes on the
ORDER BYcolumns when working with large tables. - If you need a simpler estimate, you can approximate percentiles using
NTILEor window functions.
Conclusion
PostgreSQL’s percentile functions are powerful tools for analyzing data distributions. With percentile_cont and percentile_disc, you can perform advanced statistical queries with ease. Incorporate these into your analytics workflows to gain deeper insights from your data.
Automate PostgreSQL reporting with DataReportive
Turn your PostgreSQL queries into scheduled reports delivered to your team or customers.