How to sort by value in PySpark?

PySpark is the Python API for Apache Spark, a distributed data processing engine. It supports large-scale data processing and offers several built-in ways to sort data, including the DataFrame methods orderBy() and sort(), the RDD method sortBy(), and column functions such as desc() and asc_nulls_last().

Installation

First, install PySpark using pip:

pip install pyspark

Key Sorting Functions

Function  | Usage                            | Best For
orderBy() | DataFrame column sorting         | Single/multiple columns with custom order
sort()    | DataFrame sorting with functions | Descending order and null handling
sortBy()  | RDD sorting with lambda          | Custom sorting logic on RDDs

Sorting DataFrame by Single Column

Use orderBy() to sort a DataFrame by a specific column:

from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName("SortExample").getOrCreate()

# Create DataFrame
student_data = [("Akash", 25), ("Bhuvan", 23), ("Peter", 18), ("Mohan", 26)]
df = spark.createDataFrame(student_data, ["Name", "Age"])

# Sort by Age in ascending order
sorted_df = df.orderBy("Age")
sorted_df.show()

spark.stop()

Output:

+------+---+
|  Name|Age|
+------+---+
| Peter| 18|
|Bhuvan| 23|
| Akash| 25|
| Mohan| 26|
+------+---+

Sorting DataFrame by Multiple Columns

Sort by multiple columns by passing a list of column names. Rows are ordered by the first column, and later columns only break ties:

from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName("MultiSort").getOrCreate()

# Create DataFrame
product_data = [("Umbrella", 125), ("Bottle", 20), ("Colgate", 118)]
df = spark.createDataFrame(product_data, ["Product", "Price"])

# Sort by Price first, then by Product name
sorted_df = df.orderBy(["Price", "Product"], ascending=[True, True])
sorted_df.show()

spark.stop()

Output:

+--------+-----+
| Product|Price|
+--------+-----+
|  Bottle|   20|
| Colgate|  118|
|Umbrella|  125|
+--------+-----+

Sorting in Descending Order

Use the desc() function to sort in descending order:

from pyspark.sql import SparkSession
from pyspark.sql.functions import desc

# Create SparkSession
spark = SparkSession.builder.appName("DescSort").getOrCreate()

# Create DataFrame
employee_data = [("Abhinav", 25, "Male"), ("Meera", 32, "Female"), 
                ("Riya", 18, "Female"), ("Deepak", 33, "Male"), ("Elon", 50, "Male")]
df = spark.createDataFrame(employee_data, ["Name", "Age", "Gender"])

# Sort by Age in descending order
sorted_df = df.sort(desc("Age"))
sorted_df.show()

spark.stop()

Output:

+-------+---+------+
|   Name|Age|Gender|
+-------+---+------+
|   Elon| 50|  Male|
| Deepak| 33|  Male|
|  Meera| 32|Female|
|Abhinav| 25|  Male|
|   Riya| 18|Female|
+-------+---+------+

Sorting RDD by Value

Use sortBy() with a lambda function that returns the sort key to sort RDD data:

from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName("RDDSort").getOrCreate()

# Create RDD from list of tuples
data = [("X", 25), ("Y", 32), ("Z", 18)]
rdd = spark.sparkContext.parallelize(data)

# Sort RDD by second element (value)
sorted_rdd = rdd.sortBy(lambda x: x[1])

# Collect and display results
for record in sorted_rdd.collect():
    print(record)

spark.stop()

Output:

('Z', 18)
('X', 25)
('Y', 32)

Handling Null Values While Sorting

By default, an ascending sort in Spark places null values first. Use asc_nulls_last() to place them at the end instead:

from pyspark.sql import SparkSession
from pyspark.sql.functions import asc_nulls_last

# Create SparkSession
spark = SparkSession.builder.appName("NullSort").getOrCreate()

# Create DataFrame with null values
product_data = [("Charger", None), ("Mouse", 320), ("PEN", 18), 
               ("Bag", 1000), ("Notebook", None)]
df = spark.createDataFrame(product_data, ["Product", "Price"])

# Sort by Price with nulls last
sorted_df = df.sort(asc_nulls_last("Price"))
sorted_df.show()

spark.stop()

Output:

+--------+-----+
| Product|Price|
+--------+-----+
|     PEN|   18|
|   Mouse|  320|
|     Bag| 1000|
| Charger| null|
|Notebook| null|
+--------+-----+

Conclusion

PySpark provides several ways to sort data: orderBy() and sort() for DataFrames, combined with functions such as desc() for descending order, and sortBy() for RDDs. Use asc_nulls_last() to place null values at the end of an ascending sort.

Updated on: 2026-03-27T08:05:06+05:30