Python Polars - Expressions and Contexts
What are Expressions in Polars?
Expressions are the core building block for data transformations in Polars. Instead of manipulating columns and rows directly, you write an expression that describes the operation, much like a formula. In other words, an expression tells Polars how to transform one or more columns. Expressions cover mathematical operations, aggregations, comparisons, string manipulations, and more.
import polars as pl
discount_price = pl.col("price") * 0.85
print(discount_price)
Here, the expression takes the values from the price column and multiplies each of them by 0.85. However, it does not calculate anything yet: the expression only describes how to compute a discounted price.
Printing the expression displays the formula itself rather than any computed values.
This shows that Polars has recorded the formula but has not executed it. This is why expressions are called lazy: no computation has been performed yet.
What are Contexts in Polars?
As we saw above, expressions do not perform any computation on their own. To evaluate them, Polars provides a mechanism known as contexts. A context determines how an expression is evaluated and executed. In other words, an expression is only computed when it is passed to a context.
The four most common contexts in Polars are −
- select Context
- with_columns Context
- filter Context
- group_by or aggregations Context
To get started with expressions and contexts, we will first create a DataFrame and then apply each of the above contexts to it in turn −
Example
Let's create a dataset of weather readings so that we can try expressions in different contexts.
import polars as pl
weather = pl.DataFrame({
    "city": ["Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai"],
    "temperature": [32, 29, 34, 35, 30],
    "humidity": [60, 75, 68, 55, 80],
})
print(weather)
Running the above program prints a DataFrame with five rows and three columns: city, temperature, and humidity.
select Context
The select context is used to select or transform particular columns of a DataFrame.
To select a single column by name, we can write −
weather.select("temperature") # selecting temperature column
Example
In the following program, we select a single column using the select context.
import polars as pl
weather = pl.DataFrame({
    "city": ["Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai"],
    "temperature": [32, 29, 34, 35, 30],
    "humidity": [60, 75, 68, 55, 80],
})
# select the temperature column and store the result in a variable
temp = weather.select("temperature")
print(temp)
Running the above program prints the temperature column unchanged, as a single-column DataFrame.
To select a column using an expression inside the select context, we can write −
weather.select(pl.col("temperature")) # selecting temperature column
This gives the same output as above, but using pl.col() allows further transformations and manipulations on the selected column.
Example
In this program, we perform a transformation inside the select context. Suppose we want to convert all temperature values from Celsius to Fahrenheit. The formula is F = (C × 9/5) + 32, where C is the temperature in Celsius.
import polars as pl
weather = pl.DataFrame({
    "city": ["Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai"],
    "temperature": [32, 29, 34, 35, 30],
    "humidity": [60, 75, 68, 55, 80],
})
# convert Celsius to Fahrenheit inside the select context
temp = weather.select(
    (pl.col("temperature") * 9 / 5) + 32
)
print(temp)
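Running the above program prints the temperature column with its values converted to Fahrenheit; for example, 32 becomes 89.6. The output column is still named temperature because the expression was not renamed.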
with_columns Context
The with_columns() context is used to create new columns or modify existing columns using one or more expressions.
Unlike the select() context, which returns only the columns you select, with_columns() keeps all existing columns and adds the new ones.
Example
In the following program, we convert the temperature from Celsius to Fahrenheit using an expression and store the result as a new column in the DataFrame using the with_columns() context.
import polars as pl
weather = pl.DataFrame({
    "city": ["Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai"],
    "temperature": [32, 29, 34, 35, 30],
    "humidity": [60, 75, 68, 55, 80],
})
# add the converted values as a new column using the with_columns context
temp_far = weather.with_columns(
    (pl.col("temperature") * 9 / 5 + 32).alias("temp_fahrenheit")
)
print(temp_far)
Running the above program prints the original three columns together with the new temp_fahrenheit column.
filter Context
The filter() context returns only the rows that satisfy a condition given as an expression.
Example
The following program filters the DataFrame to keep only the rows where the humidity value is greater than 70.
import polars as pl
weather = pl.DataFrame({
    "city": ["Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai"],
    "temperature": [32, 29, 34, 35, 30],
    "humidity": [60, 75, 68, 55, 80],
})
high_humid = weather.filter(pl.col("humidity") > 70)  # using filter context
print(high_humid)
Running the above program prints the two rows that satisfy the condition, both for Mumbai, with humidity values of 75 and 80.
group_by or aggregations Context
The group_by context divides a DataFrame into groups based on one or more columns and then applies expressions to each group. This lets us perform calculations (such as sum, average, count, min, and max) within each group separately.
Example
The following program groups the weather data by city and then applies an expression to calculate the average temperature for each group.
import polars as pl
weather = pl.DataFrame({
    "city": ["Delhi", "Mumbai", "Chennai", "Delhi", "Mumbai"],
    "temperature": [32, 29, 34, 35, 30],
    "humidity": [60, 75, 68, 55, 80],
})
# group by city, then aggregate inside the group_by context
avg_temp = weather.group_by("city").agg(
    pl.col("temperature").mean()
)
print(avg_temp)
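Running the above program prints one row per city with its average temperature: 33.5 for Delhi, 29.5 for Mumbai, and 34.0 for Chennai. The order of the rows may differ between runs, because group_by does not preserve group order by default.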
Characteristics of Expressions
The following are some characteristics of expressions −
- Expressions are lazy and execute only inside a context such as select() or with_columns().
- Expressions apply to entire columns at once, not to individual values.
- Expressions can be combined to build more complex transformations.
- Expressions can be stored in variables and reused multiple times.
- Expressions support chaining, which allows multiple transformations to be applied in a single line of code.
- Expressions make data transformation code more readable.