
Understanding pandas.DataFrame.loc[] through 6 examples

Last updated: February 24, 2024

Introduction, creating a sample dataframe, example 1: basic selection, example 2: select multiple rows, example 3: slicing rows, example 4: selecting rows and columns, example 5: conditional selection, example 6: setting values, advanced use: combining with other methods.

The pandas library in Python is a powerhouse for data manipulation and analysis. Among its many features, DataFrame.loc[] stands out for its ability to select data based on label information. This tutorial will guide you through understanding and utilizing loc[] with six comprehensive examples.

Preparation

Ensure you have pandas installed and imported in your Python environment:
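As a minimal sketch of this setup step, the snippet below imports pandas and builds a small DataFrame. The names, ages, and cities are hypothetical placeholders, not the article's original sample data.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [28, 34, 45, 31],
    "City": ["Paris", "London", "Berlin", "Madrid"],
})
print(df)
```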

Starting with the basics, you can select a single row:
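A sketch of what this could look like, assuming a hypothetical DataFrame with a default integer index:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [28, 34, 45, 31],
    "City": ["Paris", "London", "Berlin", "Madrid"],
})

# Select the row whose index label is 0; the result is a Series
first_row = df.loc[0]
print(first_row)
```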

The output shows the row whose index label is 0, which is the first row here, returned as a Series.

Selecting multiple rows by specifying a list of indices:
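For instance, under the assumption of a hypothetical four-row DataFrame, a list of labels might be passed like this:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [28, 34, 45, 31],
    "City": ["Paris", "London", "Berlin", "Madrid"],
})

# A list of index labels returns a DataFrame with those rows, in list order
rows = df.loc[[0, 2]]
print(rows)
```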

This will output rows 0 and 2.

You can slice rows using a colon:
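A small sketch with hypothetical data; note that the slice bounds are labels, not positions:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [28, 34, 45, 31],
    "City": ["Paris", "London", "Berlin", "Madrid"],
})

# Label-based slice: both the start label (1) and the stop label (3) are included
sliced = df.loc[1:3]
print(sliced)
```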

This slice includes rows 1 through 3. Unlike standard Python slicing, a loc[] slice includes both endpoints.

More selective data access by specifying row and column labels:
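One way this might look, assuming hypothetical columns named Name, Age, and City:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [28, 34, 45, 31],
    "City": ["Paris", "London", "Berlin", "Madrid"],
})

# Single row label and single column label: returns one value
first_name = df.loc[0, "Name"]
print(first_name)

# Lists of row and column labels: returns a smaller DataFrame
names_and_cities = df.loc[[1, 3], ["Name", "City"]]
print(names_and_cities)
```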

The first expression returns the name of the first person; the second returns the names and cities of the second and fourth persons.

Using conditions to filter rows:
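A hedged sketch, assuming a hypothetical Age column; the comparison builds a boolean Series and loc[] keeps the rows where it is True:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [28, 34, 45, 31],
    "City": ["Paris", "London", "Berlin", "Madrid"],
})

# Boolean mask: True for rows where Age is greater than 30
older_than_30 = df.loc[df["Age"] > 30]
print(older_than_30)
```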

This command lists all persons older than 30 years.

loc[] can also be used to modify data:
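As an illustration with hypothetical data, assigning to a single cell could be written as:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [28, 34, 45, 31],
    "City": ["Paris", "London", "Berlin", "Madrid"],
})

# Assign a new value to the cell at row label 0, column "Age"
df.loc[0, "Age"] = 29
print(df)
```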

The age for the first person has been updated to 29.

Combining loc[] with other pandas methods can unlock even more power. For instance, using loc[] with groupby() for aggregated data selection:
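The original example is not preserved here, so the following is only one possible sketch: it uses groupby() with idxmax() to find the index label of the oldest person per city, then passes those labels to loc[]. The data and the choice of idxmax() are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with repeated cities (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Age": [28, 34, 45, 31],
    "City": ["Paris", "London", "Paris", "London"],
})

# For each city, get the index label of the row with the highest age
oldest_labels = df.groupby("City")["Age"].idxmax()

# Use those labels with loc[] to pull the full rows
oldest_per_city = df.loc[oldest_labels]
print(oldest_per_city)
```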

Note: The above might require adjustments based on your real data, because loc[] is not available directly on a groupby() object. The example illustrates the general idea of combining loc[] with other methods.

The pandas.DataFrame.loc[] indexer is essential for precise data selection and manipulation. Through these examples, you’ve seen its versatility, from basic to more sophisticated data operations. Experiment with these techniques on your own data sets to discover the true power of pandas.


pandas: Get/Set values with loc, iloc, at, iat

You can use loc, iloc, at, and iat to access data in pandas.DataFrame and get/set values. Use square brackets [] as in loc[], not parentheses () as in loc().

pandas.DataFrame.loc — pandas 2.0.3 documentation
pandas.DataFrame.iloc — pandas 2.0.3 documentation
pandas.DataFrame.at — pandas 2.0.3 documentation
pandas.DataFrame.iat — pandas 2.0.3 documentation

The differences are as follows:

at, loc: Row/Column name (label)
iat, iloc: Row/Column number
at, iat: Single value
loc, iloc: Single or multiple values

at, iat: Access and get/set a single value

Access a single value, access multiple values using lists and slices, access rows and columns, mask by boolean array and pandas.series, duplicated row/column names, specify by number and name, implicit type conversion when selecting a row as pandas.series.

You can also select rows and columns of pandas.DataFrame and elements of pandas.Series by indexing [].

pandas: Select rows/columns by index (numbers and names)

Note that the previously provided get_value() and ix[] have been removed in version 1.0.

The sample code in this article is based on pandas version 2.0.3. The following pandas.DataFrame is used as an example.

You can specify the row/column name in at. In addition to getting data, you can also set (assign) a new value.

You can specify the row/column number (0-based indexing) in iat.
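A minimal sketch of at and iat, using a hypothetical DataFrame with named rows (not the article's original sample):

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame(
    {"age": [24, 42, 18], "state": ["NY", "CA", "TX"]},
    index=["Alice", "Bob", "Charlie"],
)

by_label = df.at["Bob", "age"]   # row/column names
by_position = df.iat[1, 0]       # row/column numbers (0-based)
print(by_label, by_position)

df.at["Bob", "age"] = 60         # set a value by label
df.iat[2, 0] = 19                # set a value by position
print(df)
```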

loc, iloc: Access and get/set single or multiple values

loc and iloc can access both single and multiple values using lists or slices. You can use row/column names for loc and row/column numbers for iloc.

You can access a single value with loc and iloc as well as with at and iat. However, at and iat are faster than loc and iloc.

In addition to retrieving data, you can also set a new value for the element.

With loc and iloc, you can access multiple values by specifying a group of data with a list [a, b, c, ...] and slice start:stop:step.

Note that in the slice notation start:stop:step, the step is optional and can be omitted. For basic usage of slices, see the following article.

How to slice a list, string, tuple in Python

When using the slice notation start:stop:step with loc (which uses row/column names), the stop value is inclusive. However, with iloc (which uses row/column numbers), the stop value is exclusive, following the typical behavior of standard Python slices.
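The difference can be sketched as follows, assuming a hypothetical DataFrame with four named rows:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame(
    {"age": [24, 42, 18, 68], "point": [64, 92, 70, 70]},
    index=["Alice", "Bob", "Charlie", "Dave"],
)

# loc: the stop label "Charlie" is included
by_label = df.loc["Bob":"Charlie"]
print(by_label)

# iloc: the stop position 2 is excluded, so only position 1 ("Bob") is returned
by_position = df.iloc[1:2]
print(by_position)
```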

When specified by a list, rows and columns follow the order of that list.

For example, you can extract odd/even rows by specifying step.

You can set multiple values simultaneously. If you assign a scalar value, all selected elements will be set to that value. For assigning values to a range, use a two-dimensional list (list of lists) or a two-dimensional NumPy array (ndarray).
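A short sketch of both kinds of assignment, using hypothetical data; the shape of the two-dimensional list has to match the selected range:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame(
    {"age": [24, 42, 18], "point": [64, 92, 70]},
    index=["Alice", "Bob", "Charlie"],
)

# Scalar: every selected element is set to the same value
df.loc["Bob":"Charlie", "point"] = 0

# Two-dimensional list: one inner list per selected row (2 rows x 2 columns here)
df.loc["Alice":"Bob", ["age", "point"]] = [[1, 2], [3, 4]]
print(df)
```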

Note that selecting a row or a column by specifying it as a scalar value returns Series, whereas the same row or column, specified as a slice or a list, returns DataFrame.
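To make the distinction concrete, here is a small sketch with hypothetical data that checks the returned types:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame(
    {"age": [24, 42, 18], "point": [64, 92, 70]},
    index=["Alice", "Bob", "Charlie"],
)

row_scalar = df.loc["Bob"]         # scalar label: Series
row_list = df.loc[["Bob"]]         # one-element list: DataFrame
row_slice = df.loc["Bob":"Bob"]    # slice: DataFrame
col_scalar = df.loc[:, "age"]      # scalar column label: Series
col_list = df.loc[:, ["age"]]      # one-element list: DataFrame

print(type(row_scalar).__name__, type(row_list).__name__)
```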

In particular, be aware of potential implicit type conversions when retrieving rows as a Series. See below for details.

You can select rows and columns with df[]. They can be specified as:

Rows: Slice of row name/number
Columns: Column name or list of column names

For more information, see the following article.

You can specify rows and columns in various ways with loc and iloc.

If you omit specifying columns with loc or iloc, rows are selected. You can specify them by row name/number or list of such names/numbers.

You can select columns with loc and iloc by specifying rows as :. It is possible to specify by slice.

As mentioned above, specifying a single row or column with a scalar value returns a Series, while using a slice or list returns a DataFrame.

Note that selecting a row as pandas.Series may result in implicit type conversion. See below for details.

With loc and iloc, you can use a boolean array or list to filter data. While the following example demonstrates row filtering, the same approach can be applied to columns.

If the number of elements does not match, an error is raised.

You can also use a boolean Series with loc for filtering. Note that the filtering is based on matching labels, not on the order of the data.

You cannot specify a boolean Series in iloc.

Even with loc, an error is raised if the labels do not match.
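The filtering rules above can be sketched as follows. The data is hypothetical, and converting the mask with to_numpy() before passing it to iloc is one possible workaround, not something the article prescribes.

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame(
    {"age": [24, 42, 18, 68]},
    index=["Alice", "Bob", "Charlie", "Dave"],
)

# A boolean list works with both loc and iloc; its length must match the rows
list_mask = [True, False, True, False]
by_list = df.loc[list_mask]
by_list_pos = df.iloc[list_mask]

# A boolean Series is matched to rows by label, not by order
series_mask = pd.Series(
    [True, False, False, True],
    index=["Dave", "Charlie", "Bob", "Alice"],
)
by_series = df.loc[series_mask]   # selects "Alice" and "Dave"

# For iloc, pass a plain boolean array instead of a Series
by_array = df.iloc[(df["age"] > 20).to_numpy()]
print(by_series)
```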

Both row names (index) and column names (columns) can have duplicates.

Consider the following DataFrame with duplicate row and column names as an example.

For at and loc, specifying duplicate names selects the corresponding multiple elements.

When using iat and iloc to specify by row/column number, duplicated names are not an issue because they operate based on position.

To avoid confusion, it's advisable to use unique values for row and column names unless there's a compelling reason otherwise.

You can check whether row and column names are unique (not duplicated) with index.is_unique and columns.is_unique.
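A brief sketch of duplicate row names, assuming a hypothetical one-column DataFrame in which the label "x" appears twice:

```python
import pandas as pd

# Hypothetical data with a duplicated row name (assumed for illustration only)
df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "x"])

# loc with a duplicated label returns every matching row
dup_rows = df.loc["x"]
print(dup_rows)

# iloc works by position, so duplicates do not matter
third_row = df.iloc[2]

# Check uniqueness of row and column names
print(df.index.is_unique, df.columns.is_unique)
```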

pandas.Index.is_unique — pandas 2.0.3 documentation

See the following article on how to rename row and column names.

pandas: Rename column/index names of DataFrame

If you want to specify by both number and name, use at or loc in combination with the index or columns attributes.

You can retrieve row or column names based on their number using the index and columns attributes.

For index and columns, you can use slices and lists to retrieve multiple names.

Using this and at or loc, you can specify by number and name.
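As a sketch of this approach with hypothetical data, the row is chosen by number through df.index and the column by name (or the reverse through df.columns):

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame(
    {"age": [24, 42, 18], "point": [64, 92, 70]},
    index=["Alice", "Bob", "Charlie"],
)

# Row by number (via index), column by name
value = df.at[df.index[1], "age"]

# Row by name, column by number (via columns)
value2 = df.loc["Bob", df.columns[1]]

# Several rows by number, one column by name
subset = df.loc[df.index[[0, 2]], "age"]
print(value, value2)
print(subset)
```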

Using indexing operations in succession, such as df[...][...], df.loc[...].iloc[...], and other similar patterns, is known as "chained indexing". This approach can trigger a SettingWithCopyWarning.

pandas: How to fix SettingWithCopyWarning: A value is trying to be set on ...

While this approach causes no issues during simple data retrieval and checking, be cautious as assigning new values might yield unexpected results.

If the columns of the original DataFrame have different data types, then when selecting a row as a Series with loc or iloc, the data type of the elements in the selected Series might differ from the data types in the original DataFrame.

pandas: How to use astype() to cast dtype of DataFrame

Consider a DataFrame with columns of integers (int) and floating point numbers (float).

If you retrieve a row as a Series using loc or iloc, its data type becomes float. Elements in int columns are converted to float.

If you execute the following code, the element is returned as float.

You can get elements of the original type with at or iat.
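The conversion can be sketched like this; the column names col_int and col_float are hypothetical:

```python
import pandas as pd

# Hypothetical data with one int column and one float column
df = pd.DataFrame({"col_int": [0, 1], "col_float": [0.1, 0.2]})
print(df.dtypes)

# Selecting a row as a Series forces a single dtype, so the int becomes float
row = df.loc[0]
print(row.dtype)

from_row = df.loc[0]["col_int"]   # float, because it comes from the float Series
from_at = df.at[0, "col_int"]     # keeps the column's integer type
print(type(from_row).__name__, type(from_at).__name__)
```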

When a row is selected using a list or slice with loc or iloc, a DataFrame is returned instead of a Series.


Set Pandas Conditional Column Based on Values of Another Column

August 9, 2021 February 22, 2022


There are many times when you may need to set a Pandas column value based on the condition of another column. In this post, you’ll learn all the different ways in which you can create Pandas conditional columns.


Loading a Sample Dataframe

Let’s begin by loading a sample Pandas dataframe that we can use throughout this tutorial.

We’ll begin by importing pandas and loading a dataframe using the .from_dict() method:
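The original data is not preserved here, so the following is a sketch with hypothetical names, ages, and cities:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame.from_dict({
    "Name": ["Jane", "Mitch", "Alex", "Evan", "Melissa"],
    "Age": [23, 45, 18, 67, 35],
    "City": ["New York", "Toronto", "Paris", "Tokyo", "Toronto"],
})
print(df)
```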

This returns the following dataframe:

Using Pandas loc to Set Pandas Conditional Column

Pandas loc is incredibly powerful! If you need a refresher on loc (or iloc), check out my tutorial here. A condition, such as a comparison on a column, produces a boolean mask, and loc uses that mask to select the matching rows. Sometimes you use loc simply to select rows and columns by label, but it can also be used to filter dataframes. The filtered rows can then have values assigned to them.

Let’s explore the syntax a little bit:

With the syntax above, we filter the dataframe using .loc and then assign a value to any row in the column (or columns) where the condition is met.

Let’s try this out by assigning the string ‘Under 30’ to anyone with an age less than 30, and ‘Over 30’ to anyone 30 or older.
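A sketch of these two steps, assuming a hypothetical Age column and a new column named Age Category:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Jane", "Mitch", "Alex", "Evan", "Melissa"],
    "Age": [23, 45, 18, 67, 35],
})

# Step 1: give every row the default label
df["Age Category"] = "Over 30"

# Step 2: overwrite the label where the condition is met
df.loc[df["Age"] < 30, "Age Category"] = "Under 30"
print(df)
```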

Let's take a look at what we did here:

We assigned the string 'Over 30' to every record in the dataframe. To learn more about this, check out my post here on creating new columns.
We then use .loc with a boolean mask on the Age column to filter down to rows where the age is less than 30. Where this condition is met, the Age Category column is assigned the new value 'Under 30'.

But what happens when you have multiple conditions? You could, of course, use .loc multiple times, but this is difficult to read and fairly unpleasant to write. Let's see how we can accomplish this using NumPy's np.select() function.

Using Numpy Select to Set Values using Multiple Conditions

Similar to using .loc to create a conditional column in Pandas, we can use NumPy's np.select() function.

Let's begin by importing numpy with its conventional alias, np:

Now, say we wanted to apply a number of different age groups, as below:

<20 years old,
20-39 years old,
40-59 years old,
60+ years old

In order to do this, we'll create a list of conditions and corresponding values to fill:
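One way to write this, as a sketch with hypothetical data; the column name Age Group and the "Unknown" default are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Jane", "Mitch", "Alex", "Evan", "Melissa"],
    "Age": [23, 45, 18, 67, 35],
})

# One boolean condition per age group
conditions = [
    df["Age"] < 20,
    (df["Age"] >= 20) & (df["Age"] < 40),
    (df["Age"] >= 40) & (df["Age"] < 60),
    df["Age"] >= 60,
]

# One value per condition, in the same order
values = ["<20 years old", "20-39 years old", "40-59 years old", "60+ years old"]

# A string default keeps the result dtype consistent with the string values
df["Age Group"] = np.select(conditions, values, default="Unknown")
print(df)
```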

Running this returns the following dataframe:

Let's break down what happens here:

We first define a list of conditions in which the criteria are specified. Order matters: np.select() evaluates the conditions in list order and, for each row, uses the value paired with the first condition that is true.
We then define a list of values to use, with one value per condition in the same order. These are the values you'd like applied in your new column.

Something to consider here is that this can be a bit counterintuitive to write. You can similarly define a function to apply different values. We'll cover this in the section on the Pandas .apply() method below.

One of the key benefits is that using numpy is very fast, especially when compared to using the .apply() method.

Using Pandas Map to Set Values in Another Column

The Pandas .map() method is very helpful when you're applying labels to another column. In order to use this method, you define a dictionary to apply to the column.

For our sample dataframe, let's imagine that we have offices in America, Canada, and France. We want to map the cities to their corresponding countries and apply an "Other" value for any other city.

When we print this out, we get the following dataframe returned:

What we can see here is that there is a NaN value associated with any City that doesn't have a corresponding country. If we want to apply "Other" to any missing values, we can chain the .fillna() method:
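A sketch of the mapping with and without .fillna(); the cities, the dictionary, and the Country column name are hypothetical:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Jane", "Mitch", "Alex", "Evan", "Melissa"],
    "City": ["New York", "Toronto", "Paris", "Tokyo", "Toronto"],
})

# Dictionary from city to country; "Tokyo" is deliberately left out
city_to_country = {"New York": "USA", "Toronto": "Canada", "Paris": "France"}

# Cities missing from the dictionary become NaN
mapped = df["City"].map(city_to_country)
print(mapped)

# Chain fillna() to replace the missing values with "Other"
df["Country"] = df["City"].map(city_to_country).fillna("Other")
print(df)
```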

Using Pandas Apply to Apply a function to a column

Finally, you can apply built-in or custom functions to a dataframe using the Pandas .apply() method.

Let's take a look at both applying built-in functions such as len() and even applying custom functions.

Applying Python Built-in Functions to a Column

We can easily apply a built-in function using the .apply() method. Let's see how we can use the len() function to count the length of each string in a given column.
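For example, with a hypothetical Name column (the new column name is also an assumption):

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({"Name": ["Jane", "Mitch", "Alex", "Evan", "Melissa"]})

# Pass the function object len itself; pandas calls it on each value
df["Name Length"] = df["Name"].apply(len)
print(df)
```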

Take note of a few things here:

We apply the .apply() method to a particular column.
We omit the parentheses "()" after len, because we pass the function itself rather than calling it.

Using Third-Party Packages in Pandas Apply

Similarly, you can use functions from third-party packages. Let's use NumPy's np.sqrt() function to find the square root of a person's age.
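A minimal sketch, assuming a hypothetical Age column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({"Name": ["Jane", "Mitch", "Alex"], "Age": [25, 49, 16]})

# As with len, pass np.sqrt without parentheses
df["Age Square Root"] = df["Age"].apply(np.sqrt)
print(df)
```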

Using Custom Functions with Pandas Apply

Something that makes the .apply() method extremely powerful is the ability to define and apply your own functions.

Let's revisit how we could use an if-else statement to create age categories as in our earlier example:
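The original function is not preserved here; the sketch below uses a hypothetical helper named age_group with the same age bands as the np.select() example:

```python
import pandas as pd

# Hypothetical sample data (assumed for illustration only)
df = pd.DataFrame({
    "Name": ["Jane", "Mitch", "Alex", "Evan", "Melissa"],
    "Age": [23, 45, 18, 67, 35],
})

def age_group(age):
    """Return an age band label for a single age (hypothetical helper)."""
    if age < 20:
        return "<20 years old"
    elif age < 40:
        return "20-39 years old"
    elif age < 60:
        return "40-59 years old"
    else:
        return "60+ years old"

# Apply the custom function to each value in the Age column
df["Age Group"] = df["Age"].apply(age_group)
print(df)
```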

In this post, you learned a number of ways in which you can apply values to a dataframe column to create a Pandas conditional column, including using .loc, np.select(), Pandas .map() and Pandas .apply(). Each of these methods has a different use case that we explored throughout this post.

Learn more about Pandas methods covered here by checking out their official documentation:

Pandas Apply
Numpy Select

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.