Series vs. DataFrame, iloc vs. loc

been_29 · July 29, 2024

python

The Korea Economic Daily with Toss bank MLOps course

💡 The difference between Series and DataFrame

Series

Definition : A one-dimensional, array-like object that holds a sequence of values together with their associated labels, called its index.
Features
- 1D Data : Can hold data of any type (integer, float, string, etc.).
- Index : Each element in a ‘Series’ has an index label, which is either the default integer index (0, 1, 2, ...) or a custom index you supply.
- Homogeneous Data : All elements in a ‘Series’ share a single dtype. Mixed values are still allowed, but they are stored under the generic ‘object’ dtype.

Example code

```python
import pandas as pd

data = [1, 3, 5, 7, 9]
series = pd.Series(data)
print(series)
```

```python
#Output
0    1
1    3
2    5
3    7
4    9
dtype: int64
```
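The index and dtype points above can be sketched with a small example. The fruit labels and prices here are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Custom string labels instead of the default 0..n-1 integer index
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])
print(prices["banana"])  # look up a value by its label
print(prices.dtype)      # all values share one dtype (float64 here)

# Mixed Python types are stored under the generic object dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```

When a Series reports the object dtype, it usually means the values are not uniform, which tends to make numeric operations slower or impossible.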

DataFrame

Definition: A two-dimensional, tabular data structure with labeled axes (rows and columns). It can be thought of as a collection of ‘Series’ objects that share the same index.
Features
- 2D Data : It can hold data of different types (numeric, string, boolean, etc.) in its columns.
- Index and Columns : It has both row indices and column labels, making it very flexible for data manipulation.
- Heterogeneous Data : Each column in a ‘DataFrame’ can have its own data type.

Example code

```python
import pandas as pd

data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)
print(df)
```
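To make the "collection of Series" idea and the per-column types more concrete, here is a minimal sketch. The `people` table and its column names are hypothetical and used only for illustration.

```python
import pandas as pd

# Each column holds a different kind of data
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cho"],
    "age": [31, 25, 42],
    "height_m": [1.68, 1.80, 1.75],
    "member": [True, False, True],
})

print(people.dtypes)  # one dtype per column

# Selecting a single column returns a Series that shares the row index
age = people["age"]
print(type(age).__name__)
print(age.index.equals(people.index))
```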

Differences between Series and DataFrame

| | Dimensionality | Structure | Data Type | Usage |
|---|---|---|---|---|
| Series | One-dimensional | A single sequence of values, similar to a list or an array | Typically homogeneous (all elements are of the same type) | Useful for storing and manipulating a single column or a single variable |
| DataFrame | Two-dimensional | A table with multiple columns, each of which can be considered a ‘Series’ | Heterogeneous (different columns can have different types) | Ideal for datasets with multiple columns and for operations involving multiple variables |
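The dimensionality difference is easy to check directly. The sketch below reuses the A/B/C data from the DataFrame example. The variable names `col` and `sub` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

col = df["A"]    # a single column name returns a Series
sub = df[["A"]]  # a list of column names returns a DataFrame

print(col.ndim, col.shape)  # one dimension, three elements
print(sub.ndim, sub.shape)  # two dimensions, three rows by one column
```

The double-bracket form is handy when later code expects a table rather than a single column.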

💡 The difference between iloc and loc

iloc

Definition: ‘iloc’ stands for ‘integer location’ and is used for indexing by position. It allows you to select data by its integer position, regardless of the index labels.
Characteristics
- Integer-Based Indexing : Uses integer positions to select rows and columns. Useful when you want to access data by its position in the DataFrame.
- Python-Like Slicing : Follows Python’s slicing rules, where the start index is inclusive and the end index is exclusive.
- Positional Access : Ideal for accessing data when you know the exact position (row/column number) of the data.
- Out-of-Bounds Handling : Raises an ‘IndexError’ if you request a single position (or a list of positions) that doesn’t exist in the DataFrame. Slices that run past the end are simply clipped, as with Python lists.
- Supports Integer Arrays : Can use lists or arrays of integers to select specific rows and columns.
Usage : Access rows and columns using integer positions.

Example code

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Select the first row
print(df.iloc[0])

# Select the first row and first column
print(df.iloc[0, 0])

# Select the first two rows
print(df.iloc[:2])

# Select the first two rows and first two columns
print(df.iloc[:2, :2])
```
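The integer-array and out-of-bounds characteristics listed for ‘iloc’ can be sketched as follows. This is a small illustration on the same A/B/C data, not an exhaustive description of ‘iloc’.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Lists of integer positions: rows 0 and 2, columns 0 and 2
picked = df.iloc[[0, 2], [0, 2]]
print(picked)

# Negative positions count from the end, as in plain Python
last_row = df.iloc[-1]
print(last_row)

# A slice that runs past the end is clipped instead of failing
clipped = df.iloc[1:10]
print(len(clipped))

# A single position that does not exist raises IndexError
try:
    df.iloc[10]
except IndexError as exc:
    print("IndexError:", exc)
```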

loc

Definition: ‘loc’ stands for “label location” and is used for indexing by labels or boolean arrays. It allows you to select data by the labels of rows and columns.
Characteristics
- Label-Based Indexing : ‘loc’ uses labels (index labels and column names) to select rows and columns. Useful when your DataFrame has meaningful index labels.
- Inclusive Slicing : Includes both the start and end labels in the slice.
- Label Access : Ideal for accessing data when you know the labels of the data.
- Flexible Indexing : Can handle more complex data retrieval scenarios, such as using boolean arrays, lists of labels, or slices with labels.
- Error Handling : Raises a ‘KeyError’ if the specified label does not exist in the DataFrame.
- Supports Boolean Indexing : Can use boolean arrays to filter rows or columns based on conditions.
Usage: Access rows and columns using labels.

Example code

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data, index=['one', 'two', 'three'])

# Select the row with label 'one'
print(df.loc['one'])

# Select the row with label 'one' and column 'A'
print(df.loc['one', 'A'])

# Select rows with labels 'one' and 'two'
print(df.loc[['one', 'two']])

# Select rows 'one' and 'two' and columns 'A' and 'B'
print(df.loc[['one', 'two'], ['A', 'B']])
```
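The inclusive slicing, boolean indexing, and ‘KeyError’ behaviour described for ‘loc’ can be sketched on the same labeled DataFrame. The label 'four' is deliberately missing, to trigger the error.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["one", "two", "three"],
)

# Label slices include BOTH endpoints: rows 'one'..'two', columns 'A'..'B'
rows = df.loc["one":"two", "A":"B"]
print(rows)

# Boolean mask on rows, list of labels on columns
big = df.loc[df["A"] > 1, ["A", "C"]]
print(big)

# A label that is not in the index raises KeyError
try:
    df.loc["four"]
except KeyError as exc:
    print("KeyError:", exc)
```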

Differences between ‘iloc’ and ‘loc’

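| Feature | ‘iloc’ | ‘loc’ |
|---|---|---|
| Indexing Method | Integer-based (positions) | Label-based |
| Slicing Behavior | Start inclusive, end exclusive | Both start and end inclusive |
| Error Handling | ‘IndexError’ for out-of-bounds positions | ‘KeyError’ for missing labels |
| Access Method | Positional | Label |
| Usage | When position is known | When label is known |
| Supports Boolean Arrays | Yes, for plain boolean arrays or lists (a boolean Series is rejected) | Yes, including a boolean Series |
| Supports Integer Arrays | Yes, interpreted as positions | Only when the integers are labels in the index |
| Flexibility | Narrower: positions only | Broader: labels, label slices, boolean masks |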
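The rows of this comparison matter most when the index labels are themselves integers that do not match the positions. The sketch below assumes a hypothetical index of 10, 20, 30 to make the difference visible.

```python
import pandas as pd

# Integer labels that are NOT the same as the positions 0, 1, 2
df = pd.DataFrame({"A": [1, 2, 3]}, index=[10, 20, 30])

by_position = df.iloc[0, 0]  # first row by position
by_label = df.loc[10, "A"]   # row whose label is 10
print(by_position, by_label)

# Same-looking slices, different rules
pos_slice = df.iloc[0:2]     # positions 0 and 1 (end exclusive)
label_slice = df.loc[10:30]  # labels 10 through 30 (end inclusive)
print(len(pos_slice), len(label_slice))

# With loc, a list of integers is treated as a list of labels
by_labels = df.loc[[10, 30]]
print(by_labels)

# iloc takes a plain boolean array, so convert the Series mask first
mask = (df["A"] > 1).to_numpy()
filtered = df.iloc[mask]
print(filtered)
```

Here `df.loc[0]` would raise a ‘KeyError’, because 0 is a position and not one of the labels.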
