The convolution operation is a cornerstone of many fields, including signal processing, image processing, and machine learning. It combines two functions, showing how the shape of one is modified by the other. The Fast Fourier Transform (FFT) provides an efficient way to compute convolutions: direct evaluation of a length-$N$ convolution costs $O(N^2)$ operations, while the FFT-based approach costs $O(N \log N)$, which matters for large datasets and real-time processing, where direct convolution becomes computationally expensive.
Mathematically, convolution is defined for two functions $f$ and $g$ as:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$
In discrete terms, for sequences $x[n]$ and $h[n]$ of length $N$, the convolution is given by:

$$(x * h)[n] = \sum_{m=0}^{N-1} x[m]\, h[n - m]$$

where the index $n - m$ is evaluated modulo $N$ in the case of circular convolution.
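For concreteness, here is a minimal NumPy sketch of the circular convolution sum above; the helper name `circular_convolve` and the sample sequences are illustrative, not part of any library:

```python
import numpy as np

def circular_convolve(x, h):
    """Direct O(N^2) circular convolution of two length-N sequences."""
    N = len(x)
    y = np.zeros(N)
    for n in range(N):
        for m in range(N):
            y[n] += x[m] * h[(n - m) % N]  # index evaluated modulo N
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([0.5, 0.25, 0.0, 0.0])
print(circular_convolve(x, h))
```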
The Fourier Transform (FT) of a continuous signal $x(t)$ is defined as:

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$$
For discrete signals, the Discrete Fourier Transform (DFT) is used:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$
The DFT transforms a signal from the time domain into the frequency domain.
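The DFT sum translates directly into NumPy; this deliberately naive $O(N^2)$ version (the function name `dft` is only for illustration) agrees with the built-in FFT:

```python
import numpy as np

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n / N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                 # column vector of frequency indices
    W = np.exp(-2j * np.pi * k * n / N)  # N x N DFT matrix
    return W @ x

x = np.random.rand(8)
print(np.allclose(dft(x), np.fft.fft(x)))  # True: agrees with the FFT
```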
The Convolution Theorem states that the Fourier Transform of the convolution of two signals is the pointwise product of their Fourier Transforms:

$$\mathcal{F}\{x * h\} = \mathcal{F}\{x\} \cdot \mathcal{F}\{h\}$$

This theorem forms the basis for using the FFT to compute convolutions efficiently.
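The theorem can be checked numerically in a few lines; for two equal-length sequences, the inverse FFT of the pointwise product matches the direct circular convolution:

```python
import numpy as np

N = 16
x = np.random.rand(N)
h = np.random.rand(N)

# Convolution Theorem: circular convolution = IFFT of the pointwise product
y_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

# Direct circular convolution for comparison
y_direct = np.array(
    [sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]
)

print(np.allclose(y_fft, y_direct))  # True
```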
Given two sequences $x[n]$ and $h[n]$ of lengths $L_x$ and $L_h$, the FFT-based convolution proceeds as follows. After zero-padding $x$ and $h$ to a common length $N \geq L_x + L_h - 1$ (so that the circular convolution agrees with the linear one), their FFTs are:

$$X[k] = \text{FFT}\{x[n]\}, \qquad H[k] = \text{FFT}\{h[n]\}$$

The convolution result in the frequency domain is the pointwise product:

$$Y[k] = X[k] \cdot H[k]$$

Applying the inverse FFT recovers the convolution in the time domain:

$$y[n] = \text{IFFT}\{Y[k]\}$$
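Putting the steps together, a minimal sketch of FFT-based linear convolution (the helper `fft_convolve` is illustrative, not a library function); padding to at least $L_x + L_h - 1$ samples prevents circular wrap-around, so the result matches `np.convolve`:

```python
import numpy as np

def fft_convolve(x, h):
    """Linear convolution via FFT, zero-padded to avoid circular wrap-around."""
    n_out = len(x) + len(h) - 1        # length of the full linear convolution
    N = 1 << (n_out - 1).bit_length()  # round up to a power of two for speed
    X = np.fft.fft(x, N)               # fft(x, N) zero-pads x to length N
    H = np.fft.fft(h, N)
    y = np.fft.ifft(X * H).real        # inputs are real; discard imaginary noise
    return y[:n_out]                   # trim the extra padding

x = np.random.rand(100)
h = np.random.rand(15)
print(np.allclose(fft_convolve(x, h), np.convolve(x, h)))  # True
```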
Tensor convolution is an extension of traditional convolution techniques, applied particularly in fields like computer vision and deep learning, where multidimensional data structures known as tensors are manipulated. The Fast Fourier Transform (FFT) accelerates tensor convolution significantly, enhancing the efficiency of operations such as those in convolutional neural networks (CNNs). Below, we delve into the theory, procedure, and specific considerations for implementing tensor convolution using FFT.
FFT can be extended to multidimensional data to speed up the convolution operation in multiple dimensions simultaneously. The key principle remains utilizing the Convolution Theorem, which states that convolution in the spatial domain corresponds to element-wise multiplication in the frequency domain. For tensors, this principle applies in each dimension independently.
Consider a 3D input tensor $X$ and a kernel $K$, both extended to the same size with zero-padding to avoid circular convolution artifacts. Let their sizes after padding be $(D, H, W)$.
The FFT-based convolution can be computed as follows:
1. FFT Computation: Compute the FFT of both tensors $X$ and $K$ across all three dimensions:

$$\hat{X} = \text{FFT}_{3D}\{X\}, \qquad \hat{K} = \text{FFT}_{3D}\{K\}$$

2. Element-wise Multiplication: Multiply the Fourier transforms $\hat{X}$ and $\hat{K}$ element-wise:

$$\hat{Y} = \hat{X} \odot \hat{K}$$

3. Inverse FFT: Apply the inverse FFT to obtain the convolved tensor in the spatial domain:

$$Y = \text{IFFT}_{3D}\{\hat{Y}\}$$
Here, $Y$ represents the output tensor, which is the result of convolving $X$ with $K$; a short NumPy sketch of these steps follows.
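A minimal NumPy sketch of the three steps (the tensor shapes here are arbitrary placeholders), with `scipy.signal.fftconvolve` used only as an independent check:

```python
import numpy as np
from scipy.signal import fftconvolve

x = np.random.rand(8, 8, 8)  # 3D input tensor
k = np.random.rand(3, 3, 3)  # 3D kernel

# Pad both tensors to the full linear-convolution size (D, H, W)
shape = tuple(sx + sk - 1 for sx, sk in zip(x.shape, k.shape))

X = np.fft.fftn(x, shape)    # Step 1: 3D FFT of both tensors
K = np.fft.fftn(k, shape)
Y = X * K                    # Step 2: element-wise multiplication
y = np.fft.ifftn(Y).real     # Step 3: inverse 3D FFT back to the spatial domain

print(np.allclose(y, fftconvolve(x, k, mode="full")))  # True
```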
The Convolution layer from luma.neural.layer, used in the example below, takes the following parameters:

- in_channels : int
- out_channels : int
- filter_size : int
- stride : int
- padding : Literal['same', 'valid'], default = 'same'
- activation : FuncType, default = 'relu'
- initializer : InitType, default = 'auto'
- optimizer : Optimizer, default = None
- lambda_ : float, default = 0.0

```python
from luma.preprocessing.scaler import StandardScaler
from luma.preprocessing.encoder import OneHotEncoder
from luma.model_selection.split import TrainTestSplit, BatchGenerator
from luma.metric.classification import Accuracy
from luma.neural.layer import Convolution, Flatten, Dense, Sequential
from luma.neural.loss import SoftmaxLoss
from luma.neural.optimizer import AdamOptimizer
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
import numpy as np
X, y = load_digits(return_X_y=True)
sc = StandardScaler()
X_sc = sc.fit_transform(X)
X_train, X_test, y_train, y_test = TrainTestSplit(
    X_sc, y, test_size=0.3, shuffle=True, stratify=True
).get
en = OneHotEncoder()
y_train = en.fit_transform(y_train.reshape(-1, 1))
y_test = en.fit_transform(y_test.reshape(-1, 1))
X_train = X_train.reshape(-1, 1, 8, 8)
X_test = X_test.reshape(-1, 1, 8, 8)
epochs = 30
batch_size = 100
# -----------------------------------------------------------#
model = Sequential(
    Convolution(1, 6, 5, padding="same", activation="relu"),
    Convolution(6, 12, 5, padding="same", activation="relu"),
    Flatten(),
    Dense(768, 10, activation=None),
)
model.set_optimizer(AdamOptimizer(), learning_rate=0.001)
model.set_loss(SoftmaxLoss())
print(repr(model))
# -----------------------------------------------------------#
losses = []
for i in range(1, epochs + 1):
    epoch_loss = []
    for j, (X_batch, y_batch) in enumerate(
        BatchGenerator(X_train, y_train, batch_size=batch_size), start=1
    ):
        loss = model(X_batch, y_batch, is_train=True)
        losses.append(loss)
        epoch_loss.append(loss)
    print(f"Epoch {i}/{epochs} - Avg. Loss: {np.average(epoch_loss)}")
test_out = model.forward(X_test)
score = Accuracy.score(y_test.argmax(axis=1), test_out.argmax(axis=1))
r_id = np.random.choice(range(len(X_test)), 1)[0]
X_sample = X_test[r_id][np.newaxis]
first_out = model[0][1].forward(X_sample)
second_out = model[1][1].forward(first_out)
outs = np.vstack((first_out.squeeze(), second_out.squeeze()))
plt.figure(figsize=(10, 5))
plt.suptitle(
    f"Convolution Result for Digit '{y_test[r_id].argmax()}' "
    + f"[Acc: {score * 100:.2f} %]"
)
rows, cols = 3, 6
for i in range(1, 19):
    ax = plt.subplot(rows, cols, i)
    ax.imshow(
        outs[i - 1],
        cmap="Blues",
        interpolation="bilinear",
    )
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title(f"Filter {i}")

plt.tight_layout()
plt.show()
```
The FFT-based convolution is widely used in:

- Signal processing, for fast filtering of long signals
- Image processing, for applying large convolution kernels efficiently
- Machine learning, where it can accelerate the convolutional layers of CNNs
Convolution via FFT is a powerful technique that leverages the efficiency of the Fourier Transform to compute convolutions at scale. Its application across these domains illustrates its fundamental importance in modern computational methods.