2  Transforming Linear Algebra to Computational Language

2.1 Introduction

In the first module, we established a solid foundation in matrix algebra by exploring pseudocode and implementing fundamental matrix operations using Python. We practiced key concepts such as matrix addition, subtraction, multiplication, and determinants through practical examples in image processing, leveraging the SymPy library for symbolic computation.

As we begin the second module, “Transforming Linear Algebra to Computational Language,” our focus will shift towards applying these concepts with greater depth and actionable insight. This module is designed to bridge the theoretical knowledge from matrix algebra with practical computational applications. You will learn to interpret and utilize matrix operations, solve systems of equations, and analyze the rank of matrices within a variety of real-world contexts.

A new concept we will introduce is the Rank-Nullity Theorem, which provides a fundamental relationship between the rank of a matrix and the dimensions of its null space. This theorem is crucial for understanding the solution spaces of linear systems and the properties of linear transformations. By applying this theorem, you will be able to gain deeper insights into the structure of solutions and the behavior of matrix transformations.
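As a quick computational illustration of this theorem, the following minimal SymPy sketch (using an arbitrary illustrative matrix, not data from the course) checks that the rank and the nullity of a matrix add up to its number of columns:

import sympy as sp

# An arbitrary 3x4 matrix chosen only for illustration
A = sp.Matrix([[1, 2, 3, 4],
               [2, 4, 6, 8],
               [1, 0, 1, 0]])

rank = A.rank()                # dimension of the column space
nullity = len(A.nullspace())   # dimension of the null space
print(rank, nullity, A.cols)   # 2 2 4
# Rank-Nullity Theorem: rank + nullity = number of columns
print(rank + nullity == A.cols)  # True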

This transition will not only reinforce your understanding of linear algebra but also enhance your ability to apply these concepts effectively in computational settings. Through engaging examples and practical exercises, you will gain valuable experience in transforming abstract mathematical principles into tangible solutions, setting a strong groundwork for advanced computational techniques.

2.2 Relearning of Terms and Operations in Linear Algebra

In this section, we will revisit fundamental matrix operations such as addition, subtraction, scaling, and more through practical examples. Our goal is to transform theoretical linear algebra into modern computational applications. We will demonstrate these concepts using Python, focusing on practical and industrial applications.

2.2.1 Matrix Addition and Subtraction in Data Analysis

Matrix addition and subtraction are fundamental operations that help in combining datasets and analyzing differences.

Simple Example: Combining Quarterly Sales Data

We begin with quarterly sales data from different regions and combine them to get the total sales. The sales data is given in Table 2.1. A bar plot of the total sales is shown in Fig 2.1.

Table 2.1: Quarterly Sales Data
Region Q1 Q2 Q3 Q4
A 2500 2800 3100 2900
B 1500 1600 1700 1800

From Scratch Python Implementation:

import numpy as np
import matplotlib.pyplot as plt

# Quarterly sales data
sales_region_a = np.array([2500, 2800, 3100, 2900])
sales_region_b = np.array([1500, 1600, 1700, 1800])

# Combine sales data
total_sales = sales_region_a + sales_region_b

# Visualization
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
plt.bar(quarters, total_sales, color='skyblue')
plt.xlabel('Quarter')
plt.ylabel('Total Sales')
plt.title('Combined Quarterly Sales Data for Regions A and B')
plt.show()
Figure 2.1: Computing Total Sales using Numpy aggregation method

In the Python code above, we performed the aggregation with NumPy. The same task can be done in a more data-analysis style using pandas, which handles tabular data meaningfully. In this approach, the quarterly sales data of each region is stored as a DataFrame (like an Excel sheet). We then combine the two DataFrames into one, create a new row with index ‘Total’, and populate it with the sum of the quarterly sales of Region A and Region B. Finally, a bar plot is created from this ‘Total’ row. An advantage of this approach is that pandas exposes the plotting interface directly (it still uses Matplotlib behind the scenes). The EDA using this approach is shown in Fig 2.2.

import pandas as pd
import matplotlib.pyplot as plt

# DataFrames for quarterly sales data
df_a = pd.DataFrame({'Q1': [2500], 'Q2': [2800], 'Q3': [3100], 'Q4': [2900]}, index=['Region A'])
df_b = pd.DataFrame({'Q1': [1500], 'Q2': [1600], 'Q3': [1700], 'Q4': [1800]}, index=['Region B'])

# Combine data
df_combined = df_a.add(df_b, fill_value=0)
df_combined.loc["Total"] = df_combined.sum(axis=0)
# Visualization
df_combined.loc["Total"].plot(kind='bar', color=['green'])
plt.xlabel('Quarter')
plt.ylabel('Total Sales')
plt.title('Combined Quarterly Sales Data for Regions A and B')
plt.show()
Figure 2.2: Computation of Total Sales using Pandas method

We can extend this to more advanced examples. Irrespective of the size of the data, matrix models are the standard choice in industry for representation and aggregation tasks. Let us consider a more advanced example: analysing the difference between observed and predicted stock prices. For this example we use simulated data. The Python code for this simulation is shown in Fig 2.3.

import numpy as np
import matplotlib.pyplot as plt

# Simulated observed and predicted stock prices
observed_prices = np.random.uniform(100, 200, size=(100, 5))
predicted_prices = np.random.uniform(95, 210, size=(100, 5))

# Calculate the difference matrix
price_differences = observed_prices - predicted_prices

# Visualization
plt.imshow(price_differences, cmap='coolwarm', aspect='auto')
plt.colorbar()
plt.title('Stock Price Differences')
plt.xlabel('Stock Index')
plt.ylabel('Day Index')
plt.show()
Figure 2.3: Demonstration of Stock Price Differences simulated from a Uniform Distribution

Another important matrix operation relevant to data analytics and Machine Learning is scaling. This is a statistical tool that brings various features (attributes) onto the same scale, so as to avoid misleading effects in data analysis and its interpretation. In the Machine Learning context, this pre-processing stage is essential to keep the model relevant and usable.

Simple Example: Normalizing Employee Performance Data

Table 2.2: Employee Performance Data
Employee Metric A Metric B
X 80 700
Y 90 800
Z 100 900
A 110 1000
B 120 1100

Using simple Python code we can implement min-max scaling. The formula for min-max scaling is: \[\text{min\_max}(X)=\dfrac{X-\min(X)}{\max(X)-\min(X)}\]

For example, applying min-max scaling to the first value of Metric A gives the scaled value \[\text{min\_max}(80)=\dfrac{80-80}{120-80}=0\]

Similarly,

\[\text{min\_max}(100)=\dfrac{100-80}{120-80}=0.5\]

When we apply this formula to Metric A and Metric B, the scaled output from Table 2.2 will be as follows:

Table 2.3: Min-Max Scaled Employee Performance Data
Employee Metric A Metric B
X 0.00 0.00
Y 0.25 0.25
Z 0.50 0.50
A 0.75 0.75
B 1.00 1.00

It is interesting to look into the scaled data! In the original table (Table 2.2) it looked as if Metric B were superior. But from the scaled table (Table 2.3) it is clear that both metrics carry the same relative information. This helps us identify redundancy in the measurements, so we can drop one of the metrics before analysis.

The same can be achieved through a matrix operation. The Python implementation of this scaling process is shown in Fig 2.4.

import numpy as np
import matplotlib.pyplot as plt

# Employee performance data with varying scales
data = np.array([[80, 700], [90, 800], [100, 900], [110, 1000], [120, 1100]])

# Manual scaling
min_vals = np.min(data, axis=0)
max_vals = np.max(data, axis=0)
scaled_data = (data - min_vals) / (max_vals - min_vals)

# Visualization
plt.figure(figsize=(8, 5))
plt.subplot(1, 2, 1)
plt.imshow(data, cmap='viridis')
plt.title('Original Data')
plt.colorbar()

plt.subplot(1, 2, 2)
plt.imshow(scaled_data, cmap='viridis')
plt.title('Scaled Data')
plt.colorbar()

plt.show()
Figure 2.4: Min-max scaling of the employee performance data

From the first subplot, it is clear that there is a significant difference between the distributions of the Metric A and Metric B values. The second subplot shows that both distributions follow the same pattern and that the values range between 0 and 1. In short, the visualization is more appealing and self-explanatory in this case.

Note

The min-max scaling method will confine the feature values (attributes) into the range \([0,1]\). So in effect all the features are scaled proportionally to the data spectrum.

Similarly, we can use standard scaling (standardization to zero mean and unit variance) via the transformation \(\dfrac{x-\bar{x}}{\sigma}\). Producing the scaled table is left as a practice task for the reader. The Python code for this operation is shown in Fig 2.5.

# Standard scaling from scratch
def standard_scaling(data):
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0)
    scaled_data = (data - mean) / std
    return scaled_data

# Apply standard scaling
scaled_data_scratch = standard_scaling(data)

print("Standard Scaled Data (from scratch):\n", scaled_data_scratch)

# Visualization
plt.figure(figsize=(6, 5))
plt.subplot(1, 2, 1)
plt.imshow(data, cmap='viridis')
plt.title('Original Data')
plt.colorbar()

plt.subplot(1, 2, 2)
plt.imshow(scaled_data_scratch, cmap='viridis')
plt.title('Scaled Data')
plt.colorbar()

plt.show()
Standard Scaled Data (from scratch):
 [[-1.41421356 -1.41421356]
 [-0.70710678 -0.70710678]
 [ 0.          0.        ]
 [ 0.70710678  0.70710678]
 [ 1.41421356  1.41421356]]
Figure 2.5: Standard scaling using basic Python

To understand the effect of standard scaling, let us consider Fig 2.6. This plot shows the frequency distribution of the data as a histogram along with the density estimate. From the first sub-plot, it is clear that the pooled data has multiple modes (peaks), because the two metrics live on very different scales. When we apply standard scaling, both metrics are brought onto a common scale, their histograms overlap, and the combined distribution becomes unimodal (only one peak). This is demonstrated in the second sub-plot.

# Standard scaling from scratch
import seaborn as sns
# Create plots
plt.figure(figsize=(6, 5))

# Plot for original data
plt.subplot(1, 2, 1)
sns.histplot(data, kde=True, bins=10, palette="viridis")
plt.title('Original Data Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Plot for standard scaled data
plt.subplot(1, 2, 2)
sns.histplot(scaled_data_scratch, kde=True, bins=10, palette="viridis")
plt.title('Standard Scaled Data Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')

plt.tight_layout()
plt.show()
Figure 2.6: Impact of standard scaling on the distribution

A scatter plot comparing the impact of scaling on the given distribution is shown in Fig 2.7.

# Plot original and scaled data
plt.figure(figsize=(6, 5))

# Original Data
plt.subplot(1, 3, 1)
plt.scatter(data[:, 0], data[:, 1], color='blue')
plt.title('Original Data')
plt.xlabel('Metric A')
plt.ylabel('Metric B')

# Standard Scaled Data
plt.subplot(1, 3, 2)
plt.scatter(scaled_data_scratch[:, 0], scaled_data_scratch[:, 1], color='green')
plt.title('Standard Scaled Data')
plt.xlabel('Metric A (Standard Scaled)')
plt.ylabel('Metric B (Standard Scaled)')

# Min-Max Scaled Data
plt.subplot(1, 3, 3)
plt.scatter(scaled_data[:, 0], scaled_data[:, 1], color='red')
plt.title('Min-Max Scaled Data')
plt.xlabel('Metric A (Min-Max Scaled)')
plt.ylabel('Metric B (Min-Max Scaled)')

plt.tight_layout()
plt.show()
Figure 2.7: Comparison of impact of scaling on the distribution

From Fig 2.7, it is clear that scaling does not affect the pattern of the data; it just rescales the distribution proportionally.

We can use the scikit-learn library to do the same thing in a very simple and handy way. The Python code for this task is shown below.

from sklearn.preprocessing import MinMaxScaler

# Min-max scaling using sklearn
scaler = MinMaxScaler()
min_max_scaled_data_sklearn = scaler.fit_transform(data)

print("Min-Max Scaled Data (using sklearn):\n", min_max_scaled_data_sklearn)
Min-Max Scaled Data (using sklearn):
 [[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [0.75 0.75]
 [1.   1.  ]]
from sklearn.preprocessing import StandardScaler

# Standard scaling using sklearn
scaler = StandardScaler()
scaled_data_sklearn = scaler.fit_transform(data)

print("Standard Scaled Data (using sklearn):\n", scaled_data_sklearn)
Standard Scaled Data (using sklearn):
 [[-1.41421356 -1.41421356]
 [-0.70710678 -0.70710678]
 [ 0.          0.        ]
 [ 0.70710678  0.70710678]
 [ 1.41421356  1.41421356]]

A scatter plot showing the impact of scaling is shown in Fig 2.8. This plot compares min-max and standard scaling.

# Plot original and scaled data
plt.figure(figsize=(6, 5))

# Original Data
plt.subplot(1, 3, 1)
plt.scatter(data[:, 0], data[:, 1], color='blue')
plt.title('Original Data')
plt.xlabel('Metric A')
plt.ylabel('Metric B')

# Standard Scaled Data
plt.subplot(1, 3, 2)
plt.scatter(scaled_data_sklearn[:, 0], scaled_data_sklearn[:, 1], color='green')
plt.title('Standard Scaled Data')
plt.xlabel('Metric A (Standard Scaled)')
plt.ylabel('Metric B (Standard Scaled)')

# Min-Max Scaled Data
plt.subplot(1, 3, 3)
plt.scatter(min_max_scaled_data_sklearn[:, 0], min_max_scaled_data_sklearn[:, 1], color='red')
plt.title('Min-Max Scaled Data')
plt.xlabel('Metric A (Min-Max Scaled)')
plt.ylabel('Metric B (Min-Max Scaled)')

plt.tight_layout()
plt.show()
Figure 2.8: Comparison of min-max and standard scaling with the original data

2.2.2 More on Matrix Product and its Applications

In the first module of our course, we introduced matrix products as scalar projections, focusing on how matrices interact through basic operations. In this section, we will expand on this by exploring different types of matrix products that have practical importance in various fields. One such product is the Hadamard product, which is particularly useful in applications ranging from image processing to neural networks and statistical analysis. We will cover the definition, properties, and examples of the Hadamard product, and then delve into practical applications with simulated data.

2.2.2.1 Hadamard Product

The Hadamard product (or element-wise product) of two matrices is a binary operation that combines two matrices of the same dimensions to produce another matrix of the same dimensions, where each element is the product of corresponding elements in the original matrices.

Definition (Hadamard Product):

For two matrices \(A\) and \(B\) of the same dimension \(m \times n\), the Hadamard product \(A \circ B\) is defined as:

\[(A \circ B)_{ij} = A_{ij} \cdot B_{ij}\]

where \(\cdot\) denotes element-wise multiplication.

Properties of Hadamard Product
  1. Commutativity: \[A \circ B = B \circ A\]

  2. Associativity: \[(A \circ B) \circ C = A \circ (B \circ C)\]

  3. Distributivity: \[A \circ (B + C) = (A \circ B) + (A \circ C)\]
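As a quick numerical sanity check, the following minimal NumPy sketch (the matrices are arbitrary illustrations) verifies these three properties; in NumPy the Hadamard product is simply the * operator applied to arrays:

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[2, 0], [1, 3]])

print(np.array_equal(A * B, B * A))                 # commutativity
print(np.array_equal((A * B) * C, A * (B * C)))     # associativity
print(np.array_equal(A * (B + C), A * B + A * C))   # distributivity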

Some simple examples to demonstrate the Hadamard product are given below.

Example 1: Basic Hadamard Product

Given matrices:

\[A = \begin{pmatrix}1 & 2 \\3 & 4\end{pmatrix}, \quad B = \begin{pmatrix}5 & 6 \\7 & 8\end{pmatrix}\]

The Hadamard product \(A \circ B\) is:

\[A \circ B = \begin{pmatrix}1 \cdot 5 & 2 \cdot 6 \\3 \cdot 7 & 4 \cdot 8\end{pmatrix} = \begin{pmatrix}5 & 12 \\21 & 32\end{pmatrix}\]

Example 2: Hadamard Product with Larger Matrices

Given matrices:

\[A = \begin{pmatrix}1 & 2 & 3 \\4 & 5 & 6 \\7 & 8 & 9\end{pmatrix}, \quad B = \begin{pmatrix}9 & 8 & 7 \\6 & 5 & 4 \\3 & 2 & 1\end{pmatrix}\]

The Hadamard product \(A \circ B\) is:

\[A \circ B = \begin{pmatrix}1 \cdot 9 & 2 \cdot 8 & 3 \cdot 7 \\4 \cdot 6 & 5 \cdot 5 & 6 \cdot 4 \\7 \cdot 3 & 8 \cdot 2 & 9 \cdot 1\end{pmatrix} = \begin{pmatrix}9 & 16 & 21 \\24 & 25 & 24 \\21 & 16 & 9\end{pmatrix}\]

In the following code chunks, the computation of the Hadamard product is implemented in Python. Both a from-scratch version and a version using an external module are included.

1. Compute Hadamard Product from Scratch (without Libraries)

Here’s how you can compute the Hadamard product manually:

# Define matrices A and B
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8, 9], [10, 11, 12]]

# Function to compute Hadamard product
def hadamard_product(A, B):
    # Get the number of rows and columns
    num_rows = len(A)
    num_cols = len(A[0])
    
    # Initialize the result matrix
    result = [[0]*num_cols for _ in range(num_rows)]
    
    # Compute the Hadamard product
    for i in range(num_rows):
        for j in range(num_cols):
            result[i][j] = A[i][j] * B[i][j]
    
    return result

# Compute Hadamard product
hadamard_product_result = hadamard_product(A, B)

# Display result
print("Hadamard Product (From Scratch):")
for row in hadamard_product_result:
    print(row)
Hadamard Product (From Scratch):
[7, 16, 27]
[40, 55, 72]

2. Compute Hadamard Product Using SymPy

Here’s how to compute the Hadamard product using SymPy:

import sympy as sp

# Define matrices A and B
A = sp.Matrix([[1, 2, 3], [4, 5, 6]])
B = sp.Matrix([[7, 8, 9], [10, 11, 12]])

# Compute Hadamard product using SymPy
Hadamard_product_sympy = A.multiply_elementwise(B)

# Display result
print("Hadamard Product (Using SymPy):")
print(Hadamard_product_sympy)
Hadamard Product (Using SymPy):
Matrix([[7, 16, 27], [40, 55, 72]])

Practical Applications

Application 1: Image Masking

The Hadamard product can be used for image masking. Here’s how you can apply a mask to an image and visualize it as shown in Fig 2.9.

import matplotlib.pyplot as plt
import numpy as np

# Simulated large image (2D array) using NumPy
image = np.random.rand(100, 100)

# Simulated mask (binary matrix) using NumPy
mask = np.random.randint(0, 2, size=(100, 100))

# Compute Hadamard product
masked_image = image * mask

# Plot original image and masked image
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(image, cmap='gray')
ax[0].set_title('Original Image')
ax[1].imshow(masked_image, cmap='gray')
ax[1].set_title('Masked Image')
plt.show()
Figure 2.9: Demonstration of Masking in DIP using Hadamard Product

Application 2: Element-wise Scaling in Neural Networks

The Hadamard product can be used for dropout in neural networks. A simple simulated example is given below.

# Simulated large activations (2D array) using NumPy
activations = np.random.rand(100, 100)

# Simulated dropout mask (binary matrix) using NumPy
dropout_mask = np.random.randint(0, 2, size=(100, 100))

# Apply dropout
dropped_activations = activations * dropout_mask

# Display results
print("Original Activations:")
print(activations)
print("\nDropout Mask:")
print(dropout_mask)
print("\nDropped Activations:")
print(dropped_activations)
Original Activations:
[[0.06894617 0.3776627  0.33440532 ... 0.11017061 0.3622906  0.7539885 ]
 [0.50669932 0.62064754 0.36106756 ... 0.36415903 0.59856668 0.67680069]
 [0.22072699 0.90728473 0.43881544 ... 0.88782933 0.00681409 0.47232089]
 ...
 [0.20696451 0.44259024 0.93360471 ... 0.33349169 0.24542978 0.19109216]
 [0.90511144 0.70569249 0.36479526 ... 0.71831819 0.77584359 0.583544  ]
 [0.82030691 0.2622308  0.57648967 ... 0.99234794 0.74023406 0.50952928]]

Dropout Mask:
[[1 0 0 ... 0 0 1]
 [1 1 1 ... 1 0 0]
 [0 1 1 ... 0 0 0]
 ...
 [1 0 0 ... 1 1 1]
 [1 0 0 ... 0 1 1]
 [0 0 0 ... 0 1 1]]

Dropped Activations:
[[0.06894617 0.         0.         ... 0.         0.         0.7539885 ]
 [0.50669932 0.62064754 0.36106756 ... 0.36415903 0.         0.        ]
 [0.         0.90728473 0.43881544 ... 0.         0.         0.        ]
 ...
 [0.20696451 0.         0.         ... 0.33349169 0.24542978 0.19109216]
 [0.90511144 0.         0.         ... 0.         0.77584359 0.583544  ]
 [0.         0.         0.         ... 0.         0.74023406 0.50952928]]

Application 3: Statistical Data Analysis

In statistics, the Hadamard product can be applied to scale covariance matrices. Here’s how we can compute the covariance matrix using matrix operations and apply scaling. The following Python code demonstrates this.

import sympy as sp
import numpy as np

# Simulated large dataset (2D array) using NumPy
data = np.random.rand(100, 10)

# Compute the mean of each column
mean = np.mean(data, axis=0)

# Center the data
centered_data = data - mean

# Compute the covariance matrix using matrix product operation
cov_matrix = (centered_data.T @ centered_data) / (centered_data.shape[0] - 1)
cov_matrix_sympy = sp.Matrix(cov_matrix)

# Simulated scaling factors (2D array) using SymPy Matrix
scaling_factors = sp.Matrix(np.random.rand(10, 10))

# Compute Hadamard (element-wise) product
scaled_cov_matrix = cov_matrix_sympy.multiply_elementwise(scaling_factors)

# Display results
print("Covariance Matrix:")
print(cov_matrix_sympy)
print("\nScaling Factors:")
print(scaling_factors)
print("\nScaled Covariance Matrix:")
print(scaled_cov_matrix)
Covariance Matrix:
Matrix([[0.0917767756549757, -0.00231846920468021, 0.00631956965832033, 0.0103141856142347, 0.00178980482540451, 0.00175382457868899, -0.0169394103738993, 0.00759030424650395, 0.00335462576027497, -0.00673329178528108], [-0.00231846920468021, 0.0780427684025539, -0.0124198364681770, -0.00115432720981541, -0.00755837047352264, 0.00490241023588365, 0.00162704389180416, 0.00193409540629282, -0.00741077245348115, -0.00749399034438090], [0.00631956965832033, -0.0124198364681770, 0.0939413031048832, 0.00436891983117282, 0.00330840248130638, -0.0126415595066759, -0.00266414167487015, 0.00102485322040471, -0.0106289812153509, 0.0104244756319195], [0.0103141856142347, -0.00115432720981541, 0.00436891983117282, 0.0824590379149929, 0.000573918867501743, 0.00957165609934629, -0.00604000123116032, 0.00921663143350825, -0.00367487312182470, 0.00122122964324310], [0.00178980482540451, -0.00755837047352264, 0.00330840248130638, 0.000573918867501743, 0.0837022188162850, 0.00820263257003039, 0.000320865993343124, 0.0177849506610562, -0.00326860347774260, 0.00299231582670636], [0.00175382457868899, 0.00490241023588365, -0.0126415595066759, 0.00957165609934629, 0.00820263257003039, 0.0628257351791684, 0.000874518541289068, 0.00287804725796515, -0.00716117157943885, 0.000394534496818281], [-0.0169394103738993, 0.00162704389180416, -0.00266414167487015, -0.00604000123116032, 0.000320865993343124, 0.000874518541289068, 0.0890248830407284, -0.00330441759213419, 0.00885446559415291, 0.00676958530960561], [0.00759030424650395, 0.00193409540629282, 0.00102485322040471, 0.00921663143350825, 0.0177849506610562, 0.00287804725796515, -0.00330441759213419, 0.0844935599954661, 0.0100738268450946, -0.00906980778727026], [0.00335462576027497, -0.00741077245348115, -0.0106289812153509, -0.00367487312182470, -0.00326860347774260, -0.00716117157943885, 0.00885446559415291, 0.0100738268450946, 0.0754918952259983, -0.0111666568898086], [-0.00673329178528108, -0.00749399034438090, 0.0104244756319195, 0.00122122964324310, 0.00299231582670636, 0.000394534496818281, 0.00676958530960561, -0.00906980778727026, -0.0111666568898086, 0.0875581874238437]])

Scaling Factors:
Matrix([[0.815426278192760, 0.134707914881823, 0.492966917057263, 0.526969458056170, 0.574709310982182, 0.352843255964215, 0.887185538567414, 0.140539544175239, 0.269651721249414, 0.913828005977225], [0.878249897442327, 0.424152682141583, 0.0209393256230820, 0.154139786173812, 0.135654984788258, 0.302018168798723, 0.288731064565096, 0.385354792131266, 0.185641987803183, 0.123777425594904], [0.371153863577965, 0.108219056259665, 0.544093107049168, 0.573962112090211, 0.506379827589065, 0.349658022688059, 0.436759122564185, 0.254120141497413, 0.826337415383872, 0.481451164676728], [0.426152351895796, 0.160936011893139, 0.726475848279090, 0.318398838239760, 0.0869723758761120, 0.669839836406523, 0.410837107230599, 0.877671621998718, 0.634168166978639, 0.274509754325122], [0.453042028776812, 0.327644666684450, 0.571249402049481, 0.624669243676812, 0.832377451182500, 0.0468726218003922, 0.339836882411105, 0.566808438504695, 0.633708492326687, 0.0291543515123359], [0.451736623236072, 0.824457809593277, 0.0468262130838721, 0.414422830791189, 0.665847708940026, 0.150434046078858, 0.205694572941020, 0.442249051575973, 0.952011532573108, 0.832626802888198], [0.653128026260542, 0.160861376206219, 0.565877809471253, 0.214072854043979, 0.0167121306178294, 0.136029674702122, 0.101299626341632, 0.634929678466512, 0.886586287684352, 0.413033646466836], [0.765043992151462, 0.878200823093136, 0.661559847620237, 0.487275955471030, 0.810634716317772, 0.691067279086228, 0.233034346203150, 0.243685028223400, 0.968754188515623, 0.277710612576935], [0.543450644383474, 0.136833643173171, 0.839712926132745, 0.666874558212578, 0.197772746922458, 0.804484447939160, 0.429887114495949, 0.866946178251795, 0.967166730531108, 0.378003417138431], [0.615430489208242, 0.774849940106600, 0.566429331630759, 0.256121121628592, 0.438750969942939, 0.486059612700822, 0.0191618473801575, 0.790279808976955, 0.430604564330450, 0.942181887452992]])

Scaled Covariance Matrix:
Matrix([[0.0735675834825648, 0.0149385187901641, 0.0516691506364356, 0.0573471497773849, 0.0627642664152243, 0.0435159764037236, 0.0900861215506575, 0.0131347392451777, 0.0315642474043691, 0.0810030204681982], [0.0542421786651330, 0.0279647599529843, -0.0194605667032628, -0.00494881038120091, -0.00332087114031559, 0.00997311580445879, 0.0103036633559031, 0.0126202839006710, -0.00433305134650792, -0.00354650575765321], [0.0264453617408100, 0.00420876431073869, 0.0545898315104741, 0.0490641405603461, 0.0474914292929121, 0.0293678673621165, 0.0389687302987745, 0.0176751696094875, 0.0626976046500674, 0.0452241116805442], [0.0506029058391468, 0.0302885624603531, 0.0684033878009331, 0.0394051575658211, 0.0291854480436259, 0.0647043485552080, 0.0467461431273585, 0.0752354463648461, 0.0684913894778833, 0.0418344319333943], [0.0518009455509017, 0.0492146183130805, 0.0620376322953095, 0.0648668338893251, 0.0919509554637798, 0.0162064441614457, 0.0340478685702888, 0.0538227755352786, 0.0786700397081100, 0.0183757748163255], [0.0363435693860420, 0.0589667034107401, 0.00527928117940762, 0.0255471071360049, 0.0458685507822146, 0.0104649053072823, 0.0147813277282509, 0.0351179253014830, 0.0588230612787394, 0.0501255003850680], [0.0491885569563579, 0.0158497359551128, 0.0455311257524880, 0.0135211492700563, -0.00700906936453937, 0.00992381823754170, -0.00573045777848894, 0.0615823219224910, 0.0779441683404056, 0.0280962917012629], [0.0839293436645640, 0.0796584543161477, 0.0786793212754548, 0.0649825149540138, 0.0891164092330685, 0.0726979237431117, 0.0416745499907713, 0.0415726587915474, 0.107869373559443, 0.0304764049507348], [0.0336439567015974, 0.000540425736096700, 0.0595845458181088, 0.0426330754490364, 0.00607677949206663, 0.0550074490683428, 0.0275840912846151, 0.0513690478136324, 0.0653405037449039, 0.0144342366508952], [0.0391513406659179, 0.0579850703928723, 0.0428599800606040, 0.0157098697088934, 0.0322207232826521, 0.0282516350353530, -0.00653510828834694, 0.0633599327175958, 0.0325719569131053, 0.0772411854251056]])

2.2.2.2 Practice Problems

Problem 1: Basic Hadamard Product

Given matrices: \[A=\begin{bmatrix}1&2\\3&4\end{bmatrix}\] \[B=\begin{bmatrix}5&6\\7&8\end{bmatrix}\]

Find the Hadamard product \(C=A\circ B\).

Solution:

\[C=\begin{bmatrix}1\cdot 5&2\cdot 6\\3\cdot7&4\cdot 8 \end{bmatrix}=\begin{bmatrix}5&12\\21&32\end{bmatrix}\]

Problem 2: Hadamard Product with Identity Matrix

Given matrices: \[A=\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix}\] \[I=\begin{bmatrix}1&0&0\\0&1&0\end{bmatrix}\]

Find the Hadamard product \(C=A\circ I\).

Solution:

\[C=\begin{bmatrix}1\cdot1&2\cdot 0&3\cdot 0\\4\cdot 0&5\cdot 1&6\cdot 0 \end{bmatrix}= \begin{bmatrix} 1&0&0\\0&5&0\end{bmatrix}\]

Problem 3: Hadamard Product with Zero Matrix

Given matrices: \[A=\begin{bmatrix}3&4\\5&6\end{bmatrix}\] \[Z=\begin{bmatrix}0&0\\0&0\end{bmatrix}\]

Find the Hadamard product \(C=A\circ Z\).

Solution:

\[C=\begin{bmatrix}3\cdot 0&4\cdot 0\\ 5\cdot 0&6\cdot 0 \end{bmatrix}=\begin{bmatrix}0&0\\0&0\end{bmatrix}\]

Problem 4: Hadamard Product of Two Identity Matrices

Given identity matrices: \[I_2=\begin{bmatrix}1&0\\0&1\end{bmatrix}\] \[I_3=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}\]

Find the Hadamard product \(C=I_2\circ I_3\) (extend \(I_2\) to match dimensions of \(I_3\)).

Solution:

Extend \(I_2\) to \(I_3\): \[I_2=\begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix}\]

\[C=\begin{bmatrix}1\cdot 1&0\cdot 0&0\cdot 0\\0\cdot 0&1\cdot 1&0\cdot 0\\0\cdot 0&0\cdot 0&0\cdot 1\end{bmatrix}=\begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix}\]

Problem 5: Hadamard Product with Random Matrices

Given random matrices: \[A=\begin{bmatrix}2&3\\1&4\end{bmatrix}\] \[B=\begin{bmatrix}0&5\\6&2\end{bmatrix}\]

Find the Hadamard product \(C=A\circ B\).

Solution:

\[C=\begin{bmatrix}2\cdot 0&3\cdot 5\\1\cdot 6&4\cdot 2\end{bmatrix}=\begin{bmatrix}0&15\\6&8\end{bmatrix}\]

Problem 6: Hadamard Product of 3x3 Matrices

Given matrices: \[A=\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}\] \[B=\begin{bmatrix}9&8&7\\6&5&4\\3&2&1\end{bmatrix}\]

Find the Hadamard product \(C=A\circ B\).

Solution:

\[C=\begin{bmatrix}1\cdot 9&2\cdot 8&3\cdot 7\\4\cdot 6&5\cdot 5&6\cdot 4\\7\cdot 3&8\cdot 2&9\cdot 1\end{bmatrix}=\begin{bmatrix}9&16&21\\24&25&24\\21&16&9\end{bmatrix}\]

Problem 7: Hadamard Product of Column Vectors

Given column vectors: \[u=\begin{bmatrix}2\\3\end{bmatrix}\] \[v=\begin{bmatrix}5\\6\end{bmatrix}\]

Find the Hadamard product \(w=u\circ v\).

Solution:

\[w=\begin{bmatrix}2\cdot 5\\3\cdot 6\end{bmatrix}=\begin{bmatrix}10\\18\end{bmatrix}\]

Problem 8: Hadamard Product with Non-Square Matrices

Given matrices: \[A=\begin{bmatrix}1&2\\3&4\\5&6\end{bmatrix}\] \[B=\begin{bmatrix}7&8\\9&10\end{bmatrix}\]

Find the Hadamard product \(C=A\circ B\) (extend \(B\) to match dimensions of \(A\)).

Solution:

Extend \(B\) to match dimensions of \(A\): \[B=\begin{bmatrix}7&8\\9&10\\7&8\end{bmatrix}\]

\[C=\begin{bmatrix}1\cdot 7&2\cdot 8\\3\cdot 9&4\cdot 10\\5\cdot 7&6\cdot 8\end{bmatrix}=\begin{bmatrix}7&16\\27&40\\35&48\end{bmatrix}\]

Problem 9: Hadamard Product in Image Processing

Given matrices representing image pixel values: \[A=\begin{bmatrix}10&20\\30&40\end{bmatrix}\] \[B=\begin{bmatrix}0.5&1.5\\2.0&0.5\end{bmatrix}\]

Find the Hadamard product \(C=A\circ B\).

Solution:

\[C=\begin{bmatrix}10\cdot 0.5&20\cdot 1.5\\30\cdot 2.0&40\cdot 0.5\end{bmatrix}=\begin{bmatrix}5&30\\60&20\end{bmatrix}\]

Problem 10: Hadamard Product in Statistical Data

Given matrices representing two sets of statistical data:

\[A=\begin{bmatrix}5&6&7\\8&9&10\end{bmatrix}\] \[B=\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix}\]

Find the Hadamard product \(C=A\circ B\).

Solution:

\[C=\begin{bmatrix}5\cdot 1&6\cdot 2&7\cdot 3\\8\cdot 4&9\cdot 5&10\cdot 6\end{bmatrix}=\begin{bmatrix}5&12&21\\32&45&60\end{bmatrix}\]

2.2.2.3 Inner Product of Matrices

The inner product of two matrices is a generalized extension of the dot product, where each matrix is treated as a vector in a high-dimensional space. For two matrices \(A\) and \(B\) of the same dimension \(m \times n\), the inner product is defined as the sum of the element-wise products of the matrices.

Definition (Inner product)

For two matrices \(A\) and \(B\) of dimension \(m \times n\), the inner product \(\langle A, B \rangle\) is given by:

\[\langle A, B \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \cdot B_{ij}\]

where \(\cdot\) denotes element-wise multiplication.

Properties
  1. Commutativity: \[\langle A, B \rangle = \langle B, A \rangle\]

  2. Linearity: \[\langle A + C, B \rangle = \langle A, B \rangle + \langle C, B \rangle\]

  3. Positive Definiteness: \[\langle A, A \rangle \geq 0\] with equality if and only if \(A\) is a zero matrix.
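The following minimal NumPy sketch (with arbitrarily chosen matrices) checks these properties numerically, computing the inner product as the sum of the element-wise product:

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[0, 1], [2, 3]])

inner = lambda X, Y: np.sum(X * Y)   # <X, Y> as sum of element-wise products

print(inner(A, B) == inner(B, A))                     # commutativity
print(inner(A + C, B) == inner(A, B) + inner(C, B))   # linearity
print(inner(A, A) >= 0)                               # positive definiteness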

Some simple examples showing the mathematical process of calculating the inner product are given below.

Example 1: Basic Inner Product

Given matrices:

\[A = \begin{pmatrix}1 & 2 \\3 & 4\end{pmatrix}, \quad B = \begin{pmatrix}5 & 6 \\7 & 8\end{pmatrix}\]

The inner product \(\langle A, B \rangle\) is:

\[\langle A, B \rangle = 1 \cdot 5 + 2 \cdot 6 + 3 \cdot 7 + 4 \cdot 8 = 5 + 12 + 21 + 32 = 70\]

Example 2: Inner Product with Larger Matrices

Given matrices:

\[A = \begin{pmatrix}1 & 2 & 3 \\4 & 5 & 6 \\7 & 8 & 9\end{pmatrix}, \quad B = \begin{pmatrix}9 & 8 & 7 \\6 & 5 & 4 \\3 & 2 & 1\end{pmatrix}\]

The inner product \(\langle A, B \rangle\) is calculated as: \[\begin{align*} \langle A, B \rangle &= 1 \cdot 9 + 2 \cdot 8 + 3 \cdot 7 + 4 \cdot 6 + 5 \cdot 5 + 6 \cdot 4 + 7 \cdot 3 + 8 \cdot 2 + 9 \cdot 1\\ &= 9 + 16 + 21 + 24 + 25 + 24 + 21 + 16 + 9\\ &= 165 \end{align*}\]

2.2.2.4 Practice Problems

Problem 1: Inner Product of 2x2 Matrices

Given matrices: \[A=\begin{bmatrix}1&2\\3&4\end{bmatrix}\] \[B=\begin{bmatrix}5&6\\7&8\end{bmatrix}\]

Solution:

\[\begin{align*} \langle A,B \rangle &= \sum_{i,j} A_{ij} B_{ij} \\ &= 1\cdot5 + 2\cdot6 + 3\cdot7 + 4\cdot8 \\ &= 5 + 12 + 21 + 32 \\ &= 70 \end{align*}\]


Problem 2: Inner Product of 3x3 Matrices

Given matrices: \[A=\begin{bmatrix}1&0&2\\3&4&5\\6&7&8\end{bmatrix}\] \[B=\begin{bmatrix}8&7&6\\5&4&3\\2&1&0\end{bmatrix}\]

Solution:

\[\begin{align*} \langle A,B \rangle &= \sum_{i,j} A_{ij} B_{ij} \\ &= 1\cdot8 + 0\cdot7 + 2\cdot6 + \\ &\quad 3\cdot5 + 4\cdot4 + 5\cdot3 + \\ &\quad 6\cdot2 + 7\cdot1 + 8\cdot0 \\ &= 8 + 0 + 12 + 15 + 16 + 15 + 12 + 7 + 0 \\ &= 85 \end{align*}\]


Problem 3: Inner Product of Diagonal Matrices

Given diagonal matrices: \[A=\begin{bmatrix}2&0&0\\0&3&0\\0&0&4\end{bmatrix}\] \[B=\begin{bmatrix}5&0&0\\0&6&0\\0&0&7\end{bmatrix}\]

Solution:

\[\begin{align*} \langle A,B \rangle &= \sum_{i,j} A_{ij} B_{ij} \\ &= 2\cdot5 + 0\cdot0 + 0\cdot0 + \\ &\quad 0\cdot0 + 3\cdot6 + 0\cdot0 + \\ &\quad 0\cdot0 + 0\cdot0 + 4\cdot7 \\ &= 10 + 0 + 0 + 0 + 18 + 0 + 0 + 0 + 28 \\ &= 56 \end{align*}\]


Problem 4: Inner Product of Column Vectors

Given column vectors: \[u=\begin{bmatrix}1\\2\\3\end{bmatrix}\] \[v=\begin{bmatrix}4\\5\\6\end{bmatrix}\]

Solution:

\[\begin{align*} \langle u,v \rangle &= \sum_{i} u_i v_i \\ &= 1\cdot4 + 2\cdot5 + 3\cdot6 \\ &= 4 + 10 + 18 \\ &= 32 \end{align*}\]


Problem 5: Inner Product with Random Matrices

Given matrices: \[A=\begin{bmatrix}3&2\\1&4\end{bmatrix}\] \[B=\begin{bmatrix}5&7\\8&6\end{bmatrix}\]

Solution:

\[\begin{align*} \langle A,B \rangle &= \sum_{i,j} A_{ij} B_{ij} \\ &= 3\cdot5 + 2\cdot7 + \\ &\quad 1\cdot8 + 4\cdot6 \\ &= 15 + 14 + 8 + 24 \\ &= 61 \end{align*}\]


Problem 6: Inner Product of 2x3 and 3x2 Matrices

Given matrices: \[A=\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix}\] \[B=\begin{bmatrix}7&8\\9&10\\11&12\end{bmatrix}\]

Solution:

Since \(A\) is \(2\times3\) and \(B\) is \(3\times2\), we first take \(B^T\) so that both matrices have the same shape, and then sum the element-wise products:

\[\begin{align*} \langle A,B^T \rangle &= \sum_{i,j} A_{ij} (B^T)_{ij} \\ &= 1\cdot7 + 2\cdot9 + 3\cdot11 + \\ &\quad 4\cdot8 + 5\cdot10 + 6\cdot12 \\ &= 7 + 18 + 33 + 32 + 50 + 72 \\ &= 212 \end{align*}\]


Problem 7: Inner Product with Transpose Operation

Given matrices: \[A=\begin{bmatrix}2&3\\4&5\end{bmatrix}\] \[B=\begin{bmatrix}6&7\\8&9\end{bmatrix}\]

Solution:

\[\begin{align*} \langle A,B \rangle &= \sum_{i,j} A_{ij} B_{ij} \\ &= 2\cdot6 + 3\cdot7 + \\ &\quad 4\cdot8 + 5\cdot9 \\ &= 12 + 21 + 32 + 45 \\ &= 110 \end{align*}\]


Problem 8: Inner Product of Symmetric Matrices

Given symmetric matrices: \[A=\begin{bmatrix}1&2\\2&3\end{bmatrix}\] \[B=\begin{bmatrix}4&5\\5&6\end{bmatrix}\]

Solution:

\[\begin{align*} \langle A,B \rangle &= \sum_{i,j} A_{ij} B_{ij} \\ &= 1\cdot4 + 2\cdot5 + \\ &\quad 2\cdot5 + 3\cdot6 \\ &= 4 + 10 + 10 + 18 \\ &= 42 \end{align*}\]


Problem 9: Inner Product with Complex Matrices

Given matrices: \[A=\begin{bmatrix}1+i&2-i\\3+i&4-i\end{bmatrix}\] \[B=\begin{bmatrix}5-i&6+i\\7-i&8+i\end{bmatrix}\]

Solution:

\[\begin{align*} \langle A,B \rangle &= \sum_{i,j} A_{ij} \overline{B_{ij}} \\ &= (1+i)(5+i) + (2-i)(6-i) + \\ &\quad (3+i)(7+i) + (4-i)(8-i) \\ &= (4+6i) + (11-8i) + \\ &\quad (20+10i) + (31-12i) \\ &= 66 - 4i \end{align*}\] Taking the real part gives \(\operatorname{Re}\langle A,B \rangle = 66\).
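This computation can be verified with a small NumPy sketch, assuming the convention that the second factor is conjugated:

import numpy as np

A = np.array([[1+1j, 2-1j], [3+1j, 4-1j]])
B = np.array([[5-1j, 6+1j], [7-1j, 8+1j]])

inner = np.sum(A * np.conj(B))   # sum of A_ij * conj(B_ij)
print(inner)                     # (66-4j)
print(inner.real)                # 66.0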


Problem 10: Inner Product of 4x4 Matrices

Given matrices: \[A=\begin{bmatrix}1&2&3&4\\5&6&7&8\\9&10&11&12\\13&14&15&16\end{bmatrix}\] \[B=\begin{bmatrix}16&15&14&13\\12&11&10&9\\8&7&6&5\\4&3&2&1\end{bmatrix}\]

Solution:

\[\begin{align*} \langle A,B \rangle &= \sum_{i,j} A_{ij} B_{ij} \\ &= 1\cdot16 + 2\cdot15 + 3\cdot14 + 4\cdot13 + \\ &\quad 5\cdot12 + 6\cdot11 + 7\cdot10 + 8\cdot9 + \\ &\quad 9\cdot8 + 10\cdot7 + 11\cdot6 + 12\cdot5 + \\ &\quad 13\cdot4 + 14\cdot3 + 15\cdot2 + 16\cdot1 \\ &= 16 + 30 + 42 + 52 + 60 + 66 + 70 + 72 + \\ &\quad 72 + 70 + 66 + 60 + 52 + 42 + 30 + 16 \\ &= 816 \end{align*}\]
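A one-line NumPy check of this sum (the matrices are rebuilt with arange purely for verification):

import numpy as np

A = np.arange(1, 17).reshape(4, 4)       # rows 1..16
B = np.arange(16, 0, -1).reshape(4, 4)   # rows 16..1
print(np.sum(A * B))                     # 816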


Now let us look at the computational side of the inner product.

  1. Compute Inner Product from Scratch (without Libraries)

Here’s how you can compute the inner product from scratch:

# Define matrices A and B
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8, 9], [10, 11, 12]]

# Function to compute inner product
def inner_product(A, B):
    # Get the number of rows and columns
    num_rows = len(A)
    num_cols = len(A[0])
    
    # Initialize the result
    result = 0
    
    # Compute the inner product
    for i in range(num_rows):
        for j in range(num_cols):
            result += A[i][j] * B[i][j]
    
    return result

# Compute inner product
inner_product_result = inner_product(A, B)

# Display result
print("Inner Product (From Scratch):")
print(inner_product_result)
Inner Product (From Scratch):
217
  2. Compute Inner Product Using NumPy

Here’s how to compute the inner product using Numpy:

import numpy as np
# Define matrices A and B
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8, 9], [10, 11, 12]])
# calculating innerproduct
inner_product = (A*B).sum() # element-wise product, then sum of all entries

print("Inner Product (Using numpy):")
print(inner_product)
Inner Product (Using numpy):
217

The same operation can be done using SymPy functions as follows.

import sympy as sp
import numpy as np  
# Define matrices A and B
A = sp.Matrix([[1, 2, 3], [4, 5, 6]])
B = sp.Matrix([[7, 8, 9], [10, 11, 12]])

# Compute element-wise product
elementwise_product = A.multiply_elementwise(B)

# Sum all entries of the element-wise product
inner_product_sympy = np.sum(elementwise_product)

# Display result
print("Inner Product (Using SymPy):")
print(inner_product_sympy)
Inner Product (Using SymPy):
217

A vector dot product (as used in Physics) can be calculated using the SymPy .dot() function as shown below.

Let \(A=\begin{pmatrix}1&2&3\end{pmatrix}\) and \(B=\begin{pmatrix}4&5&6\end{pmatrix}\), then the dot product, \(A\cdot B\) is computed as:

import sympy as sp
A=sp.Matrix([1,2,3])
B=sp.Matrix([4,5,6])
display(A.dot(B)) # calculate the dot product of A and B

\(\displaystyle 32\)

A word of caution

In SymPy, sp.Matrix([1,2,3]) creates a column vector (shape 3×1). But np.array([1,2,3]) creates a one-dimensional array, which NumPy treats like a row vector. So be careful while applying matrix/dot product operations on these objects.
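A quick way to see the difference is to inspect the shapes of the two objects:

import sympy as sp
import numpy as np

u_sym = sp.Matrix([1, 2, 3])
u_np = np.array([1, 2, 3])

print(u_sym.shape)   # (3, 1) -- a 3x1 column vector
print(u_np.shape)    # (3,)   -- a one-dimensional array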

The same dot product using numpy object can be done as follows:

import numpy as np
A=np.array([1,2,3])
B=np.array([4,5,6])
display(A.dot(B.T))  # dot() computes the dot product; B.T is the transpose of B (a no-op for 1-D arrays)
32

Practical Applications

Application 1: Signal Processing

In signal processing, the inner product can be used to measure the similarity between two signals. Here the most popular measure of similarity is the cosine similarity. This measure is defined as:

\[\cos \theta=\dfrac{A\cdot B}{||A|| ||B||}\]

Now consider two digital signals. Their cosine similarity can be calculated with simulated data as shown below.

import numpy as np

# Simulated large signals (1D array) using NumPy
signal1 = np.sin(np.random.rand(1000))
signal2 = np.cos(np.random.rand(1000))

# Compute inner product
inner_product_signal = np.dot(signal1, signal2)
#cosine_sim=np.dot(signal1,signal2)/(np.linalg.norm(signal1)*np.linalg.norm(signal2))
# Display result
cosine_sim=inner_product_signal/(np.sqrt(np.dot(signal1,signal1))*np.sqrt(np.dot(signal2,signal2)))
print("Inner Product (Using numpy):")
print(inner_product_signal)
print("Similarity of signals:")
print(cosine_sim)
Inner Product (Using numpy):
387.8012139700734
Similarity of signals:
0.8704891845819591

Application 2: Machine Learning - Feature Similarity

In machine learning, the inner product is used to calculate the similarity between feature vectors.

import numpy as np

# Simulated feature vectors (2D array) using NumPy
features1 = np.random.rand(100, 10)
features2 = np.random.rand(100, 10)

# Compute inner product for each feature vector
inner_products = np.einsum('ij,ij->i', features1, features2) # Einstein summation: row-wise inner products

# Display results
print("Inner Products of Feature Vectors:")
display(inner_products)
Inner Products of Feature Vectors:
array([2.61102338, 1.4155999 , 3.25000824, 2.41655936, 1.93157566,
       2.19477819, 2.12679698, 3.06553894, 3.74554647, 4.10990809,
       1.98131966, 2.84450384, 2.2243122 , 2.58409242, 2.2537591 ,
       3.17451944, 3.86092924, 1.90632106, 2.47826298, 2.01204619,
       4.31510112, 2.43671069, 3.58885605, 1.45523711, 2.52128395,
       3.16348974, 3.1847619 , 2.66598594, 2.63693273, 1.48815669,
       1.89682381, 2.00764338, 1.82954687, 2.19787784, 1.82103506,
       2.70682361, 4.14842985, 2.80672499, 3.02116063, 2.19998533,
       3.10230074, 2.60100442, 3.65672942, 1.13288363, 2.23361306,
       2.61852013, 2.43124543, 2.3762841 , 2.99733032, 3.58111295,
       2.60701381, 1.8212961 , 2.6341459 , 1.81749036, 1.98306523,
       2.40974101, 3.89306424, 1.56214172, 2.32677802, 1.26413856,
       3.09345585, 3.70796521, 1.69645084, 2.10669788, 1.72513793,
       1.89343271, 3.40714794, 4.20275318, 2.22815523, 1.60218578,
       0.9772914 , 1.42237772, 1.69563219, 3.28445863, 2.58328416,
       2.46313616, 2.43927919, 2.79230138, 2.60259376, 2.7408041 ,
       2.8551307 , 2.37075423, 3.20620134, 1.91039534, 2.81772451,
       2.82104929, 2.22069818, 3.0052798 , 2.58203716, 1.10423113,
       2.73876471, 1.8166736 , 1.929993  , 1.46680923, 1.79427732,
       3.83369994, 2.37803706, 2.32127536, 1.9612972 , 2.74966051])

Application 3: Covariance Matrix in Statistics

The inner product can be used to compute covariance matrices for statistical data analysis. If \(X\) is a data matrix whose \(n\) rows are observations and \(x=X-\bar{X}\) is the column-centered data, then the covariance of \(X\) can be calculated as \(cov(X)=\dfrac{1}{n-1}(x^T\cdot x)\). The Python code for a simulated dataset is shown below.

import sympy as sp
import numpy as np

# Simulated large dataset (2D array) using NumPy
data = np.random.rand(100, 10)

# Compute the mean of each column
mean = np.mean(data, axis=0)

# Center the data
centered_data = data - mean

# Compute the covariance matrix using matrix product operation
cov_matrix = (centered_data.T @ centered_data) / (centered_data.shape[0] - 1)
cov_matrix_sympy = sp.Matrix(cov_matrix)

# Display results
print("Covariance Matrix:")
display(cov_matrix_sympy)
Covariance Matrix:

\(\displaystyle \left[\begin{matrix}0.0826206001773978 & 0.0190949660858603 & -0.00657601964915786 & 0.00978377417600984 & -0.00396647392861959 & 0.000802133525943591 & -0.0134785277081127 & -0.00256796402919267 & -0.00185751912007374 & 0.00188002717127027\\0.0190949660858603 & 0.0863758537538684 & 0.00465307022298206 & 0.000634206362120265 & -0.00640940750721016 & 0.0065986789463667 & 0.00406109518532904 & -0.0179234376071376 & -0.00178442832330686 & 0.00369185471729624\\-0.00657601964915786 & 0.00465307022298206 & 0.101676710745905 & 0.00129246783357073 & 0.0115252858291662 & -0.01255532067245 & -0.0192138774615725 & 0.00485976103158238 & -0.00322152721653495 & 0.00922517245585824\\0.00978377417600984 & 0.000634206362120265 & 0.00129246783357073 & 0.085566173675165 & -0.0158016024825646 & -0.00404732296208295 & 0.00589174594818515 & -0.00714704177669192 & -0.000932476370712307 & 0.0105522552620407\\-0.00396647392861959 & -0.00640940750721016 & 0.0115252858291662 & -0.0158016024825646 & 0.0924268262154616 & 0.000905376524254485 & -0.00719787223360057 & 0.0120360350180386 & -0.00659030600004849 & -0.0098195132374878\\0.000802133525943591 & 0.0065986789463667 & -0.01255532067245 & -0.00404732296208295 & 0.000905376524254485 & 0.0880898700386659 & 0.0147620989308444 & 0.00727920696730826 & 0.00475276495566338 & 0.00114623860884323\\-0.0134785277081127 & 0.00406109518532904 & -0.0192138774615725 & 0.00589174594818515 & -0.00719787223360057 & 0.0147620989308444 & 0.0825882799176838 & 0.0112752622146153 & 0.00113978535836565 & 0.00439117016673781\\-0.00256796402919267 & -0.0179234376071376 & 0.00485976103158238 & -0.00714704177669192 & 0.0120360350180386 & 0.00727920696730826 & 0.0112752622146153 & 0.0968847538289747 & 0.0102075882160858 & -0.0101947768632437\\-0.00185751912007374 & -0.00178442832330686 & -0.00322152721653495 & -0.000932476370712307 & -0.00659030600004849 & 0.00475276495566338 & 0.00113978535836565 & 0.0102075882160858 & 0.0860985487815505 & -0.00919047492964528\\0.00188002717127027 & 0.00369185471729624 & 0.00922517245585824 & 0.0105522552620407 & -0.0098195132374878 & 0.00114623860884323 & 0.00439117016673781 & -0.0101947768632437 & -0.00919047492964528 & 0.0779287838710158\end{matrix}\right]\)

These examples demonstrate the use of inner product and dot product in various applications.

2.2.2.5 Outer Product

The outer product of two vectors results in a matrix, and it is a way to combine these vectors into a higher-dimensional representation.

Definition (Outer Product)

For two vectors \(\mathbf{u}\) and \(\mathbf{v}\) of dimensions \(m\) and \(n\) respectively, the outer product \(\mathbf{u} \otimes \mathbf{v}\) is an \(m \times n\) matrix defined as:

\[(\mathbf{u} \otimes \mathbf{v})_{ij} = u_i \cdot v_j\]

where \(\cdot\) denotes ordinary scalar multiplication. In matrix notation, for two column vectors \(u,v\), \[u\otimes v=uv^T\]

Properties
  1. Linearity: \[(\mathbf{u} + \mathbf{w}) \otimes \mathbf{v} = (\mathbf{u} \otimes \mathbf{v}) + (\mathbf{w} \otimes \mathbf{v})\]

  2. Distributivity: \[\mathbf{u} \otimes (\mathbf{v} + \mathbf{w}) = (\mathbf{u} \otimes \mathbf{v}) + (\mathbf{u} \otimes \mathbf{w})\]

  3. Associativity: \[(\mathbf{u} \otimes \mathbf{v}) \otimes \mathbf{w} = \mathbf{u} \otimes (\mathbf{v} \otimes \mathbf{w})\]
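Before the worked examples, here is a minimal NumPy check (using the same vectors as Example 1 below) that np.outer agrees with the identity \(u\otimes v = uv^T\):

import numpy as np

u = np.array([1, 2])
v = np.array([3, 4, 5])

outer_direct = np.outer(u, v)                        # outer product of u and v
outer_matmul = u.reshape(-1, 1) @ v.reshape(1, -1)   # u v^T as a matrix product
print(np.array_equal(outer_direct, outer_matmul))    # True
print(outer_direct)
# [[ 3  4  5]
#  [ 6  8 10]]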

Some simple examples of the outer product are given below.

Example 1: Basic Outer Product

Given vectors:

\[\mathbf{u} = \begin{pmatrix}1 \\2\end{pmatrix}, \quad\mathbf{v} = \begin{pmatrix}3 \\4 \\5\end{pmatrix}\]

The outer product \(\mathbf{u} \otimes \mathbf{v}\) is:

\[\mathbf{u} \otimes \mathbf{v} = \begin{pmatrix}1 \cdot 3 & 1 \cdot 4 & 1 \cdot 5 \\2 \cdot 3 & 2 \cdot 4 & 2 \cdot 5\end{pmatrix} = \begin{pmatrix}3 & 4 & 5 \\6 & 8 & 10\end{pmatrix}\]

Example 2: Outer Product with Larger Vectors

Given vectors: \[\mathbf{u} = \begin{pmatrix}1 \\2 \\3\end{pmatrix}, \quad\mathbf{v} = \begin{pmatrix}4 \\5\end{pmatrix}\]

The outer product \(\mathbf{u} \otimes \mathbf{v}\) is:

\[\mathbf{u} \otimes \mathbf{v} = \begin{pmatrix}1 \cdot 4 & 1 \cdot 5 \\2 \cdot 4 & 2 \cdot 5 \\3 \cdot 4 & 3 \cdot 5\end{pmatrix} = \begin{pmatrix}4 & 5 \\8 & 10 \\12 & 15\end{pmatrix}\]

2.2.2.6 Practice Problems

Find the outer product of A and B where A and B are given as follows:

Problem 1:

Find the outer product of: \[A=\begin{bmatrix}1\\2\end{bmatrix}\] \[B=\begin{bmatrix}3&4\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1\\2\end{bmatrix} \otimes \begin{bmatrix}3&4\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot 3 & 1 \cdot 4 \\ 2 \cdot 3 & 2 \cdot 4 \end{bmatrix} \\ &= \begin{bmatrix} 3 & 4 \\ 6 & 8 \end{bmatrix} \end{align*}\]


Problem 2:

Find the outer product of: \[A=\begin{bmatrix}1\\2\\3\end{bmatrix}\] \[B=\begin{bmatrix}4&5&6\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1\\2\\3\end{bmatrix} \otimes \begin{bmatrix}4&5&6\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot 4 & 1 \cdot 5 & 1 \cdot 6 \\ 2 \cdot 4 & 2 \cdot 5 & 2 \cdot 6 \\ 3 \cdot 4 & 3 \cdot 5 & 3 \cdot 6 \end{bmatrix} \\ &= \begin{bmatrix} 4 & 5 & 6 \\ 8 & 10 & 12 \\ 12 & 15 & 18 \end{bmatrix} \end{align*}\]


Problem 3:

Find the outer product of: \[A=\begin{bmatrix}1&2\end{bmatrix}\] \[B=\begin{bmatrix}3\\4\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1&2\end{bmatrix} \otimes \begin{bmatrix}3\\4\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot 3 & 1 \cdot 4 \\ 2 \cdot 3 & 2 \cdot 4 \end{bmatrix} \\ &= \begin{bmatrix} 3 & 4 \\ 6 & 8 \end{bmatrix} \end{align*}\]


Problem 4:

Find the outer product of: \[A=\begin{bmatrix}0\\1\end{bmatrix}\] \[B=\begin{bmatrix}1&-1\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}0\\1\end{bmatrix} \otimes \begin{bmatrix}1&-1\end{bmatrix} \\ &= \begin{bmatrix} 0 \cdot 1 & 0 \cdot -1 \\ 1 \cdot 1 & 1 \cdot -1 \end{bmatrix} \\ &= \begin{bmatrix} 0 & 0 \\ 1 & -1 \end{bmatrix} \end{align*}\]


Problem 5:

Find the outer product of: \[A=\begin{bmatrix}2\\3\end{bmatrix}\] \[B=\begin{bmatrix}5&-2\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}2\\3\end{bmatrix} \otimes \begin{bmatrix}5&-2\end{bmatrix} \\ &= \begin{bmatrix} 2 \cdot 5 & 2 \cdot -2 \\ 3 \cdot 5 & 3 \cdot -2 \end{bmatrix} \\ &= \begin{bmatrix} 10 & -4 \\ 15 & -6 \end{bmatrix} \end{align*}\]


Problem 6:

Find the outer product of: \[A=\begin{bmatrix}1\\0\\1\end{bmatrix}\] \[B=\begin{bmatrix}2&-1&0\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1\\0\\1\end{bmatrix} \otimes \begin{bmatrix}2&-1&0\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot 2 & 1 \cdot -1 & 1 \cdot 0 \\ 0 \cdot 2 & 0 \cdot -1 & 0 \cdot 0 \\ 1 \cdot 2 & 1 \cdot -1 & 1 \cdot 0 \end{bmatrix} \\ &= \begin{bmatrix} 2 & -1 & 0 \\ 0 & 0 & 0 \\ 2 & -1 & 0 \end{bmatrix} \end{align*}\]


Problem 7:

Find the outer product of: \[A=\begin{bmatrix}1\\-1\end{bmatrix}\] \[B=\begin{bmatrix}2&0\\3&-1\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &=\begin{bmatrix}1\\-1\end{bmatrix}\otimes \begin{bmatrix}2&0\\3&-1\end{bmatrix}\\ &= \begin{bmatrix} 2 & 3&0&-1 \\ -2&-3&0&1 \end{bmatrix} \end{align*}\]


Problem 8:

Find the outer product of: \[A=\begin{bmatrix}3\\4\end{bmatrix}\] \[B=\begin{bmatrix}1&-2&3\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}3\\4\end{bmatrix} \otimes \begin{bmatrix}1&-2&3\end{bmatrix} \\ &= \begin{bmatrix} 3 \cdot 1 & 3 \cdot -2 & 3 \cdot 3 \\ 4 \cdot 1 & 4 \cdot -2 & 4 \cdot 3 \end{bmatrix} \\ &= \begin{bmatrix} 3 & -6 & 9 \\ 4 & -8 & 12 \end{bmatrix} \end{align*}\]


Problem 9:

Find the outer product of: \[A=\begin{bmatrix}2\\3\\-1\end{bmatrix}\] \[B=\begin{bmatrix}4&-2\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}2\\3\\-1\end{bmatrix} \otimes \begin{bmatrix}4&-2\end{bmatrix} \\ &= \begin{bmatrix} 2 \cdot 4 & 2 \cdot -2 \\ 3 \cdot 4 & 3 \cdot -2 \\ -1 \cdot 4 & -1 \cdot -2 \end{bmatrix} \\ &= \begin{bmatrix} 8 & -4 \\ 12 & -6 \\ -4 & 2 \end{bmatrix} \end{align*}\]


Problem 10:

Find the outer product of: \[A=\begin{bmatrix}0\\5\end{bmatrix}\] \[B=\begin{bmatrix}3&1\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}0\\5\end{bmatrix} \otimes \begin{bmatrix}3&1\end{bmatrix} \\ &= \begin{bmatrix} 0 \cdot 3 & 0 \cdot 1 \\ 5 \cdot 3 & 5 \cdot 1 \end{bmatrix} \\ &= \begin{bmatrix} 0 & 0 \\ 15 & 5 \end{bmatrix} \end{align*}\]


1. Compute Outer Product of Vectors from Scratch (without Libraries)

Here’s how you can compute the outer product manually:

# Define vectors u and v
u = [1, 2]
v = [3, 4, 5]

# Function to compute outer product
def outer_product(u, v):
    # Initialize the result
    result = [[a * b for b in v] for a in u]
    return result

# Compute outer product
outer_product_result = outer_product(u, v)

# Display result
print("Outer Product of Vectors (From Scratch):")
for row in outer_product_result:
    print(row)
Outer Product of Vectors (From Scratch):
[3, 4, 5]
[6, 8, 10]

2. Compute Outer Product of Vectors Using SymPy

Here’s how to compute the outer product using SymPy:

import sympy as sp

# Define vectors u and v
u = sp.Matrix([1, 2])
v = sp.Matrix([3, 4, 5])

# Compute outer product using SymPy
outer_product_sympy = u * v.T

# Display result
print("Outer Product of Vectors (Using SymPy):")
display(outer_product_sympy)
Outer Product of Vectors (Using SymPy):

\(\displaystyle \left[\begin{matrix}3 & 4 & 5\\6 & 8 & 10\end{matrix}\right]\)

Outer Product of Matrices

The outer product of two matrices extends the concept from vectors to higher-dimensional tensors. For two matrices \(A\) and \(B\), the outer product results in a higher-dimensional tensor and is generally expressed as block matrices.

Definition (Outer Product of Matrices)

For two matrices \(A\) of dimension \(m \times p\) and \(B\) of dimension \(q \times n\), the outer product \(A \otimes B\) results in a tensor of dimension \(m \times p \times q \times n\). The entries of the tensor are given by:

\[(A \otimes B)_{ijkl} = A_{ij} \cdot B_{kl}\]

where \(\cdot\) denotes ordinary scalar multiplication.

Properties
  1. Linearity: \[(A + C) \otimes B = (A \otimes B) + (C \otimes B)\]

  2. Distributivity: \[A \otimes (B + D) = (A \otimes B) + (A \otimes D)\]

  3. Associativity:

\[(A \otimes B) \otimes C = A \otimes (B \otimes C)\]
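As a sketch of this definition, the four-way tensor can be built directly with np.einsum; rearranging its axes into blocks reproduces the block-matrix (Kronecker) layout computed from scratch later in this section. The matrices below are arbitrary illustrations:

import numpy as np

A = np.array([[1, 2], [3, 4]])   # m x p
B = np.array([[5], [6]])         # q x n
m, p = A.shape
q, n = B.shape

# T[i, j, k, l] = A[i, j] * B[k, l]
T = np.einsum('ij,kl->ijkl', A, B)
print(T.shape)   # (2, 2, 2, 1), i.e. m x p x q x n

# Grouping the axes as (i, k) x (j, l) gives the block form,
# which coincides with the Kronecker product of A and B
block = T.transpose(0, 2, 1, 3).reshape(m * q, p * n)
print(np.array_equal(block, np.kron(A, B)))   # True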

Here are some simple examples to demonstrate the mathematical procedure to find outer product of matrices.

Example 1: Basic Outer Product of Matrices

Given matrices: \[A = \begin{pmatrix}1 & 2 \\3 & 4\end{pmatrix}, \quad B = \begin{pmatrix}5 \\6\end{pmatrix}\]

The outer product \(A \otimes B\) is:

\[A \otimes B = \begin{pmatrix}1 \cdot 5 & 1 \cdot 6 \\2 \cdot 5 & 2 \cdot 6 \\3 \cdot 5 & 3 \cdot 6 \\4 \cdot 5 & 4 \cdot 6\end{pmatrix} = \begin{pmatrix}5 & 6 \\10 & 12 \\15 & 18 \\20 & 24\end{pmatrix}\]

Example 2: Outer Product with Larger Matrices

Given matrices:

\[A = \begin{pmatrix}1 & 2 & 3 \\4 & 5 & 6\end{pmatrix}, \quad B = \begin{pmatrix}7 \\8\end{pmatrix}\]

The outer product \(A \otimes B\) is:

\[A \otimes B = \begin{pmatrix}1 \cdot 7 & 1 \cdot 8 \\2 \cdot 7 & 2 \cdot 8 \\3 \cdot 7 & 3 \cdot 8 \\4 \cdot 7 & 4 \cdot 8 \\5 \cdot 7 & 5 \cdot 8 \\6 \cdot 7 & 6 \cdot 8\end{pmatrix} = \begin{pmatrix}7 & 8 \\14 & 16 \\21 & 24 \\28 & 32 \\35 & 40 \\42 & 48\end{pmatrix}\]

Example 3: Compute the outer product of the following vectors \(\mathbf{u} = [0, 1, 2]\) and \(\mathbf{v} = [2, 3, 4]\).

To find the outer product, we calculate each element \((i, j)\) as the product of the \((i)\)-th element of \(\mathbf{u}\) and the \((j)\)-th element of \(\mathbf{v}\). Mathematically:

\[\mathbf{u} \otimes \mathbf{v} = \begin{bmatrix}0 \cdot 2 & 0 \cdot 3 & 0 \cdot 4 \\1 \cdot 2 & 1 \cdot 3 & 1 \cdot 4 \\2 \cdot 2 & 2 \cdot 3 & 2 \cdot 4\end{bmatrix}= \begin{bmatrix}0 & 0 & 0 \\2 & 3 & 4 \\4 & 6 & 8\end{bmatrix}\]
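This example can be reproduced directly with np.outer:

import numpy as np

u = np.array([0, 1, 2])
v = np.array([2, 3, 4])
print(np.outer(u, v))
# [[0 0 0]
#  [2 3 4]
#  [4 6 8]]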

1. Compute Outer Product of Matrices from Scratch (without Libraries)

Here’s how you can compute the outer product manually:

# Define matrices A and B
A = [[1, 2], [3, 4]]
B = [[5], [6]]

# Function to compute outer product
def outer_product_matrices(A, B):
    m = len(A)
    p = len(A[0])
    q = len(B)
    n = len(B[0])
    result = [[0] * (n * p) for _ in range(m * q)]

    for i in range(m):
        for j in range(p):
            for k in range(q):
                for l in range(n):
                    result[i*q + k][j*n + l] = A[i][j] * B[k][l]

    return result

# Compute outer product
outer_product_result_matrices = outer_product_matrices(A, B)

# Display result
print("Outer Product of Matrices (From Scratch):")
for row in outer_product_result_matrices:
    print(row)
Outer Product of Matrices (From Scratch):
[5, 10]
[6, 12]
[15, 20]
[18, 24]

Here is the Python code to compute the outer product using the NumPy function np.outer(); note that np.outer() flattens its inputs into 1-D vectors before forming the product:

import numpy as np

# Define vectors
u = np.array([[1,2],[3,4]])
v = np.array([[5],[4]])

# Compute outer product
outer_product = np.outer(u, v)

print("Outer Product of u and v:")
display(outer_product)
Outer Product of u and v:
array([[ 5,  4],
       [10,  8],
       [15, 12],
       [20, 16]])

Example 3: Real-world Application in Recommendation Systems

In recommendation systems, the outer product can represent user-item interactions. A simple context: let the user preferences be given as \(u=[4, 3, 5]\) and the item scores be given by \(v=[2, 5, 4]\). The recommendation scores can then be calculated as the outer product of these two vectors. The outer product \(\mathbf{u} \otimes \mathbf{v}\) is calculated as follows:

\[\mathbf{u} \otimes \mathbf{v} = \begin{bmatrix}4 \cdot 2 & 4 \cdot 5 & 4 \cdot 4 \\3 \cdot 2 & 3 \cdot 5 & 3 \cdot 4 \\5 \cdot 2 & 5 \cdot 5 & 5 \cdot 4\end{bmatrix}= \begin{bmatrix}8 & 20 & 16 \\6 & 15 & 12 \\10 & 25 & 20\end{bmatrix}\]

The python code for this task is given below.

import numpy as np
import matplotlib.pyplot as plt

# Define the user and product ratings vectors
user_ratings = np.array([4, 3, 5])
product_ratings = np.array([2, 5, 4])

# Compute the outer product
predicted_ratings = np.outer(user_ratings, product_ratings)

# Print the predicted ratings matrix
print("Predicted Ratings Matrix:")
display(predicted_ratings)

# Plot the result
plt.imshow(predicted_ratings, cmap='coolwarm', interpolation='nearest')
plt.colorbar()
plt.title('Predicted Ratings Matrix (Recommendation System)')
plt.xlabel('Product Ratings')
plt.ylabel('User Ratings')
plt.xticks(ticks=np.arange(len(product_ratings)), labels=product_ratings)
plt.yticks(ticks=np.arange(len(user_ratings)), labels=user_ratings)
plt.show()
Predicted Ratings Matrix:
array([[ 8, 20, 16],
       [ 6, 15, 12],
       [10, 25, 20]])

Additional Properties & Definitions
  1. Definition and Properties

    Given two vectors:

    • \(\mathbf{u} \in \mathbb{R}^m\)
    • \(\mathbf{v} \in \mathbb{R}^n\)

    The outer product \(\mathbf{u} \otimes \mathbf{v}\) results in an \(m \times n\) matrix where each element \((i, j)\) of the matrix is calculated as: \[(\mathbf{u} \otimes \mathbf{v})_{ij} = u_i \cdot v_j\]

  2. Non-Symmetry

    The outer product is generally not symmetric. For vectors \(\mathbf{u}\) and \(\mathbf{v}\), the matrix \(\mathbf{u} \otimes \mathbf{v}\) is not necessarily equal to \(\mathbf{v} \otimes \mathbf{u}\): \[\mathbf{u} \otimes \mathbf{v} \neq \mathbf{v} \otimes \mathbf{u}\]

  3. Rank of the Outer Product

    The rank of the outer product matrix \(\mathbf{u} \otimes \mathbf{v}\) is always 1, provided neither \(\mathbf{u}\) nor \(\mathbf{v}\) is a zero vector. This is because every column of the matrix is a scalar multiple of \(\mathbf{u}\) (equivalently, every row is a scalar multiple of \(\mathbf{v}\)).

  4. Distributive Property

    The outer product is distributive over vector addition. For vectors \(\mathbf{u}_1, \mathbf{u}_2 \in \mathbb{R}^m\) and \(\mathbf{v} \in \mathbb{R}^n\): \[(\mathbf{u}_1 + \mathbf{u}_2) \otimes \mathbf{v} = (\mathbf{u}_1 \otimes \mathbf{v}) + (\mathbf{u}_2 \otimes \mathbf{v})\]

  5. Associativity with Scalar Multiplication

    The outer product is associative with scalar multiplication. For a scalar \(\alpha\) and vectors \(\mathbf{u} \in \mathbb{R}^m\) and \(\mathbf{v} \in \mathbb{R}^n\): \[\alpha (\mathbf{u} \otimes \mathbf{v}) = (\alpha \mathbf{u}) \otimes \mathbf{v} = \mathbf{u} \otimes (\alpha \mathbf{v})\]

  6. Matrix Trace

    For two vectors of the same length (so that the outer product is a square matrix), the trace of the outer product is given by: \[\text{tr}(\mathbf{u} \otimes \mathbf{v}) = (\mathbf{u}^T \mathbf{v})= (\mathbf{v}^T \mathbf{u})\] Here, \(\text{tr}\) denotes the trace of a matrix, which is the sum of its diagonal elements. A short numerical check of some of these properties follows this list.

  7. Matrix Norm

    The Frobenius norm of the outer product matrix can be expressed in terms of the norms of the original vectors: \[\| \mathbf{u} \otimes \mathbf{v} \|_F = \| \mathbf{u} \|_2 \cdot \| \mathbf{v} \|_2\] where \(\| \cdot \|_2\) denotes the Euclidean norm.
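
Several of these properties are easy to verify numerically. The following is a minimal sketch using NumPy with small example vectors, checking non-symmetry, distributivity over addition and the trace identity:

import numpy as np

u1 = np.array([1, 2, 3])
u2 = np.array([0, 1, 1])
v  = np.array([4, 5, 6])

# Non-symmetry: u ⊗ v is generally not equal to v ⊗ u
print(np.array_equal(np.outer(u1, v), np.outer(v, u1)))          # False
# Distributivity over vector addition
print(np.allclose(np.outer(u1 + u2, v),
                  np.outer(u1, v) + np.outer(u2, v)))            # True
# Trace of a square outer product equals the dot product
print(np.isclose(np.trace(np.outer(u1, v)), np.dot(u1, v)))      # True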

Example Calculation in Python

Here’s how to compute and visualize the outer product properties using Python:

import numpy as np
import matplotlib.pyplot as plt

# Define vectors
u = np.array([1, 2, 3])
v = np.array([4, 5])

# Compute outer product
outer_product = np.outer(u, v)

# Display results
print("Outer Product Matrix:")
print(outer_product)

# Compute and display rank
rank = np.linalg.matrix_rank(outer_product)
print(f"Rank of Outer Product Matrix: {rank}")

# Compute Frobenius norm
frobenius_norm = np.linalg.norm(outer_product, 'fro')
print(f"Frobenius Norm: {frobenius_norm}")

# Plot the result
plt.imshow(outer_product, cmap='viridis', interpolation='nearest')
plt.colorbar()
plt.title('Outer Product Matrix')
plt.xlabel('Vector v')
plt.ylabel('Vector u')
plt.xticks(ticks=np.arange(len(v)), labels=v)
plt.yticks(ticks=np.arange(len(u)), labels=u)
plt.show()
Outer Product Matrix:
[[ 4  5]
 [ 8 10]
 [12 15]]
Rank of Outer Product Matrix: 1
Frobenius Norm: 23.958297101421877
Figure 2.10: Demonstration of Outer Product and its Properties

2.2.2.7 Kronecker Product

In mathematics, the Kronecker product, sometimes denoted by \(\otimes\), is an operation on two matrices of arbitrary size resulting in a block matrix. It is a specialization of the tensor product (which is denoted by the same symbol) from vectors to matrices and gives the matrix of the tensor product linear map with respect to a standard choice of basis. The Kronecker product is to be distinguished from the usual matrix multiplication, which is an entirely different operation. The Kronecker product is also sometimes called matrix direct product.

Note

If \(A\) is an \(m \times n\) matrix and \(B\) is a \(p \times q\) matrix, then the Kronecker product \(A\otimes B\) is the \(pm \times qn\) block matrix obtained by replacing each entry \(a_{ij}\) of \(A\) with the block \(a_{ij}B\). Symbolically this results in the block matrix:

\[A \otimes B = \begin{bmatrix}a_{11}B & a_{12}B & \cdots & a_{1n}B \\a_{21}B & a_{22}B & \cdots & a_{2n}B \\\vdots & \vdots & \ddots & \vdots \\a_{m1}B & a_{m2}B & \cdots & a_{mn}B\end{bmatrix}\]

Properties of the Kronecker Product
  1. Associativity

    The Kronecker product is associative. For matrices \(A \in \mathbb{R}^{m \times n}\), \(B \in \mathbb{R}^{p \times q}\), and \(C \in \mathbb{R}^{r \times s}\): \[(A \otimes B) \otimes C = A \otimes (B \otimes C)\]

  2. Distributivity Over Addition

    The Kronecker product distributes over matrix addition. For matrices \(A \in \mathbb{R}^{m \times n}\), \(B \in \mathbb{R}^{p \times q}\), and \(C \in \mathbb{R}^{p \times q}\): \[A \otimes (B + C) = (A \otimes B) + (A \otimes C)\]

  3. Mixed Product Property

    The Kronecker product satisfies the mixed product property with the ordinary matrix product. For matrices \(A \in \mathbb{R}^{m \times n}\), \(B \in \mathbb{R}^{p \times q}\), \(C \in \mathbb{R}^{n \times r}\), and \(D \in \mathbb{R}^{q \times s}\) (so that \(AC\) and \(BD\) are defined): \[(A \otimes B) (C \otimes D) = (A C) \otimes (B D)\]

  4. Transpose

    The transpose of the Kronecker product is given by: \[(A \otimes B)^T = A^T \otimes B^T\]

  5. Norm

    The Frobenius norm of the Kronecker product can be computed as: \[\| A \otimes B \|_F = \| A \|_F \cdot \| B \|_F\] where \(\| \cdot \|_F\) denotes the Frobenius norm.
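
The transpose, mixed-product and norm properties can be checked numerically with np.kron; a minimal sketch with small example matrices (chosen so that \(AC\) and \(BD\) are defined):

import numpy as np

A = np.array([[1, 2, 0], [3, 1, 4]])      # 2 x 3
B = np.array([[0, 1], [2, 3]])            # 2 x 2
C = np.array([[1, 0], [2, 1], [0, 3]])    # 3 x 2, so AC is defined
D = np.array([[1, 1], [0, 2]])            # 2 x 2, so BD is defined

# Transpose property: (A ⊗ B)^T = A^T ⊗ B^T
print(np.array_equal(np.kron(A, B).T, np.kron(A.T, B.T)))                    # True
# Mixed product property: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
print(np.array_equal(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))  # True
# Norm property: ||A ⊗ B||_F = ||A||_F * ||B||_F
print(np.isclose(np.linalg.norm(np.kron(A, B), 'fro'),
                 np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')))       # True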


Frobenius Norm

The Frobenius norm, also known as the Euclidean norm for matrices, is a measure of a matrix’s magnitude. It is defined as the square root of the sum of the absolute squares of its elements. Mathematically, for a matrix \(A\) with elements \(a_{ij}\), the Frobenius norm is given by:

\[\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}\]

Example 1: Calculation of Frobenius Norm

Consider the matrix \(A\):

\[A = \begin{bmatrix}1 & 2 \\3 & 4\end{bmatrix}\]

To compute the Frobenius norm:

\[\|A\|_F = \sqrt{1^2 + 2^2 + 3^2 + 4^2}= \sqrt{1 + 4 + 9 + 16}= \sqrt{30}\approx 5.48\]

Example 2: Frobenius Norm of a Sparse Matrix

Consider the sparse matrix \(B\):

\[B = \begin{bmatrix}0 & 0 & 0 \\0 & 5 & 0 \\0 & 0 & 0\end{bmatrix}\]

To compute the Frobenius norm:

\[\|B\|_F = \sqrt{0^2 + 0^2 + 0^2 + 5^2 + 0^2 + 0^2}= \sqrt{25}= 5\]

Example 3: Frobenius Norm in a Large Matrix

Consider the \(3 \times 3\) matrix \(C\):

\[C = \begin{bmatrix}1 & 2 & 3 \\4 & 5 & 6 \\7 & 8 & 9\end{bmatrix}\]

To compute the Frobenius norm:

\[\begin{align*} \|C\|_F &= \sqrt{1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 8^2 + 9^2}\\ &= \sqrt{1 + 4 + 9 + 16 + 25 + 36 + 49 + 64 + 81}\\ &= \sqrt{285}\\ &\approx 16.88 \end{align*}\]

Applications of the Frobenius Norm

  • Application 1: Image Compression: In image processing, the Frobenius norm can measure the difference between the original and compressed images, indicating how well the compression has preserved the original image quality.

  • Application 2: Matrix Factorization: In numerical analysis, Frobenius norm is used to evaluate the error in matrix approximations, such as in Singular Value Decomposition (SVD). A lower Frobenius norm of the error indicates a better approximation.

  • Application 3: Error Measurement in Numerical Solutions: In solving systems of linear equations, the Frobenius norm can be used to measure the error between the true solution and the computed solution, providing insight into the accuracy of numerical methods.

The linalg submodule of the NumPy library can be used to calculate various norms. A norm is essentially a generalization of the Euclidean distance to vectors and matrices.

import numpy as np

# Example 1: Simple Matrix
A = np.array([[1, 2], [3, 4]])
frobenius_norm_A = np.linalg.norm(A, 'fro')
print(f"Frobenius Norm of A: {frobenius_norm_A:.2f}")

# Example 2: Sparse Matrix
B = np.array([[0, 0, 0], [0, 5, 0], [0, 0, 0]])
frobenius_norm_B = np.linalg.norm(B, 'fro')
print(f"Frobenius Norm of B: {frobenius_norm_B:.2f}")

# Example 3: Large Matrix
C = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
frobenius_norm_C = np.linalg.norm(C, 'fro')
print(f"Frobenius Norm of C: {frobenius_norm_C:.2f}")
Frobenius Norm of A: 5.48
Frobenius Norm of B: 5.00
Frobenius Norm of C: 16.88

Frobenius norm of Kronecker product

Let us consider two matrices,

\[A = \begin{bmatrix}1 & 2 \\3 & 4\end{bmatrix}\]

and

\[B = \begin{bmatrix}0 & 5 \\6 & 7\end{bmatrix}\]

The Kronecker product \(C = A \otimes B\) is:

\[C = \begin{bmatrix}1 \cdot B & 2 \cdot B \\3 \cdot B & 4 \cdot B\end{bmatrix}= \begin{bmatrix}\begin{bmatrix}0 & 5 \\6 & 7\end{bmatrix} & \begin{bmatrix}0 \cdot 2 & 5 \cdot 2 \\6 \cdot 2 & 7 \cdot 2\end{bmatrix} \\\begin{bmatrix}0 \cdot 3 & 5 \cdot 3 \\6 \cdot 3 & 7 \cdot 3\end{bmatrix} & \begin{bmatrix}0 \cdot 4 & 5 \cdot 4 \\6 \cdot 4 & 7 \cdot 4\end{bmatrix}\end{bmatrix}\]

This expands to:

\[C = \begin{bmatrix}0 & 5 & 0 & 10 \\6 & 7 & 12 & 14 \\0 & 15 & 0 & 20 \\18 & 21 & 24 & 28\end{bmatrix}\]

Computing the Frobenius Norm

To compute the Frobenius norm of \(C\):

\[\|C\|_F = \sqrt{\sum_{i=1}^{4} \sum_{j=1}^{4} |c_{ij}|^2}\]

\[\|C\|_F = \sqrt{0^2 + 5^2 + 0^2 + 10^2 + 6^2 + 7^2 + 12^2 + 14^2 + 0^2 + 15^2 + 0^2 + 20^2 + 18^2 + 21^2 + 24^2 + 28^2}\]

\[\|C\|_F = \sqrt{0 + 25 + 0 + 100 + 36 + 49 + 144 + 196 + 0 + 225 + 0 + 400 + 324 + 441 + 576 + 784}\]

\[\|C\|_F = \sqrt{3300}\] \[\|C\|_F \approx 57.45\]
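
The same value can be verified with NumPy (a minimal sketch):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 5], [6, 7]])

C = np.kron(A, B)
print(np.linalg.norm(C, 'fro'))                               # ~57.45
print(np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro'))    # same value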


2.2.2.8 Practice Problems

Find the Kronecker product of A and B where A and B are given as follows:

Problem 1:

Find the Kronecker product of: \[A=\begin{bmatrix}1&2\\3&4\end{bmatrix}\] \[B=\begin{bmatrix}0&1\\1&0\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1&2\\3&4\end{bmatrix} \otimes \begin{bmatrix}0&1\\1&0\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot \begin{bmatrix}0&1\\1&0\end{bmatrix} & 2 \cdot \begin{bmatrix}0&1\\1&0\end{bmatrix} \\ 3 \cdot \begin{bmatrix}0&1\\1&0\end{bmatrix} & 4 \cdot \begin{bmatrix}0&1\\1&0\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 0 & 1 & 0 & 2 \\ 1 & 0 & 2& 0\\ 0 & 3 & 0 & 4 \\ 3 & 0 & 4 & 0 \end{bmatrix} \end{align*}\]


Problem 2:

Find the Kronecker product of: \[A=\begin{bmatrix}1&0\\0&1\end{bmatrix}\] \[B=\begin{bmatrix}2&3\\4&5\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1&0\\0&1\end{bmatrix} \otimes \begin{bmatrix}2&3\\4&5\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot \begin{bmatrix}2&3\\4&5\end{bmatrix} & 0 \cdot \begin{bmatrix}2&3\\4&5\end{bmatrix} \\ 0 \cdot \begin{bmatrix}2&3\\4&5\end{bmatrix} & 1 \cdot \begin{bmatrix}2&3\\4&5\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 2 & 3 & 0 & 0 \\ 4 & 5 & 0 & 0 \\ 0 & 0 & 2 & 3 \\ 0 & 0 & 4 & 5 \end{bmatrix} \end{align*}\]


Problem 3:

Find the Kronecker product of: \[A=\begin{bmatrix}1&2\end{bmatrix}\] \[B=\begin{bmatrix}3\\4\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1&2\end{bmatrix} \otimes \begin{bmatrix}3\\4\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot \begin{bmatrix}3\\4\end{bmatrix} & 2 \cdot \begin{bmatrix}3\\4\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 3 & 6 \\ 4 & 8 \end{bmatrix} \end{align*}\]


Problem 4:

Find the Kronecker product of: \[A=\begin{bmatrix}0&1\end{bmatrix}\] \[B=\begin{bmatrix}1&-1\\2&0\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}0&1\end{bmatrix} \otimes \begin{bmatrix}1&-1\\2&0\end{bmatrix} \\ &= \begin{bmatrix} 0 \cdot \begin{bmatrix}1&-1\\2&0\end{bmatrix} & 1 \cdot \begin{bmatrix}1&-1\\2&0\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 0 & 0 &1&-1\\ 0 & 0&2&0 \\ \end{bmatrix} \end{align*}\]


Problem 5:

Find the Kronecker product of: \[A=\begin{bmatrix}2\\3\end{bmatrix}\] \[B=\begin{bmatrix}4&-2\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}2\\3\end{bmatrix} \otimes \begin{bmatrix}4&-2\end{bmatrix} \\ &= \begin{bmatrix} 2 \cdot \begin{bmatrix}4&-2\end{bmatrix} \\ 3 \cdot \begin{bmatrix}4&-2\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 8 & -4 \\ 12 & -6 \end{bmatrix} \end{align*}\]


Problem 6:

Find the Kronecker product of: \[A=\begin{bmatrix}1&-1\\0&2\end{bmatrix}\] \[B=\begin{bmatrix}0&1\\1&0\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1&-1\\0&2\end{bmatrix} \otimes \begin{bmatrix}0&1\\1&0\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot \begin{bmatrix}0&1\\1&0\end{bmatrix} & -1 \cdot \begin{bmatrix}0&1\\1&0\end{bmatrix} \\ 0 \cdot \begin{bmatrix}0&1\\1&0\end{bmatrix} & 2 \cdot \begin{bmatrix}0&1\\1&0\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 0 & 1 & 0 & -1 \\ 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 2 & 0 \end{bmatrix} \end{align*}\]


Problem 7:

Find the Kronecker product of: \[A=\begin{bmatrix}2\end{bmatrix}\] \[B=\begin{bmatrix}3&4\\5&6\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}2\end{bmatrix} \otimes \begin{bmatrix}3&4\\5&6\end{bmatrix} \\ &= 2 \cdot \begin{bmatrix}3&4\\5&6\end{bmatrix} \\ &= \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix} \end{align*}\]


Problem 8:

Find the Kronecker product of: \[A=\begin{bmatrix}0&1\end{bmatrix}\] \[B=\begin{bmatrix}1&0\\0&1\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}0&1\end{bmatrix} \otimes \begin{bmatrix}1&0\\0&1\end{bmatrix} \\ &= \begin{bmatrix} 0 \cdot \begin{bmatrix}1&0\\0&1\end{bmatrix} & 1 \cdot \begin{bmatrix}1&0\\0&1\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align*}\]


Problem 9:

Find the Kronecker product of: \[A=\begin{bmatrix}1&0\\0&1\end{bmatrix}\] \[B=\begin{bmatrix}1&1\\1&1\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}1&0\\0&1\end{bmatrix} \otimes \begin{bmatrix}1&1\\1&1\end{bmatrix} \\ &= \begin{bmatrix} 1 \cdot \begin{bmatrix}1&1\\1&1\end{bmatrix} & 0 \cdot \begin{bmatrix}1&1\\1&1\end{bmatrix} \\ 0 \cdot \begin{bmatrix}1&1\\1&1\end{bmatrix} & 1 \cdot \begin{bmatrix}1&1\\1&1\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} \end{align*}\]


Problem 10:

Find the Kronecker product of: \[A=\begin{bmatrix}2&-1\\3&4\end{bmatrix}\] \[B=\begin{bmatrix}0&5\\-2&3\end{bmatrix}\]

Solution:

\[\begin{align*} A \otimes B &= \begin{bmatrix}2&-1\\3&4\end{bmatrix} \otimes \begin{bmatrix}0&5\\-2&3\end{bmatrix} \\ &= \begin{bmatrix} 2 \cdot \begin{bmatrix}0&5\\-2&3\end{bmatrix} & -1 \cdot \begin{bmatrix}0&5\\-2&3\end{bmatrix} \\ 3 \cdot \begin{bmatrix}0&5\\-2&3\end{bmatrix} & 4 \cdot \begin{bmatrix}0&5\\-2&3\end{bmatrix} \end{bmatrix} \\ &= \begin{bmatrix} 0 & 10 & 0 & -5 \\ -4 & 6 & 2 & -3 \\ 0 & 15 & 0 & 20 \\ -6 & 9 & -8 & 12 \end{bmatrix} \end{align*}\]
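
Each of these answers can be checked computationally with np.kron. For instance, for Problem 10 (a minimal sketch):

import numpy as np

A = np.array([[2, -1], [3, 4]])
B = np.array([[0, 5], [-2, 3]])

print(np.kron(A, B))
# [[ 0 10  0 -5]
#  [-4  6  2 -3]
#  [ 0 15  0 20]
#  [-6  9 -8 12]]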


2.2.2.9 Connection Between Outer Product and Kronecker Product

  1. Conceptual Connection:

    • The outer product is a special case of the Kronecker product. Specifically, if \(\mathbf{A}\) is a column vector and \(\mathbf{B}\) is a row vector, then \(\mathbf{A}\) is an \(m \times 1\) matrix and \(\mathbf{B}\) is a \(1 \times n\) matrix. The Kronecker product of these two matrices yields the same result as the outer product of the corresponding vectors.

    • For matrices \(\mathbf{A}\) and \(\mathbf{B}\), the Kronecker product involves taking the outer product of each element of \(\mathbf{A}\) with the entire matrix \(\mathbf{B}\).

  2. Mathematical Formulation:

    • Let \(\mathbf{A} = \begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix}\) and \(\mathbf{B} = \begin{bmatrix}b_{11} & b_{12}\\ b_{21} & b_{22}\end{bmatrix}\). Then:

    \[\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11} \mathbf{B} & a_{12} \mathbf{B} \\ a_{21} \mathbf{B} & a_{22} \mathbf{B} \end{bmatrix}\]

    • If \(\mathbf{A} = \mathbf{u} \mathbf{v}^T\) where \(\mathbf{u}\) is a column vector and \(\mathbf{v}^T\) is a row vector, then the Kronecker product of \(\mathbf{u}\) and \(\mathbf{v}^T\) yields the same result as the outer product \(\mathbf{u} \otimes \mathbf{v}\).
Note

Summary

  • The outer product is a specific case of the Kronecker product where one of the matrices is a vector (either row or column).
  • The Kronecker product generalizes the outer product to matrices and is more versatile in applications involving tensor products and higher-dimensional constructs.
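
This connection is easy to confirm: reshaping \(\mathbf{u}\) as a column and \(\mathbf{v}\) as a row, np.kron reproduces np.outer exactly (a minimal sketch):

import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5])

outer = np.outer(u, v)                                  # 3 x 2 matrix u v^T
kron  = np.kron(u.reshape(-1, 1), v.reshape(1, -1))     # (3 x 1) ⊗ (1 x 2)

print(np.array_equal(outer, kron))                      # True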

2.2.2.10 Matrix Multiplication as Kronecker Product

Given matrices \(\mathbf{A}\) and \(\mathbf{B}\), where: - \(\mathbf{A}\) is an \(m \times n\) matrix - \(\mathbf{B}\) is an \(n \times p\) matrix

The product \(\mathbf{C} = \mathbf{A} \mathbf{B}\) can be expressed using Kronecker products as:

\[\mathbf{C} = \sum_{k=1}^n (\mathbf{A}_{:,k} \otimes \mathbf{B}_{k,:})\]

where: - \(\mathbf{A}_{:,k}\) denotes the \(k\)-th column of matrix \(\mathbf{A}\) - \(\mathbf{B}_{k,:}\) denotes the \(k\)-th row of matrix \(\mathbf{B}\)

Example:

Let:

\[\mathbf{A} = \begin{bmatrix}1 & 2 \\3 & 4\end{bmatrix}\]

and:

\[\mathbf{B} = \begin{bmatrix}0 & 1 \\1 & 0\end{bmatrix}\]

To find \(\mathbf{C} = \mathbf{A} \mathbf{B}\) using Kronecker products:

  1. Compute the Kronecker Product of Columns of \(\mathbf{A}\) and Rows of \(\mathbf{B}\):

    • For column \(\mathbf{A}_{:,1} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}\) and row \(\mathbf{B}_{1,:} = \begin{bmatrix} 0 & 1 \end{bmatrix}\): \[\mathbf{A}_{:,1} \otimes \mathbf{B}_{1,:} = \begin{bmatrix} 0 & 1 \\ 0 & 3 \end{bmatrix}\]

    • For column \(\mathbf{A}_{:,2} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}\) and row \(\mathbf{B}_{2,:} = \begin{bmatrix} 1 & 0 \end{bmatrix}\): \[\mathbf{A}_{:,2} \otimes \mathbf{B}_{2,:} = \begin{bmatrix}2 & 0 \\ 4 & 0\end{bmatrix}\]

  2. Sum the Kronecker Products:

    \[\mathbf{C} = \begin{bmatrix}0 & 1 \\ 0 & 3\end{bmatrix} +\begin{bmatrix} 2 & 0 \\ 4 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 3\end{bmatrix}\]
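
The same expansion is easy to check with NumPy, building \(\mathbf{C}\) as a sum of column-times-row products and comparing with the ordinary matrix product (a minimal sketch):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

# Sum of outer products of the k-th column of A with the k-th row of B
C = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

print(C)                          # [[2 1]
                                  #  [4 3]]
print(np.array_equal(C, A @ B))   # True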


In the previous block we discussed the Frobenius norm and its applications. Let us now return to the Kronecker product. The Kronecker product is particularly useful in scenarios where interactions between different types of data need to be modeled comprehensively. In recommendation systems, it allows us to integrate user preferences with item relationships to improve recommendation accuracy.

In addition to recommendation systems, Kronecker products are used in various fields such as:

  • Signal Processing: For modeling multi-dimensional signals.
  • Machine Learning: In building features for complex models.
  • Communication Systems: For modeling network interactions.

By understanding the Kronecker product and its applications, we can use it to solve complex problems and enhance systems across different domains. To understand the practical use of the Kronecker product in a machine learning scenario, let us consider the following problem statement and its solution.

Problem statement

In the realm of recommendation systems, predicting user preferences for various product categories based on past interactions is a common challenge. Suppose we have data on user preferences for different products and categories. We can use this data to recommend the best products for each user by employing mathematical tools such as the Kronecker product. The user preferences and category relationships are given in Table 2.4 and Table 2.5.

Table 2.4: User Preference
User/Item Electronics Clothing Books
User 1 5 3 4
User 2 2 4 5
User 3 3 4 4
Table 2.5: Category Relationships
Category/Feature Feature 1 Feature 2 Feature 3
Electronics 1 0 0
Clothing 0 1 1
Books 0 1 1

Predict user preferences for different product categories using the Kronecker product matrix.

Solution Procedure

  1. Compute the Kronecker Product: Calculate the Kronecker product of matrices \(U\) and \(C\) to obtain matrix \(K\).

    To model the problem, we use the Kronecker product of the user preference matrix \(U\) and the category relationships matrix \(C\). This product allows us to predict the user’s rating for each category by combining their preferences with the category features.

Formulating Matrices

User Preference Matrix (U): - Dimension: \(3\times 3\) (3 users, 3 items) - from the User preference data, we can create the User Preference Matrix as follows:

\[U = \begin{pmatrix}5 & 3 & 4 \\2 & 4 & 5 \\3 & 4 & 4 \end{pmatrix}\]

Category Relationships Matrix (C): - Dimension: \(3 \times 3\) (3 categories) - from the Category Relationships data, we can create the Category Relationship Matrix as follows:

\[C = \begin{pmatrix}1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1\end{pmatrix}\]

Kronecker Product Calculation

The Kronecker product \(K\) of \(U\) and \(C\) is calculated as follows:

  1. Matrix Dimensions:
  • \(U\) is \(3 \times 3\) (3 users, 3 items).
  • \(C\) is \(3 \times 3\) (3 categories, 3 features).
  2. Calculate Kronecker Product:
  • For each element \(u_{ij}\) in \(U\), multiply by the entire matrix \(C\).

The Kronecker product \(K\) is computed as:

\[K = U \otimes C\]

Explicitly, the Kronecker product \(K\) is:

\[K = \begin{pmatrix}5 \cdot C & 3 \cdot C & 4 \cdot C \\ 2 \cdot C & 4 \cdot C & 5 \cdot C \\ 3 \cdot C & 4 \cdot C & 4 \cdot C\end{pmatrix}\]

As an example, the blocks in the first row are:

\[5 \cdot C = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 5 \\ 0 & 5 & 5 \end{pmatrix}, \quad 3 \cdot C = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 3 \\ 0 & 3 & 3 \end{pmatrix}, \quad 4 \cdot C = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 4 & 4 \\ 0 & 4 & 4 \end{pmatrix}\]

Combining these blocks:

\[K = \begin{pmatrix} 5 & 0 & 0 & 3 & 0 & 0 & 4 & 0 & 0\\ 0 & 5 & 5 & 0 & 3 & 3 & 0 & 4 & 4\\ 0 & 5 & 5 & 0 & 3 & 3 & 0 & 4 & 4\\ 2 & 0 & 0 & 4 & 0 & 0 & 5 & 0 & 0\\ 0 & 2 & 2 & 0 & 4 & 4 & 0 & 5 & 5\\ 0 & 2 & 2 & 0 & 4 & 4 & 0 & 5 & 5\\ 3 & 0 & 0 & 4 & 0 & 0 & 4 & 0 & 0\\ 0 & 3 & 3 & 0 & 4 & 4 & 0 & 4 & 4\\ 0 & 3 & 3 & 0 & 4 & 4 & 0 & 4 & 4\end{pmatrix}\]

  2. Interpret the Kronecker Product Matrix: The resulting matrix \(K\) represents all possible combinations of user preferences and category features.

  3. Predict Ratings: For each user, use matrix \(K\) to predict the rating for each category by summing up the values in the corresponding rows.

  4. Generate Recommendations: Identify the top categories with the highest predicted ratings for each user.

The Python code to solve this problem computationally is given below.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Define the matrices
U = np.array([[5, 3, 4],
              [2, 4, 5],
              [3, 4, 4]])

C = np.array([[1, 0, 0],
              [0, 1, 1],
              [0, 1, 1]])

# Compute the Kronecker product
K = np.kron(U, C)

# Create a DataFrame to visualize the Kronecker product matrix
df_K = pd.DataFrame(K, 
                    columns=['Electronics_F1', 'Electronics_F2', 'Electronics_F3', 
                             'Clothing_F1', 'Clothing_F2', 'Clothing_F3', 
                             'Books_F1', 'Books_F2', 'Books_F3'],
                    index=['User 1 Electronics', 'User 1 Clothing', 'User 1 Books', 
                           'User 2 Electronics', 'User 2 Clothing', 'User 2 Books', 
                           'User 3 Electronics', 'User 3 Clothing', 'User 3 Books'])

# Print the Kronecker product matrix
print("Kronecker Product Matrix (K):\n", df_K)

# Predict ratings and create recommendations
def recommend(user_index, top_n=3):
    """ Recommend top_n categories for a given user based on Kronecker product matrix. """
    user_ratings = K[user_index * len(C):(user_index + 1) * len(C), :]
    predicted_ratings = np.sum(user_ratings, axis=0)
    recommendations = np.argsort(predicted_ratings)[::-1][:top_n]
    return recommendations

# Recommendations for User 1
user_index = 0  # User 1
top_n = 3
recommendations = recommend(user_index, top_n)

print(f"\nTop {top_n} recommendations for User {user_index + 1}:")
for rec in recommendations:
    print(df_K.columns[rec])
Kronecker Product Matrix (K):
                     Electronics_F1  Electronics_F2  Electronics_F3  \
User 1 Electronics               5               0               0   
User 1 Clothing                  0               5               5   
User 1 Books                     0               5               5   
User 2 Electronics               2               0               0   
User 2 Clothing                  0               2               2   
User 2 Books                     0               2               2   
User 3 Electronics               3               0               0   
User 3 Clothing                  0               3               3   
User 3 Books                     0               3               3   

                    Clothing_F1  Clothing_F2  Clothing_F3  Books_F1  Books_F2  \
User 1 Electronics            3            0            0         4         0   
User 1 Clothing               0            3            3         0         4   
User 1 Books                  0            3            3         0         4   
User 2 Electronics            4            0            0         5         0   
User 2 Clothing               0            4            4         0         5   
User 2 Books                  0            4            4         0         5   
User 3 Electronics            4            0            0         4         0   
User 3 Clothing               0            4            4         0         4   
User 3 Books                  0            4            4         0         4   

                    Books_F3  
User 1 Electronics         0  
User 1 Clothing            4  
User 1 Books               4  
User 2 Electronics         0  
User 2 Clothing            5  
User 2 Books               5  
User 3 Electronics         0  
User 3 Clothing            4  
User 3 Books               4  

Top 3 recommendations for User 1:
Electronics_F3
Electronics_F2
Books_F3

A simple visualization of this recommendation system is shown in Fig 2.11.

# Visualization
def plot_recommendations(user_index):
    """ Plot the predicted ratings for each category for a given user. """
    user_ratings = K[user_index * len(C):(user_index + 1) * len(C), :]
    predicted_ratings = np.sum(user_ratings, axis=0)
    categories = df_K.columns
    plt.figure(figsize=(6, 5))
    plt.bar(categories, predicted_ratings)
    plt.xlabel('Categories')
    plt.ylabel('Predicted Ratings')
    plt.title(f'Predicted Ratings for User {user_index + 1}')
    plt.xticks(rotation=45)
    plt.show()

# Plot recommendations for User 1
plot_recommendations(user_index)
Figure 2.11: EDA for the Recommendation System

This micro-project illustrates one popular use of the Kronecker product in a machine learning application.

2.2.3 Matrix Measures of Practical Importance

Matrix measures, such as rank and determinant, play crucial roles in linear algebra. While both rank and determinant provide valuable insights into the properties of a matrix, they serve different purposes. Understanding their roles and applications is essential for solving complex problems in computer science, engineering, and applied mathematics.

2.2.3.1 Determinant

The determinant of a \(2\times 2\) matrix \(A=\begin{pmatrix}a&b\\c&d\end{pmatrix}\) is defined as \(|A|=ad-bc\). Determinants of higher-order square matrices can be found using Laplace (cofactor) expansion or, for \(3\times 3\) matrices, the rule of Sarrus.

The determinant of a matrix provides information about the matrix’s invertibility and scaling factor for volume transformation. Specifically:

  1. Invertibility: A matrix is invertible if and only if its determinant is non-zero.

  2. Volume Scaling: The absolute value of the determinant gives the scaling factor by which the matrix transforms volume.

  3. Parallelism: If the determinant of a matrix composed of vectors is zero, the vectors are linearly dependent, meaning they are parallel or redundant.

  4. Redundancy: A zero determinant indicates that the vectors span a space of lower dimension than the number of vectors, showing redundancy.
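
These points are easy to explore numerically; a small sketch using np.linalg.det:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])    # rows are independent
B = np.array([[1.0, 2.0], [2.0, 4.0]])    # second row = 2 x first row

print(np.linalg.det(A))   # ~5.0 -> non-zero, so A is invertible
print(np.linalg.det(B))   # ~0.0 -> B is singular; its rows are redundant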

Least Possible Values of Determinant
  1. Least Positive Determinant: There is no smallest positive determinant; for any small positive number \(\epsilon\), a matrix can be constructed whose determinant is exactly \(\epsilon\).
  2. Least Non-Zero Determinant: A non-zero but very small determinant indicates that the rows or columns of the matrix span a very small area or volume, so the matrix is close to being singular. For example, a \(2\times 2\) matrix with determinant \(\epsilon\) could be: \[B=\begin{pmatrix}\epsilon&0\\ 0&1\end{pmatrix}\] Here, \(\epsilon\) is a small positive number, indicating a very small but non-zero area.

Now let’s look into the most important matrix measure for advanced applications in linear algebra.

As we know, a matrix is basically a representation tool that makes things abstract by removing unnecessary details. The matrix itself can then be represented in many ways, and this is where the real story of this powerful mathematical structure begins. Consider the context of collecting feedback about a product in three aspects: cost, quality and practicality. For simplicity of calculation, we consider responses from 3 customers only. The data is shown in Table 2.6.

Table 2.6: User rating of a consumer product
User Cost Quality Practicality
User-1 1 4 5
User-2 3 2 5
User-3 2 1 3

This table is neat and easy to read, but neither mathematics nor a computer can handle the table as it is. So we create an abstract representation of this data: the rating matrix. Using the traditional approach, let’s represent this rating data as: \[A=\begin{bmatrix}1&4&5\\3&2&5\\2&1&3\end{bmatrix}\]

Now both the column names and the row indices have been removed, and the data is transformed into this abstract form. The representation has both advantages and disadvantages; staying positive, we focus only on the advantages.

Consider the product itself. Its sales depend entirely on its features, so the sales perspective of the product is expressed in terms of the features: cost, quality and practicality. These features are the columns of our rating matrix, and different people will naturally rate them differently. Keeping all this in mind, let’s introduce the concept of a linear combination. It leads to a new way of reading the matrix product, as shown below.

\[\begin{align*} Ax&=\begin{bmatrix} 1&4&5\\ 3&2&5\\ 2&1&3 \end{bmatrix}x\\ &=\begin{bmatrix} 1&4&5\\ 3&2&5\\ 2&1&3 \end{bmatrix}\cdot\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\\ &=\begin{bmatrix}1\\3\\2\end{bmatrix}x_1+\begin{bmatrix}4\\2\\1\end{bmatrix}x_2+\begin{bmatrix}5\\5\\3\end{bmatrix}x_3 \end{align*}\]

As the number of users increases, this product-sales perspective becomes more informative. In short, the span of the feature columns defines the feature space of the product. In practice, a manufacturer wants to know which features really influence the customers, and this way of reading the matrix product helps the manufacturer identify those features.

So we are going to define a factorization of the rating matrix that exposes this feature space and provides more insight into the context:

\[A=CR\]

where \(C\) consists of the linearly independent columns of \(A\) (a basis for its column space) and \(R\) is built from the non-zero rows of the reduced row echelon form of \(A\). The product here is not a scalar projection; instead, each column of \(R\) holds the weights of the linear combination of the columns of \(C\) that reproduces the corresponding column of \(A\).

Let’s formally illustrate this in our example. From a first inspection it is clear that the last column is just the sum of the first and second columns (that is, in our context, the feature ‘practicality’ depends only on ‘cost’ and ‘quality’; meaningful?). So only the first two columns are independent, and they span the column space.

\[C=\begin{bmatrix}1&4\\3&2\\2&1\end{bmatrix}\]

Now look at the matrix \(R\). Applying elementary row transformations, \(A\) is transformed into:

\[R=\begin{bmatrix}1&0&1\\0&1&1\\0&0&0\end{bmatrix}\]

Hence we can form a decomposition of the given rating matrix \(A\), keeping only the non-zero rows of \(R\), as: \[\begin{align*} A&=CR\\ &=\begin{bmatrix}1&4\\3&2\\2&1\end{bmatrix}\begin{bmatrix}1&0&1\\0&1&1\end{bmatrix} \end{align*}\]

This decomposition says that there are only two independent features (columns) and the third feature (column) is the sum of first two features (columns).

Interpretation of the \(R\) matrix

Each column of the \(R\) matrix holds the weights of the linear combination of the vectors in the column space that reproduces the corresponding column of \(A\). In this example, the third column of \(R\) is \(\begin{bmatrix}1\\1\end{bmatrix}\). This means that the third column of \(A\) is \(1\times C_1+1\times C_2\), where \(C_1\) and \(C_2\) are the columns of the column-space matrix \(C\).
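
A quick computational check of this \(A=CR\) factorization (a minimal sketch using NumPy):

import numpy as np

A = np.array([[1, 4, 5], [3, 2, 5], [2, 1, 3]])
C = np.array([[1, 4], [3, 2], [2, 1]])     # independent columns of A
R = np.array([[1, 0, 1], [0, 1, 1]])       # non-zero rows of the RREF of A

print(np.array_equal(C @ R, A))            # True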

This first matrix decomposition introduces a new way of forming the matrix product (as a combination of columns, closely related to the outer product) and a new measure: the number of independent columns, which equals the number of independent rows. This count is called the rank of the matrix \(A\). If the rank is less than the number of features, then a smaller set of features can represent the data perfectly. This helps us reduce the dimension of the dataset and thereby reduce the computational complexity of data analysis and machine learning tasks.

In the discussion above we considered only the columns of \(A\). Now we turn to the row space: the space spanned by the rows of \(A\). For any matrix \(A\), the row space and the column space have the same dimension, namely the rank of \(A\). This correspondence is a helpful result in many practical applications.

Now consider the homogeneous equation \(Ax=0\). With the usual notation of the dot product, it says that \(x\) is orthogonal to every row of \(A\). The set of all such vectors constitutes a new space of interest, called the null space of \(A\). If \(A\) represents a linear transformation, then the null space is populated by exactly those vectors which are sent to zero by the transformation \(A\). In summary, the row space and the null space of a matrix \(A\) form an orthogonal pair. Since the row space of \(A\) is the same as the column space of \(A^T\) and vice versa, we can restate the orthogonality as: ‘the null space of \(A\) is orthogonal to the column space of \(A^T\)’ and ‘the null space of \(A^T\) is orthogonal to the column space of \(A\)’. Mathematically this property can be represented as follows.

Note

\[\begin{align*} \mathcal{N}(A)&\perp \mathcal{C}(A^T)\\ \mathcal{N}(A^T)&\perp \mathcal{C}(A) \end{align*}\]

In the given example, solving \(Ax=0\) we get \(x=\begin{bmatrix}1&1&-1\end{bmatrix}^T\).

So the dimension of \(\mathcal{N}(A)\), called the nullity of \(A\), is 1. We already have \(\text{Rank}(A)=2\). This leads to an interesting result:

\[\text{Rank}(A)+\text{Nullity}(A)=3\]

This observation can be framed as a theorem.

2.2.4 Rank Nullity Theorem

The rank-nullity theorem is a fundamental theorem in linear algebra that is important for understanding the connections between mathematical operations in engineering, physics, and computer science. It states that the sum of the rank and nullity of a matrix equals the number of columns in the matrix. The rank is the maximum number of linearly independent columns, and the nullity is the dimension of the nullspace.

Theorem 2.1 (Rank-Nullity Theorem) The Rank-Nullity Theorem states that for any \(m \times n\) matrix \(A\), the following relationship holds:

\[ \text{Rank}(A) + \text{Nullity}(A) = n \]

where: - Rank of \(A\) is the dimension of the column space of \(A\), which is also equal to the dimension of the row space of \(A\). - Nullity of \(A\) is the dimension of the null space of \(A\), which is the solution space to the homogeneous system \(A \mathbf{x} = \mathbf{0}\).

Steps to Formulate for Matrix \(A\)

  1. Find the Rank of \(A\): The rank of a matrix is the maximum number of linearly independent columns (or rows). It can be determined by transforming \(A\) into its row echelon form or reduced row echelon form (RREF).

  2. Find the Nullity of \(A\): The nullity is the dimension of the solution space of \(A \mathbf{x} = \mathbf{0}\). This can be found by solving the homogeneous system and counting the number of free variables.

  3. Apply the Rank-Nullity Theorem: Use the rank-nullity theorem to verify the relationship.


Example 1: Calculate the rank and nullity of \(A=\begin{bmatrix} 1 & 4 & 5 \\ 3 & 2 & 5 \\ 2 & 1 & 3 \end{bmatrix}\) and verify the rank nullity theorem.

  1. Row Echelon Form:

    Perform Gaussian elimination on \(A\):

    \[A = \begin{bmatrix} 1 & 4 & 5 \\ 3 & 2 & 5 \\ 2 & 1 & 3 \end{bmatrix}\]

    Perform row operations to get it to row echelon form:

    • Subtract 3 times row 1 from row 2: \[\begin{bmatrix} 1 & 4 & 5 \\ 0 & -10 & -10 \\ 2 & 1 & 3 \end{bmatrix}\]

    • Subtract 2 times row 1 from row 3: \[\begin{bmatrix} 1 & 4 & 5 \\ 0 & -10 & -10 \\ 0 & -7 & -7 \end{bmatrix}\]

    • Subtract \(\frac{7}{10}\) times row 2 from row 3: \[\begin{bmatrix} 1 & 4 & 5 \\ 0 & -10 & -10 \\ 0 & 0 & 0 \end{bmatrix}\]

    The matrix is now in row echelon form.

    Rank is the number of non-zero rows, which is 2.

  2. Find the Nullity: The matrix \(A\) has 3 columns. The number of free variables in the solution of \(A \mathbf{x} = \mathbf{0}\) is \(3 - \text{Rank}\).

    So, \[\text{Nullity}(A) = 3 - 2 = 1\]

  3. Apply the Rank-Nullity Theorem: \[\text{Rank}(A) + \text{Nullity}(A) = 2 + 1 = 3\]

    This matches the number of columns of \(A\), confirming the theorem.
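
The same verification can be carried out computationally; a minimal sketch using NumPy:

import numpy as np

A = np.array([[1, 4, 5],
              [3, 2, 5],
              [2, 1, 3]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank            # by the rank-nullity theorem

print("Rank:", rank)                   # 2
print("Nullity:", nullity)             # 1
print(rank + nullity == A.shape[1])    # True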

2.2.5 Fundamental Subspaces

In the previous section, we saw that for any matrix \(A\) there are two pairs of inter-related orthogonal spaces. This leads to the concept of the four fundamental subspaces.

Matrices are not just arrays of numbers; they can represent linear transformations too. A linear transformation maps vectors from one vector space to another while preserving vector addition and scalar multiplication. The matrix \(A\) can be viewed as a representation of a linear transformation \(T\) from \(\mathbb{R}^n\) to \(\mathbb{R}^m\) where:

\[T(\mathbf{x}) = A \mathbf{x}\]

In this context:

  • The column space of \(A\) represents the range of \(T\), which is the set of all possible outputs.
  • The null space of \(A\) represents the kernel of \(T\), which is the set of vectors that are mapped to the zero vector.

The Four Fundamental Subspaces

Understanding the four fundamental subspaces helps in analyzing the properties of a linear transformation. These subspaces are:

Definition 2.1 (Four Fundamental Subspaces) Let \(T:\mathbb{R^n}\longrightarrow \mathbb{R^m}\) be a linear transformation and \(A\) represents the matrix of transformation. The four fundamental subspaces are defined as:

  1. Column Space (Range): The set of all possible outputs of the transformation. For matrix \(A\), this is the span of its columns. It represents the image of \(\mathbb{R}^n\) under \(T\).

  2. Null Space (Kernel): The set of all vectors that are mapped to the zero vector by the transformation. For matrix \(A\), this is the solution space of \(A \mathbf{x} = \mathbf{0}\).

  3. Row Space: The span of the rows of \(A\). This space is crucial because it helps in understanding the rank of \(A\). The dimension of the row space is equal to the rank of \(A\), which represents the maximum number of linearly independent rows.

  4. Left Null Space: The set of all vectors \(\mathbf{y}\) such that \(A^T \mathbf{y} = \mathbf{0}\). It provides insight into the orthogonal complement of the row space.

This idea is depicted as the ‘big picture of the four subspaces of a matrix’ in Strang’s textbook Linear Algebra for Everyone (Strang 2020). This ‘Big Picture’ is shown in Fig 2.12.

A video lecture by Strang on this topic is available here:

2.2.5.1 Practice Problems

Problem 1: Express the vector \((1,-2,5)\) as a linear combination of the vectors \((1,1,1)\), \((1,2,3)\) and \((2,-1,1)\).

Problem 2: Show that the feature vector \((2,-5,3)\) is not linearly associated with the features \((1,-3,2)\), \((2,-4,-1)\) and \((1,-5,7)\).

Problem 3: Show that the feature vectors \((1,1,1)\), \((1,2,3)\) and \((2,-1,1)\) are non-redundant.

Problem 4: Prove that the features \((1,-1,1)\), \((0,1,2)\) and \((3,0,-1)\) form basis for the feature space.

Problem 5: Check whether the vectors \((1,2,1)\), \((2,1,4)\) and \((4,5,6)\) form a basis for \(\mathbb{R}^3\).

Problem 6: Find the four fundamental subspaces of the feature space created by \((1,2,1)\), \((2,1,4)\) and \((4,5,6)\).

Problem 7: Find the four fundamental subspaces and its dimensions of the matrix \(\begin{bmatrix}1&2&4\\2&1&5\\1&4&6\end{bmatrix}\).

Problem 8: Express \(A=\begin{bmatrix}1&2&-1\\3&1&-1\\2&-1&0\end{bmatrix}\) in the form \(A=CR\), where \(C\) consists of the independent columns of \(A\) and \(R\) contains the corresponding non-zero rows of its reduced row echelon form.

Problem 9: Find the four fundamental subspaces of \(A=\begin{bmatrix} 1&2&0&2&5\\-2&-5&1&-1&-8\\0&-3&3&4&1\\3&6&0&-7&2\end{bmatrix}\).

Problem 10: Find the four fundamental subspaces of \(A=\begin{bmatrix}-1&2&-1&5&6\\4&-4&-4&-12&-8\\2&0&-6&-2&4\\-3&1&7&-2&12\end{bmatrix}\).

Problem 11: Express \(A=\begin{bmatrix}2&3&-1&-1\\1&-1&-2&-4\\3&1&3&-2\\6&3&0&-7\end{bmatrix}\) in the form \(A=CR\), where \(C\) consists of the independent columns of \(A\) and \(R\) is the corresponding row factor.

Problem 12: Express \(A=\begin{bmatrix}0&1&-3&-1\\1&0&1&1\\3&1&0&2\\1&1&-2&0\end{bmatrix}\) in the form \(A=CR\), where \(C\) consists of the independent columns of \(A\) and \(R\) is the corresponding row factor.

Problem 13: Show that the feature vectors \((2,3,0)\), \((1,2,0)\) and \((8,13,0)\) are redundant and hence find the relationship between them.

Problem 14: Show that the feature vectors \((1,2,1)\), \((4,1,2)\), \((-3,8,1)\) and \((6,5,4)\) are redundant and hence find the relationship between them.

Problem 15: Show that the feature vectors \((1,2,-1,0)\), \((1,3,1,2)\), \((4,2,1,0)\) and \((6,1,0,1)\) are redundant and hence find the relationship between them.

Important

Three Parts of the Fundamental Theorem The fundamental theorem of linear algebra relates all four of the fundamental subspaces in a number of different ways. There are three main parts to the theorem:

Part 1:(Rank nullity theorem) The column and row spaces of an \(m\times n\) matrix \(A\) both have dimension \(r\), the rank of the matrix. The nullspace has dimension \(n−r\), and the left nullspace has dimension \(m−r\).

Part 2:(Orthogonal subspaces) The nullspace and row space are orthogonal. The left nullspace and the column space are also orthogonal.

Part 3:(Matrix decomposition) The final part of the fundamental theorem of linear algebra constructs an orthonormal basis and demonstrates a singular value decomposition: any matrix \(M\) can be written in the form \(M=U\Sigma V^T\), where \(U_{m\times m}\) and \(V_{n\times n}\) are unitary matrices and \(\Sigma\) is an \(m\times n\) matrix with nonnegative values on the diagonal.

This part of the fundamental theorem allows one to immediately find a basis of the subspace in question. This can be summarized in the following table.

Subspace Subspace of Symbol Dimension Basis
Column space \(\mathbb{R}^m\) \(\operatorname{im}(A)\) \(r\) First \(r\) columns of \(U\)
Nullspace (kernel) \(\mathbb{R}^n\) \(\ker(A)\) \(n - r\) Last \(n - r\) columns of \(V\)
Row space \(\mathbb{R}^n\) \(\operatorname{im}(A^T)\) \(r\) First \(r\) columns of \(V\)
Left nullspace (kernel) \(\mathbb{R}^m\) \(\ker(A^T)\) \(m - r\) Last \(m - r\) columns of \(U\)
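
The table above translates directly into code: the singular value decomposition of \(A\) provides orthonormal bases for all four subspaces. A minimal sketch using NumPy (the tolerance 1e-10 used for the numerical rank is an assumption):

import numpy as np

A = np.array([[1.0, 4.0, 5.0],
              [3.0, 2.0, 5.0],
              [2.0, 1.0, 3.0]])

# Singular value decomposition A = U Sigma V^T
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))          # numerical rank

column_space   = U[:, :r]           # first r columns of U
left_nullspace = U[:, r:]           # last m - r columns of U
row_space      = Vt[:r, :].T        # first r columns of V
null_space     = Vt[r:, :].T        # last n - r columns of V

print("Rank:", r)
print("Null space basis:\n", null_space)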

2.2.5.2 Computational methods to find all the four fundamental subspaces of a matrix

There are different approaches to finding the four fundamental subspaces of a matrix using Python. The simplest method is to convert our mathematical procedure into Python functions and call them to find the respective spaces. This method is illustrated below.

# importing numpy library for numerical computation
import numpy as np
# define the function create the row-reduced Echelon form of given matrix
def row_echelon_form(A):
    """Convert matrix A to its row echelon form."""
    A = A.astype(float)
    rows, cols = A.shape
    for i in range(min(rows, cols)):
        # Pivot: find the maximum element in the current column
        max_row = np.argmax(np.abs(A[i:, i])) + i
        if A[max_row, i] == 0:
            continue  # Skip if the column is zero
        # Swap the current row with the max_row
        A[[i, max_row]] = A[[max_row, i]]
        # Eliminate entries below the pivot
        for j in range(i + 1, rows):
            factor = A[j, i] / A[i, i]
            A[j, i:] -= factor * A[i, i:]
    return A

# define function to generate null space from the row-reduced echelon form
def null_space_of_matrix(A, rtol=1e-8):
    """Compute a basis of the null space of A by completing the reduction
    of its echelon form to reduced row echelon form (RREF)."""
    R = row_echelon_form(A)
    rows, cols = R.shape
    # Identify pivot columns and fully reduce: unit pivots, zeros above and below
    pivots = []
    r = 0
    for j in range(cols):
        if r < rows and np.abs(R[r, j]) > rtol:
            R[r] = R[r] / R[r, j]                   # normalise the pivot to 1
            for i in range(rows):
                if i != r:
                    R[i] = R[i] - R[i, j] * R[r]    # clear the pivot column
            pivots.append(j)
            r += 1
    free_vars = [j for j in range(cols) if j not in pivots]

    # Each free variable contributes one basis vector of the null space
    null_space = []
    for free_var in free_vars:
        null_vector = np.zeros(cols)
        null_vector[free_var] = 1
        for row, pivot in enumerate(pivots):
            null_vector[pivot] = -R[row, free_var]
        null_space.append(np.round(null_vector, 10))  # suppress float noise

    return np.array(null_space).T

# define the function to generate the row-space of A

def row_space_of_matrix(A):
    """Compute the row space of a matrix A using row reduction."""
    A_reduced = row_echelon_form(A)
    # The non-zero rows of the reduced matrix form the row space
    non_zero_rows = A_reduced[~np.all(A_reduced == 0, axis=1)]
    return non_zero_rows

# define the function to generate the column space of A

def column_space_of_matrix(A):
    """Compute the column space of a matrix A using row reduction."""
    A_reduced = row_echelon_form(A)
    rows, cols = A_reduced.shape
    # Identify pivot columns
    pivots = []
    for i in range(rows):
        for j in range(cols):
            if np.abs(A_reduced[i, j]) > 1e-5:
                pivots.append(j)
                break
    column_space = A[:, pivots]
    return column_space

2.2.5.3 Examples:

  1. Find all the fundamental subspaces of \(A=\begin{pmatrix}1&2&3\\ 4&5&6\\7&8&9\end{pmatrix}\).
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print("Matrix A:")
print(A)

# Null Space
null_space_A = null_space_of_matrix(A)
print("\nNull Space of A:")
print(null_space_A)

# Row Space
row_space_A = row_space_of_matrix(A)
print("\nRow Space of A:")
print(row_space_A)

# Column Space
column_space_A = column_space_of_matrix(A)
print("\nColumn Space of A:")
print(column_space_A)
Matrix A:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Null Space of A:
[[ 1.]
 [-2.]
 [ 1.]]

Row Space of A:
[[7.00000000e+00 8.00000000e+00 9.00000000e+00]
 [0.00000000e+00 8.57142857e-01 1.71428571e+00]
 [0.00000000e+00 5.55111512e-17 1.11022302e-16]]

Column Space of A:
[[1 2]
 [4 5]
 [7 8]]
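
The example above lists three of the four subspaces; the left null space is simply the null space of \(A^T\), so the same routine can be reused (a small sketch, assuming the matrix A and the functions defined above):

# Left Null Space: the null space of A transpose
left_null_space_A = null_space_of_matrix(A.T)
print("\nLeft Null Space of A:")
print(left_null_space_A)
# For this particular A, the left null space is also spanned by (1, -2, 1)^T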

2.2.5.4 Rank and Solution of System of Linear Equations

In linear algebra, the rank of a matrix is a crucial concept for understanding the structure of a system of linear equations. It provides insight into the solutions of these systems, helping us determine the number of independent equations and the nature of the solution space.

Definition 2.2 (Rank and System Consistency) The rank of a matrix \(A\) is defined as the maximum number of linearly independent rows or columns. When solving a system of linear equations represented by \(A\mathbf{x} = \mathbf{b}\), where \(A\) is an \(m \times n\) matrix and \(\mathbf{b}\) is a vector, the rank of \(A\) plays a crucial role in determining the solution’s existence and uniqueness.

Consistency of the System

  1. Consistent System: A system of linear equations is consistent if there exists at least one solution. This occurs if the rank of the coefficient matrix \(A\) is equal to the rank of the augmented matrix \([A|\mathbf{b}]\). Mathematically, this can be expressed as: \[\text{rank}(A) = \text{rank}([A|\mathbf{b}])\] If this condition is met, the system has solutions. The solutions can be:
    • Unique if the rank equals the number of variables.
    • Infinitely many if the rank is less than the number of variables.
  2. Inconsistent System: A system is inconsistent if there are no solutions. This occurs when: \[\text{rank}(A) \ne \text{rank}([A|\mathbf{b}])\] In this case, the equations represent parallel or conflicting constraints that cannot be satisfied simultaneously.
Using the null space to build the general solution from a particular solution

If the system \(AX=b\) is consistent and has infinitely many solutions, then the general solution of the system can be written using a particular solution \(X_p\) and the elements of the null space of the coefficient matrix \(A\) as

\[X=X_p+tX_N\]

where \(X\) is the general solution, \(t\) is a free parameter and \(X_N\in N(A)\). (If the null space has dimension greater than one, one such term is added for each basis vector of \(N(A)\).)

2.2.5.5 Computational method to solve system of linear equations.

If for a system \(AX=b\) the coefficient matrix is square and \(\det(A)\neq 0\), then the system has a unique solution, which can be found with the solve() function from NumPy. If the system is consistent but has infinitely many solutions, then computationally we generate the general solution using \(N(A)\). A detailed Python code is given below.

import numpy as np

def check_consistency(A, b):
    """
    Check the consistency of a linear system Ax = b and return the solution if consistent.
    
    Parameters:
    A (numpy.ndarray): Coefficient matrix.
    b (numpy.ndarray): Right-hand side vector.
    
    Returns:
    tuple: A tuple with consistency status, particular solution (if consistent), and null space (if infinite solutions).
    """
    A = np.array(A)
    b = np.array(b)
    
    # Augment the matrix A with vector b
    augmented_matrix = np.column_stack((A, b))
    
    # Compute ranks
    rank_A = np.linalg.matrix_rank(A)
    rank_augmented = np.linalg.matrix_rank(augmented_matrix)
    
    # Check for consistency
    if rank_A == rank_augmented:
        if rank_A == A.shape[1]:
            # Unique solution
            solution = np.linalg.solve(A, b)
            return "Consistent and has a unique solution", solution, None
        else:
            # Infinitely many solutions
            particular_solution = np.linalg.lstsq(A, b, rcond=None)[0]
            null_space = null_space_of_matrix(A)
            return "Consistent but has infinitely many solutions", particular_solution, null_space
    else:
        return "Inconsistent system (no solution)", None, None

def null_space_of_matrix(A):
    """
    Compute the null space of matrix A, which gives the set of solutions to Ax = 0.
    
    Parameters:
    A (numpy.ndarray): Coefficient matrix.
    
    Returns:
    numpy.ndarray: Basis for the null space of A.
    """
    u, s, vh = np.linalg.svd(A)
    null_mask = (s <= 1e-10)  # Singular values near zero
    null_space = np.compress(null_mask, vh, axis=0)
    return null_space.T

Example 1: Solve \[\begin{align*} 2x-y+z&=1\\ x+2z&=3\\ 3x+2y+z&=4 \end{align*}\]

# Example usage 1: System with a unique solution
A1 = np.array([[2, -1, 1], [1, 0, 2], [3, 2, 1]])
b1 = np.array([1, 3, 4])

status1, solution1, null_space1 = check_consistency(A1, b1)
print("Example 1 - Status:", status1)

if solution1 is not None:
    print("Solution:", solution1)
if null_space1 is not None:
    print("Null Space:", null_space1)
Example 1 - Status: Consistent and has a unique solution
Solution: [0.27272727 0.90909091 1.36363636]

Example 2: Solve the system of equations, \[\begin{align*} x+2y+z&=3\\ 2x+4y+2z&=6\\ x+y+z&=2 \end{align*}\]

# Example usage 2: System with infinitely many solutions
A2 = np.array([[1, 2, 1], [2, 4, 2], [1, 1, 1]])
b2 = np.array([3, 6, 2])

status2, solution2, null_space2 = check_consistency(A2, b2)
print("\nExample 2 - Status:", status2)

if solution2 is not None:
    print("Particular Solution:", solution2)
if null_space2 is not None:
    print("Null Space (Basis for infinite solutions):", null_space2)

Example 2 - Status: Consistent but has infinitely many solutions
Particular Solution: [0.5 1.  0.5]
Null Space (Basis for infinite solutions): [[ 7.07106781e-01]
 [ 1.11022302e-16]
 [-7.07106781e-01]]
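
Using the relation \(X=X_p+tX_N\) from above, the general solution of Example 2 can be formed and checked directly (a small sketch, continuing with the variables computed above):

# General solution: particular solution plus any multiple of the null-space vector
t = 2.5                                    # any value of the free parameter works
x_general = solution2 + t * null_space2[:, 0]
print(np.allclose(A2 @ x_general, b2))     # True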

2.3 Module review

  1. Define the Hadamard Product in Linear Algebra and Provide a Suitable Example.
    • Hint: Element-wise product of matrices of the same dimensions.
    • Example: For \(A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\) and \(B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}\), \(A \circ B = \begin{bmatrix} 5 & 12 \\ 21 & 32 \end{bmatrix}\).
  2. Give Two Applications of the Hadamard Product in Machine Learning.
    • Hint: Gradient updates and feature scaling in deep learning.
  3. Find the Outer Product of Two Vectors \(S_1 = [1, 2, 7]\) and \(S_2 = \begin{bmatrix} 7 \\ 2 \\ 1 \end{bmatrix}\).
    • Hint: Compute \(S_1 \cdot S_2^T\).
  4. Define and Differentiate Between the Dot Product and the Outer Product of Two Vectors.
    • Hint: Dot product results in a scalar; outer product results in a matrix.
  5. Write a Pseudocode to Compute the Hadamard Product of Two Matrices.
    • Hint: Use nested loops for element-wise multiplication.
  6. Compute the Row Norms of Matrix \(A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\).
    • Hint: Use \(\|A_{i*}\| = \sqrt{\sum_j A_{ij}^2}\).
  7. Find the Column Norms of Matrix \(A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\).
    • Hint: Use \(\|A_{*j}\| = \sqrt{\sum_i A_{ij}^2}\).
  8. Compute the Frobenius Norm of Matrix \(A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\).
    • Hint: \(\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}\).
  9. Prove the Hadamard Product Is Commutative and Associative.
    • Hint: Based on properties of element-wise operations.
  10. Explain the Use of the Hadamard Product in Convolutional Neural Networks.
    • Hint: Used in depth-wise convolutions and attention mechanisms.
  11. For Vectors \(u = [1, 2, 3]\) and \(v = [4, 5, 6]\), Compute \(u \circ v\) and \(u \cdot v\).
    • Hint: Hadamard product is element-wise, dot product is summation.
  12. Write Pseudocode for Outer Product of Two Vectors.
    • Hint: Compute \(u_i \cdot v_j\) for all \(i, j\).
  13. Discuss the Role of the Outer Product in Tensor Decomposition.
    • Hint: Used to represent rank-1 tensors.
  14. Find Hadamard Product of \(A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\) and \(B = \begin{bmatrix} 2 & 0 \\ 1 & 5 \end{bmatrix}\).
    • Hint: Multiply corresponding elements.
  15. Use Python to Compute Outer Product of \(x = [1, 2]\) and \(y = [3, 4]\).
    • Hint: Use NumPy’s np.outer(x, y).
  16. Write all the fundamental subspaces of \(A = \begin{bmatrix} 1 & 2 & 5 \\ 3 & 4 & 10 \\ 5 & 6 & 16 \end{bmatrix}\).
    • Hint: Determine the column space, null space, row space, and left null space using Gaussian elimination and rank of \(A\).
  17. In a recommendation system, the user preference is \(u = [4, 3, 5]\) and the item score is \(v = [2, 5, 4]\). Find the user-item interaction score and its Frobenius norm.
    • Hint: Compute the outer product \(u v^T\) and the Frobenius norm \(\|u v^T\|_F = \sqrt{\sum_{i,j} (u_i v_j)^2}\).
  18. Verify the rank-nullity theorem for \(A = \begin{bmatrix} 7 & -3 & 5 \\ 9 & 11 & 2 \\ 16 & 8 & 7 \end{bmatrix}\).
    • Hint: Compute the rank of \(A\), the dimension of its null space, and verify \(\text{rank}(A) + \text{nullity}(A) = \text{number of columns of } A\).
  19. Define the Kronecker product of two matrices. Find the Kronecker product of \(A = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}\) and \(B = \begin{bmatrix} 1 & 0 & 1 & 8 \\ 2 & -2 & 2 & -2 \\ -4 & 0 & 3 & -1 \end{bmatrix}\) in block matrix form.
    • Hint: Use the definition \(A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{bmatrix}\).
  20. Explain and compute the rank of the Kronecker product of two matrices \(A\) and \(B\), where \(A\) is \(2 \times 2\) and \(B\) is \(3 \times 3\).
    • Hint: Use the property \(\text{rank}(A \otimes B) = \text{rank}(A) \cdot \text{rank}(B)\).
  21. Find the row space and column space of \(A = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 2 & 3 \\ 0 & 0 & 0 \end{bmatrix}\).
    • Hint: Row space is spanned by independent rows; column space is spanned by independent columns.
  22. Determine if \(A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\) is invertible. If not, explain why using its fundamental subspaces.
    • Hint: Check if the null space is trivial or if the determinant of \(A\) is zero.
  23. Write a pseudocode to calculate the Frobenius norm of any \(m \times n\) matrix.
    • Hint: Use \(\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n A_{ij}^2}\).
  24. Find the projection of \(b = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}\) onto the column space of \(A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}\).
    • Hint: Use \(P = A(A^T A)^{-1}A^T b\).
  25. Write a pseudocode to compute the outer product of two vectors \(x\) and \(y\).
    • Hint: Use a nested loop or NumPy’s np.outer() function to compute the product.

  1. A regularization technique in deep learning. This approach deactivates some selected neurons to control model over-fitting.↩︎

  2. Remember that the covariance of \(X\) is defined as \(Cov(X)=\dfrac{\sum (X-\bar{X})^2}{n-1}\)↩︎