Least Squares
Wanho Choi
(wanochoi.com)
Linear Least Squares Problem

Ax = b, \qquad A : m \times n, \quad x : n \times 1, \quad b : m \times 1

m > n : (# of equations) > (# of unknowns)

over-determined system: more equations than unknowns, so in general no exact solution exists
No Exact Solution → The Nearest Solution

Ax \approx b \quad \Leftrightarrow \quad \hat{x} = \arg\min_x \|Ax - b\|^2

residual: r = Ax - b

A : m \times n, \quad x : n \times 1, \quad b : m \times 1
How to Solve

• Normal equations
• QR decomposition
• SVD (Singular Value Decomposition)

The three methods form a spectrum: the normal equations are the fastest and cheapest, SVD is the most accurate and reliable, and QR sits in between.
Normal Equation

\|r\|_2^2 = r^T r
= (Ax - b)^T (Ax - b)
= (x^T A^T - b^T)(Ax - b)
= x^T A^T A x - x^T A^T b - b^T A x + b^T b
= x^T A^T A x - x^T A^T b - x^T A^T b + b^T b \quad (\because b^T A x \text{ is a scalar, so } b^T A x = (b^T A x)^T = x^T A^T b)
= x^T A^T A x - 2 x^T A^T b + b^T b

Setting the gradient with respect to x to zero:

\frac{\partial}{\partial x} \|r\|_2^2 = 2 A^T A x - 2 A^T b = 0

\therefore A^T A x = A^T b : the normal equations
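As a numerical sanity check of this derivation, here is a minimal Eigen 3 sketch; the 3×2 matrix reuses the example that appears in the SVD slides below, and the right-hand side b is an arbitrary assumption. It solves the normal equations with an LDLT (Cholesky-type) factorization and verifies that the gradient vanishes at the minimizer.

#include <iostream>
#include <Eigen/Dense>

int main()
{
    // An over-determined system: m = 3 equations, n = 2 unknowns.
    Eigen::MatrixXd A( 3, 2 );
    A << 1, 1,
         0, 1,
        -1, 1;
    Eigen::VectorXd b( 3 );          // assumed right-hand side
    b << 2, 0, 1;

    // Normal equations: solve (A^T A) x = A^T b.
    // LDLT is a Cholesky-type factorization suited to the symmetric matrix A^T A.
    Eigen::VectorXd x = ( A.transpose() * A ).ldlt().solve( A.transpose() * b );

    std::cout << "x =\n" << x << "\n";

    // The gradient 2 A^T A x - 2 A^T b should be numerically zero at the minimizer.
    std::cout << "gradient norm = "
              << ( 2.0 * A.transpose() * A * x - 2.0 * A.transpose() * b ).norm()
              << std::endl;
    return 0;
}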
Geometric Derivation

The residual r = b - Ax is minimized when p = Ax is the orthogonal projection of b onto the column space of A, i.e. when r is orthogonal to every vector in that space. Any vector of the form Ay lies in the column space of A:

Ay = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = y_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + y_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + y_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}

Requiring orthogonality for every y:

(Ay) \cdot (b - Ax) = (Ay)^T (b - Ax) = y^T A^T (b - Ax) = 0 \quad \forall y

\therefore A^T (b - Ax) = 0 \quad \Leftrightarrow \quad A^T A x = A^T b
Pseudo Inverse

Starting from the normal equations and assuming A^T A is invertible:

A^T A x = A^T b
(A^T A)^{-1} (A^T A) x = (A^T A)^{-1} A^T b
x = (A^T A)^{-1} A^T b

The projection of b onto the column space of A is then

p = Ax = A (A^T A)^{-1} A^T b

The matrix (A^T A)^{-1} A^T is called the pseudo inverse of A.
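The formula can be exercised directly with Eigen 3; this is a sketch on the same assumed example system as above, forming the explicit inverse for illustration only (a factorization-based solve is preferable in real code).

#include <iostream>
#include <Eigen/Dense>

int main()
{
    Eigen::MatrixXd A( 3, 2 );       // same assumed 3x2 example as before
    A << 1, 1,
         0, 1,
        -1, 1;
    Eigen::VectorXd b( 3 );          // assumed right-hand side
    b << 2, 0, 1;

    // Pseudo inverse (A^T A)^{-1} A^T, formed explicitly for illustration.
    Eigen::MatrixXd Aplus = ( A.transpose() * A ).inverse() * A.transpose();

    Eigen::VectorXd x = Aplus * b;   // least-squares solution
    Eigen::VectorXd p = A * x;       // projection of b onto the column space of A

    std::cout << "x =\n" << x << "\np =\n" << p << std::endl;
    return 0;
}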
Condition Number of a Function

It measures:
• how much the output value changes for a small change of the input
• how sensitive the output is to changes or errors in the input
• how much error in the output results from an error in the input

For a matrix A:

\mathrm{cond}(A) = \kappa = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)} \qquad \sigma_i(A) : \text{the } i\text{-th singular value of } A

The relative error in the solution of a perturbed linear system is bounded by

\frac{\|\Delta x\|}{\|x\|} \le \mathrm{cond}(A) \, \frac{\|\Delta A\|}{\|A\|}
Condition Number of a Function

• Well-conditioned: low condition number
• Ill-conditioned: high condition number
• The normal equations are sometimes ill-conditioned, since forming A^T A squares the condition number:

A = \begin{bmatrix} 1 + 10^{-10} & -1 \\ -1 & 1 \end{bmatrix}

A^T A = \begin{bmatrix} 2 + 2 \cdot 10^{-10} + 10^{-20} & -2 - 10^{-10} \\ -2 - 10^{-10} & 2 \end{bmatrix}

\mathrm{cond}(A^T A) = \mathrm{cond}(A)^2
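The sketch below (assuming Eigen 3) computes both condition numbers for this example from singular values. In double precision the 10^{-20} term is lost when A^T A is formed, so A^T A comes out numerically singular, which is exactly the hazard this slide illustrates.

#include <iostream>
#include <Eigen/Dense>

int main()
{
    const double eps = 1e-10;

    // The nearly singular matrix from the slide.
    Eigen::Matrix2d A;
    A << 1.0 + eps, -1.0,
        -1.0,        1.0;

    // cond(M) = sigma_max / sigma_min, computed from the singular values.
    auto cond = []( const Eigen::MatrixXd& M )
    {
        Eigen::JacobiSVD<Eigen::MatrixXd> svd( M );
        const Eigen::VectorXd& s = svd.singularValues();
        return s( 0 ) / s( s.size() - 1 );
    };

    std::cout << "cond(A)     = " << cond( A )                 << "\n";
    std::cout << "cond(A^T A) = " << cond( A.transpose() * A ) << std::endl;
    return 0;
}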
QR Decomposition

Decompose A = QU, where Q is orthogonal and U is upper triangular. Since orthogonal transformations preserve the Euclidean norm:

\|r\|_2^2 = \|Ax - b\|_2^2 = \|QUx - b\|_2^2 = \|Q^T Q U x - Q^T b\|_2^2 = \|Ux - Q^T b\|_2^2

\therefore Ux = Q^T b

U = \begin{bmatrix} * & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & * \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \quad (m \times n : \text{the top } n \text{ rows are upper triangular, the bottom } m - n \text{ rows are zero})

Only the top n rows of Ux = Q^T b can be satisfied exactly; back-substitution on that triangular block gives the least-squares solution.
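Here is a sketch of carrying out Ux = Q^T b explicitly with Eigen's Householder QR, on the same assumed example system; in practice A.colPivHouseholderQr().solve(b) (used in the final example) does all of this internally.

#include <iostream>
#include <Eigen/Dense>

int main()
{
    Eigen::MatrixXd A( 3, 2 );       // same assumed 3x2 example as before
    A << 1, 1,
         0, 1,
        -1, 1;
    Eigen::VectorXd b( 3 );          // assumed right-hand side
    b << 2, 0, 1;

    const Eigen::Index m = A.rows(), n = A.cols();

    // A = QU with Q orthogonal and U upper triangular.
    Eigen::HouseholderQR<Eigen::MatrixXd> qr( A );

    // Thin Q: the first n columns of the full m x m Q.
    Eigen::MatrixXd Q = qr.householderQ() * Eigen::MatrixXd::Identity( m, n );

    // The top n x n block of the packed factorization holds the triangular U;
    // solve U x = Q^T b by back-substitution.
    Eigen::VectorXd x = qr.matrixQR()
                          .topLeftCorner( n, n )
                          .triangularView<Eigen::Upper>()
                          .solve( Q.transpose() * b );

    std::cout << "x =\n" << x << std::endl;
    return 0;
}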
Singular Value Decomposition

A = USV^T, \qquad A : m \times n, \quad U : m \times m, \quad S : m \times n, \quad V^T : n \times n

ex)

\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0.5774 & 0.7071 & 0.4082 \\ 0.5774 & 0 & -0.8165 \\ 0.5774 & -0.7071 & 0.4082 \end{bmatrix} \begin{bmatrix} 1.7321 & 0 \\ 0 & 1.4142 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}

Singular Value Decomposition

A = USV^T, \qquad S = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}, \qquad D = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix}

\sigma_i(A) : \text{the } i\text{-th singular value of } A, \quad i = 1, \ldots, r

r = \mathrm{rank}(A); \quad r = \min(m, n) \text{ if } A \text{ is full rank}
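The worked example above can be reproduced with Eigen's JacobiSVD, as in this sketch; the signs of corresponding columns of U and V may differ from the slide, since an SVD is unique only up to such paired sign flips.

#include <iostream>
#include <Eigen/Dense>

int main()
{
    // The 3x2 example matrix from the slide.
    Eigen::MatrixXd A( 3, 2 );
    A << 1, 1,
         0, 1,
        -1, 1;

    // Full SVD: A = U S V^T with U (3x3), S (3x2), V (2x2).
    Eigen::JacobiSVD<Eigen::MatrixXd> svd( A, Eigen::ComputeFullU | Eigen::ComputeFullV );

    std::cout << "singular values:\n" << svd.singularValues() << "\n";  // ~1.7321, ~1.4142
    std::cout << "U =\n" << svd.matrixU() << "\n";
    std::cout << "V =\n" << svd.matrixV() << std::endl;
    return 0;
}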
Singular Value Decomposition

A = USV^T, \qquad U : m \times m, \quad S : m \times n, \quad V^T : n \times n

S : the singular values of A = the square roots of the eigenvalues of AA^T (or A^T A)
U : the left singular vectors of A = the eigenvectors of AA^T
V : the right singular vectors of A = the eigenvectors of A^T A

AA^T = (USV^T)(USV^T)^T = (USV^T)(VS^T U^T) = U S V^T V S^T U^T = U S^2 U^T
A^T A = (USV^T)^T (USV^T) = (V S^T U^T)(USV^T) = V S^T U^T U S V^T = V S^2 V^T

(Here S^2 is shorthand for the square diagonal matrix of squared singular values.) The shared eigenvalues satisfy

\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0
EVD vs SVD

Singular value decomposition lets us decompose any matrix A as U\Sigma V^T, where U and V are orthogonal and \Sigma is a diagonal matrix whose non-zero entries are the square roots of the eigenvalues of A^T A. The columns of U and V give bases for the four fundamental subspaces.

If A is symmetric, A is diagonalizable: we can find an orthogonal matrix Q such that A = Q\Lambda Q^T, where \Lambda is a diagonal matrix holding the eigenvalues of A. This is called the eigenvalue decomposition.

Furthermore, when A is symmetric and positive semi-definite (so every \lambda \ge 0):

A = Q\Lambda Q^T = Q\Lambda^{1/2}\Lambda^{1/2}Q^T = Q\Lambda^{1/2}Q^T Q\Lambda^{1/2}Q^T = A^{1/2}A^{1/2}

A^T A = (Q\Lambda Q^T)^T Q\Lambda Q^T = Q\Lambda Q^T Q\Lambda Q^T = Q\Lambda^2 Q^T
AA^T = Q\Lambda Q^T (Q\Lambda Q^T)^T = Q\Lambda Q^T Q\Lambda Q^T = Q\Lambda^2 Q^T
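A small sketch of this relationship on the running example: the eigenvalues of the symmetric PSD matrix A^T A equal the squared singular values of A (Eigen reports eigenvalues in ascending and singular values in descending order, so one list is the reverse of the other).

#include <iostream>
#include <Eigen/Dense>

int main()
{
    Eigen::MatrixXd A( 3, 2 );       // same assumed 3x2 example as before
    A << 1, 1,
         0, 1,
        -1, 1;

    // Eigenvalues of A^T A via the eigenvalue decomposition (EVD)...
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> evd( A.transpose() * A );
    std::cout << "eigenvalues of A^T A:\n" << evd.eigenvalues() << "\n";

    // ...match the squared singular values of A from the SVD.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd( A );
    std::cout << "squared singular values of A:\n"
              << svd.singularValues().array().square() << std::endl;
    return 0;
}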
Pseudo Inverse

A^+ = (A^T A)^{-1} A^T = V S^{-1} U^T

A = USV^T, \qquad U : m \times m, \quad S : m \times n, \quad V^T : n \times n

From the normal equations A^T A x = A^T b:

\therefore x = (A^T A)^{-1} A^T b

(A^T A)^{-1} A^T = \left( (USV^T)^T (USV^T) \right)^{-1} (USV^T)^T
= \left( V S^T U^T U S V^T \right)^{-1} (V S^T U^T)
= (V S^2 V^T)^{-1} (V S^T U^T)
= V^{-T} S^{-2} V^{-1} V S^T U^T
= V S^{-1} U^T \quad (\because V \text{ is orthogonal: } V^{-1} = V^T \text{ and } V^{-T} = V)
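A sketch of assembling A^+ = V S^{-1} U^T from a thin SVD of the running example; for a rank-deficient A one would invert only the singular values above a tolerance and zero out the rest, which is what makes the SVD route more robust than (A^T A)^{-1} A^T.

#include <iostream>
#include <Eigen/Dense>

int main()
{
    Eigen::MatrixXd A( 3, 2 );       // same assumed 3x2 example as before
    A << 1, 1,
         0, 1,
        -1, 1;

    // Thin SVD: A = U S V^T with U (3x2), S (2x2 diagonal), V (2x2).
    Eigen::JacobiSVD<Eigen::MatrixXd> svd( A, Eigen::ComputeThinU | Eigen::ComputeThinV );

    // A^+ = V S^{-1} U^T (all singular values are non-zero here).
    Eigen::MatrixXd Aplus = svd.matrixV()
                          * svd.singularValues().cwiseInverse().asDiagonal()
                          * svd.matrixU().transpose();

    // Equals (A^T A)^{-1} A^T because A has full column rank.
    std::cout << "A^+ =\n" << Aplus << std::endl;
    return 0;
}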
Comparison

• The standard recommendation for linear least-squares is to use QR factorization.
• These days QR seems to be the consensus for most practical problems.
• Cholesky (solving the normal equations) is cheaper, but suffers from "conditioning squared": it operates on A^T A, whose condition number is cond(A)^2, rather than on A itself.
• SVD has some numerical advantages, but is more costly than QR decomposition.
Eigen3 Example
2x - y = 2
x - 3y = -3
2x + y = 2

\begin{bmatrix} 2 & -1 \\ 1 & -3 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \\ 2 \end{bmatrix}

Eigen3 Example
#include <iostream>
#include <Eigen/Dense>

int main( int argc, char* argv[] )
{
    Eigen::MatrixXd A( 3, 2 );
    A(0,0) = 2;  A(0,1) = -1;
    A(1,0) = 1;  A(1,1) = -3;
    A(2,0) = 2;  A(2,1) = 1;

    Eigen::VectorXd b( 3 );
    b(0) = 2;  b(1) = -3;  b(2) = 2;

    Eigen::VectorXd x( 2 );

    // Three equivalent ways to solve the same least-squares problem;
    // each assignment overwrites x, so only the last result is printed.
    x = ( A.transpose() * A ).ldlt().solve( A.transpose() * b );             // normal equations
    x = A.colPivHouseholderQr().solve( b );                                  // QR decomposition
    x = A.jacobiSvd( Eigen::ComputeThinU | Eigen::ComputeThinV ).solve( b ); // SVD

    std::cout << x << std::endl;

    return 0;
}
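All three solves compute the same minimizer (approximately x ≈ 0.911, y ≈ 1.067 for this system); since each assignment overwrites x, only the SVD result is printed here. In practice you would pick a single method, typically QR, as the Comparison slide recommends.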