Least Squares

(1)

Least Squares

Wanho Choi

(wanochoi.com)

(2)

Linear Least Squares Problem

Ax = b   (A : m×n, x : n×1, b : m×1)

m > n : (# of equations) > (# of unknowns)

→ over-determined system: no exact solution in general

(3)

The Nearest Solution

Ax ≈ b ⇔ x̂ = argminₓ ‖Ax − b‖₂²

residual: r = Ax − b   (A : m×n, x : n×1, b : m×1)

(4)

How to Solve

Normal equations → QR decomposition → SVD (Singular Value Decomposition)

ordered from faster / cheaper to more accurate / more reliable

(5)

Normal Equation

‖r‖₂² = rᵀr
= (Ax − b)ᵀ(Ax − b)
= (xᵀAᵀ − bᵀ)(Ax − b)
= xᵀAᵀAx − xᵀAᵀb − bᵀAx + bᵀb
= xᵀAᵀAx − xᵀAᵀb − xᵀAᵀb + bᵀb   (∵ bᵀAx is a scalar, so bᵀAx = (bᵀAx)ᵀ = xᵀAᵀb)
= xᵀAᵀAx − 2xᵀAᵀb + bᵀb

∂‖r‖₂²/∂x = 2AᵀAx − 2Aᵀb = 0

∴ AᵀAx = Aᵀb : normal equations

(6)

Geometric Derivation

For any y, the vector Ay is a linear combination of the columns of A:

Ay = y₁(a₁₁, a₂₁, …, a_m1)ᵀ + y₂(a₁₂, a₂₂, …, a_m2)ᵀ + ⋯ + yₙ(a₁ₙ, a₂ₙ, …, a_mn)ᵀ

Ay : the column space of A

The residual r = b − Ax is minimized (at y = x) when it is orthogonal to every vector in the column space:

(Ay) ⋅ (b − Ax) = (Ay)ᵀ(b − Ax) = yᵀAᵀ(b − Ax) = 0 for all y

∴ Aᵀ(b − Ax) = 0 ⇔ AᵀAx = Aᵀb

p = Ax is the orthogonal projection of b onto the column space of A.

(7)

Pseudo Inverse

From the normal equations AᵀAx = Aᵀb:

(AᵀA)⁻¹(AᵀA)x = (AᵀA)⁻¹Aᵀb

x = (AᵀA)⁻¹Aᵀb

p = Ax = A(AᵀA)⁻¹Aᵀb

A⁺ = (AᵀA)⁻¹Aᵀ : pseudo inverse

(p is the projection of b onto the column space of A)

(8)

Condition Number of a Function

It measures

• How much the output changes for a small change in the input
• How sensitive the output is to changes or errors in the input
• How much error in the output results from an error in the input

cond(A) = κ = σmax(A) / σmin(A)

σᵢ(A) : the i-th singular value of A

‖Δx‖ / ‖x‖ ≤ cond(A) ⋅ ‖ΔA‖ / ‖A‖

(9)

Condition Number of a Function

Well-conditioned: low condition number

Ill-conditioned: high condition number

The normal equations are sometimes ill-conditioned.

A = ⎡ 1+10⁻¹⁰  −1 ⎤
    ⎣ −1        1 ⎦

AᵀA = ⎡ 2 + 2⋅10⁻¹⁰ + 10⁻²⁰   −2 − 10⁻¹⁰ ⎤
      ⎣ −2 − 10⁻¹⁰             2         ⎦

cond(AᵀA) = cond(A)²

(10)

QR Decomposition

∵ Orthogonal transformations preserve the Euclidean norm: ‖Qv‖₂ = ‖v‖₂.

With A = QU (Q : m×m orthogonal, U : m×n upper triangular):

‖r‖₂² = ‖Ax − b‖₂² = ‖QUx − b‖₂² = ‖QᵀQUx − Qᵀb‖₂² = ‖Ux − Qᵀb‖₂²

∴ Ux = Qᵀb

U = ⎡ * * ⋯ * ⎤
    ⎢ 0 * ⋯ * ⎥
    ⎢ ⋮ ⋮ ⋱ ⋮ ⎥   ← first n rows: upper triangular (solve by back substitution)
    ⎢ 0 0 ⋯ * ⎥
    ⎢ 0 0 ⋯ 0 ⎥
    ⎢ ⋮ ⋮ ⋱ ⋮ ⎥   ← last m − n rows: zero (they contribute the residual r)
    ⎣ 0 0 ⋯ 0 ⎦

(11)

Singular Value Decomposition

A = USVᵀ   (U : m×m, S : m×n, Vᵀ : n×n)

ex)
⎡  1  1 ⎤   ⎡ 0.5774  0.7071   0.4082 ⎤ ⎡ 1.7321  0      ⎤
⎢  0  1 ⎥ = ⎢ 0.5774  0       −0.8165 ⎥ ⎢ 0       1.4142 ⎥ ⎡ 0  1 ⎤
⎣ −1  1 ⎦   ⎣ 0.5774 −0.7071   0.4082 ⎦ ⎣ 0       0      ⎦ ⎣ 1  0 ⎦

Singular Value Decomposition

A = USVᵀ   (U : m×m, S : m×n, Vᵀ : n×n)

S = ⎡ D 0 ⎤ ,  D = diag(σ₁, σ₂, …, σᵣ)
    ⎣ 0 0 ⎦

σᵢ(A) : the i-th singular value of A, i = 1, …, r

r = rank(A); r = min(m, n) if A is full rank

(13)

Singular Value Decomposition

A = USVᵀ   (U : m×m, S : m×n, Vᵀ : n×n)

S : singular values of A (square roots of the eigenvalues of AAᵀ or AᵀA)
U : left singular vectors of A (eigenvectors of AAᵀ)
V : right singular vectors of A (eigenvectors of AᵀA)

AAᵀ = (USVᵀ)(USVᵀ)ᵀ = (USVᵀ)(VSᵀUᵀ) = USVᵀVSᵀUᵀ = US²Uᵀ
AᵀA = (USVᵀ)ᵀ(USVᵀ) = (VSᵀUᵀ)(USVᵀ) = VSᵀUᵀUSVᵀ = VS²Vᵀ

(writing S² for SSᵀ or SᵀS as appropriate)

λ₁ ≥ λ₂ ≥ ⋯ ≥ λᵣ > λᵣ₊₁ = ⋯ = λₙ = 0

(14)

EVD vs SVD

Singular Value Decomposition lets us decompose any matrix A as a product UΣVᵀ, where U and V are orthogonal and Σ is a diagonal matrix whose non-zero entries are the square roots of the eigenvalues of AᵀA. The columns of U and V give bases for the four fundamental subspaces.

If A is symmetric, A is orthogonally diagonalizable. In other words, if A is symmetric, we can find an orthogonal matrix Q so that A = QΛQᵀ. Here Λ is a diagonal matrix holding the eigenvalues of A. So, it is called eigenvalue decomposition.

Furthermore, when A is symmetric and positive semi-definite (∵ ∀λ ≥ 0):

A = QΛQᵀ = QΛ^(1/2)Λ^(1/2)Qᵀ = QΛ^(1/2)QᵀQΛ^(1/2)Qᵀ = A^(1/2)A^(1/2)

AᵀA = (QΛQᵀ)ᵀ(QΛQᵀ) = QΛQᵀQΛQᵀ = QΛ²Qᵀ
AAᵀ = QΛQᵀ(QΛQᵀ)ᵀ = QΛQᵀQΛQᵀ = QΛ²Qᵀ

(15)

Pseudo Inverse

A⁺ = (AᵀA)⁻¹Aᵀ = VS⁻¹Uᵀ

A = USVᵀ   (U : m×m, S : m×n, Vᵀ : n×n)

from the normal equations AᵀAx = Aᵀb:

∴ x = (AᵀA)⁻¹Aᵀb

(AᵀA)⁻¹Aᵀ = ((USVᵀ)ᵀ(USVᵀ))⁻¹(USVᵀ)ᵀ
= (VSᵀUᵀUSVᵀ)⁻¹(VSᵀUᵀ)
= (VS²Vᵀ)⁻¹(VSUᵀ)   (S diagonal ⇒ Sᵀ = S, taken square and invertible)
= VS⁻²VᵀVSUᵀ   (∵ V orthogonal ⇒ V⁻¹ = Vᵀ)
= VS⁻¹Uᵀ

(16)

Comparison

The standard recommendation for linear least squares is to use QR factorization; these days QR seems to be the consensus for most practical problems.

Cholesky (on the normal equations) is cheaper, but suffers from "conditioning squared" due to the fact that you have to operate on X′X, not X.

SVD has some numerical advantages, but is more costly than QR decomposition.

(17)

Eigen3 Example

2x − y = 2
x − 3y = −3
2x + y = 2

⎡ 2 −1 ⎤        ⎡  2 ⎤
⎢ 1 −3 ⎥ ⎡x⎤ =  ⎢ −3 ⎥
⎣ 2  1 ⎦ ⎣y⎦    ⎣  2 ⎦

(18)

Eigen3 Example

#include <iostream>
#include <Eigen/Dense>

int main( int argc, char* argv[] )
{
    Eigen::MatrixXd A( 3, 2 );
    A(0,0) = 2;  A(0,1) = -1;
    A(1,0) = 1;  A(1,1) = -3;
    A(2,0) = 2;  A(2,1) = 1;

    Eigen::VectorXd b( 3 );
    b(0) = 2;  b(1) = -3;  b(2) = 2;

    Eigen::VectorXd x( 2 );

    // Three equivalent ways to solve the same least-squares problem
    // (each assignment overwrites the previous x; only the SVD result is printed):
    x = ( A.transpose() * A ).ldlt().solve( A.transpose() * b );              // normal equations
    x = A.colPivHouseholderQr().solve( b );                                   // QR decomposition
    x = A.jacobiSvd( Eigen::ComputeThinU | Eigen::ComputeThinV ).solve( b );  // SVD

    std::cout << x << std::endl;

    return 0;
}

(19)

Eigen3 Example

2x − y = 2
x − 3y = −3
2x + y = 2

⎡ 2 −1 ⎤        ⎡  2 ⎤       ⎡x⎤   ⎡ 0.91111 ⎤
⎢ 1 −3 ⎥ ⎡x⎤ =  ⎢ −3 ⎥   ⇒   ⎣y⎦ = ⎣ 1.06667 ⎦
⎣ 2  1 ⎦ ⎣y⎦    ⎣  2 ⎦
