https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse
In mathematics, and in particular linear algebra, a pseudoinverse A+ of a matrix A is a generalization of the inverse matrix.[1] The most widely known type of matrix pseudoinverse is the Moore–Penrose inverse,[2][3][4][5] which was independently described by E. H. Moore[6] in 1920, Arne Bjerhammar[7] in 1951, and Roger Penrose[8] in 1955. Earlier, Erik Ivar Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. When referring to a matrix, the term pseudoinverse, without further specification, is often used to indicate the Moore–Penrose inverse. The term generalized inverse is sometimes used as a synonym for pseudoinverse.
A common use of the pseudoinverse is to compute a 'best fit' (least squares) solution to a system of linear equations that lacks a unique solution (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra.
The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition.
Notation
The following conventions are adopted throughout.
· {\displaystyle K} will denote one of the fields of real or complex numbers, denoted {\displaystyle \mathbb {R} }, {\displaystyle \mathbb {C} }, respectively. The vector space of {\displaystyle m\times n} matrices over {\displaystyle K} is denoted by {\displaystyle \mathrm {M} (m,n;K)}.
· For {\displaystyle A\in \mathrm {M} (m,n;K)}, {\displaystyle A^{\mathrm {T} }} and {\displaystyle A^{*}} denote the transpose and Hermitian transpose (also called conjugate transpose) respectively. If {\displaystyle K=\mathbb {R} }, then {\displaystyle A^{*}=A^{\mathrm {T} }}.
· For {\displaystyle A\in \mathrm {M} (m,n;K)}, {\displaystyle \operatorname {im} (A)} denotes the range (image) of {\displaystyle A} (the space spanned by the column vectors of {\displaystyle A}) and {\displaystyle \operatorname {ker} (A)} denotes the kernel (null space) of {\displaystyle A}.
· Finally, for any positive integer {\displaystyle n}, {\displaystyle I_{n}\in \mathrm {M} (n,n;K)} denotes the {\displaystyle n\times n} identity matrix.
Definition
For {\displaystyle A\in \mathrm {M} (m,n;K)}, a pseudoinverse of {\displaystyle A} is defined as a matrix {\displaystyle A^{+}\in \mathrm {M} (n,m;K)} satisfying all of the following four criteria, known as the Moore–Penrose conditions:[8][9]
1. {\displaystyle AA^{+}A=A\,\!} ({\displaystyle AA^{+}} need not be the general identity matrix, but it maps all column vectors of {\displaystyle A} to themselves);
2. {\displaystyle A^{+}AA^{+}=A^{+}\,\!} ({\displaystyle A^{+}} is a weak inverse for the multiplicative semigroup);
3. {\displaystyle (AA^{+})^{*}=AA^{+}\,\!} ({\displaystyle AA^{+}} is Hermitian); and
4. {\displaystyle (A^{+}A)^{*}=A^{+}A\,\!} ({\displaystyle A^{+}A} is also Hermitian).
{\displaystyle A^{+}} exists for any matrix {\displaystyle A}, but when the latter has full rank, {\displaystyle A^{+}} can be expressed as a simple algebraic formula.
In particular, when {\displaystyle A} has linearly independent columns (and thus matrix {\displaystyle A^{*}A} is invertible), {\displaystyle A^{+}} can be computed as:
{\displaystyle A^{+}=(A^{*}A)^{-1}A^{*}\,.}
This particular pseudoinverse constitutes a left inverse, since, in this case, {\displaystyle A^{+}A=I}.
When {\displaystyle A} has linearly independent rows (matrix {\displaystyle AA^{*}} is invertible), {\displaystyle A^{+}} can be computed as:
{\displaystyle A^{+}=A^{*}(AA^{*})^{-1}\,.}
This is a right inverse, as {\displaystyle AA^{+}=I}.
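The four conditions and the full-column-rank formula can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `satisfies_penrose` is hypothetical and introduced only for illustration.

```python
import numpy as np

def satisfies_penrose(A, X, tol=1e-10):
    """Return True if X satisfies the four Moore-Penrose conditions for A."""
    AX, XA = A @ X, X @ A
    return bool(
        np.allclose(A @ X @ A, A, atol=tol)          # 1. A A+ A = A
        and np.allclose(X @ A @ X, X, atol=tol)      # 2. A+ A A+ = A+
        and np.allclose(AX.conj().T, AX, atol=tol)   # 3. A A+ is Hermitian
        and np.allclose(XA.conj().T, XA, atol=tol)   # 4. A+ A is Hermitian
    )

# Linearly independent columns (3 x 2), so A+ = (A* A)^{-1} A*.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
A_plus = np.linalg.inv(A.conj().T @ A) @ A.conj().T

print(satisfies_penrose(A, A_plus))          # all four conditions hold
print(np.allclose(A_plus @ A, np.eye(2)))    # A+ is a left inverse
```

The plain transpose of this matrix fails the first condition, which shows that the conditions single out a specific matrix.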
Properties
Proofs for some of these facts may be found on a separate page, Proofs involving the Moore–Penrose inverse.
Existence and uniqueness
· The pseudoinverse exists and is unique: for any matrix {\displaystyle A\,\!}, there is precisely one matrix {\displaystyle A^{+}\,\!} that satisfies the four properties of the definition.[9]
A matrix satisfying the first condition of the definition is known as a generalized inverse. If the matrix also satisfies the second condition, it is called a generalized reflexive inverse. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.
Basic properties
· If {\displaystyle A\,\!} has real entries, then so does {\displaystyle A^{+}\,\!}.
· If {\displaystyle A\,\!} is invertible, its pseudoinverse is its inverse. That is: {\displaystyle A^{+}=A^{-1}\,\!}.[10]:243
· The pseudoinverse of a zero matrix is its transpose.
· The pseudoinverse of the pseudoinverse is the original matrix: {\displaystyle (A^{+})^{+}=A\,\!}.[10]:245
· Pseudoinversion commutes with transposition, conjugation, and taking the conjugate transpose:[10]:245
{\displaystyle (A^{\mathrm {T} })^{+}=(A^{+})^{\mathrm {T} },~~(\,{\overline {A}}\,)^{+}={\overline {A^{+}}},~~(A^{*})^{+}=(A^{+})^{*}.\,\!}
· The pseudoinverse of a scalar multiple of {\displaystyle A} is the reciprocal multiple of {\displaystyle A^{+}}:
{\displaystyle (\alpha A)^{+}=\alpha ^{-1}A^{+}\,\!} for {\displaystyle \alpha \neq 0.}
Identities
The following identities can be used to cancel certain subexpressions or expand expressions involving pseudoinverses. Proofs for these properties can be found in the proofs subpage.
{\displaystyle {\begin{array}{lclll}A^{+}&=&A^{+}&A^{+*}&A^{*}\\A^{+}&=&A^{*}&A^{+*}&A^{+}\\A&=&A^{+*}&A^{*}&A\\A&=&A&A^{*}&A^{+*}\\A^{*}&=&A^{*}&A&A^{+}\\A^{*}&=&A^{+}&A&A^{*}\\\end{array}}}
Reduction to Hermitian case
The computation of the pseudoinverse is reducible to its construction in the Hermitian case. This is possible through the equivalences:
· {\displaystyle A^{+}=(A^{*}A)^{+}A^{*}\,\!}
· {\displaystyle A^{+}=A^{*}(AA^{*})^{+}\,\!}
as {\displaystyle A^{*}A} and {\displaystyle AA^{*}} are Hermitian.
Products
If {\displaystyle A\in \mathrm {M} (m,n;K),~B\in \mathrm {M} (n,p;K)\,}, and if
· {\displaystyle A\,\!} has orthonormal columns (i.e., {\displaystyle A^{*}A=I_{n}\,}), or
· {\displaystyle B\,\!} has orthonormal rows (i.e., {\displaystyle BB^{*}=I_{n}\,}), or
· {\displaystyle A\,\!} has all columns linearly independent (full column rank) and {\displaystyle B\,\!} has all rows linearly independent (full row rank), or
· {\displaystyle B=A^{*}\,\!} (i.e., {\displaystyle B} is the conjugate transpose of {\displaystyle A}),
then
{\displaystyle (AB)^{+}\equiv B^{+}A^{+}\,\!}.
The last property yields the equivalences:
{\displaystyle {\begin{aligned}(AA^{*})^{+}&\equiv A^{+*}A^{+}\\(A^{*}A)^{+}&\equiv A^{+}A^{+*}\end{aligned}}}
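The product rule does not hold for arbitrary factors. The sketch below, assuming NumPy, checks one case covered by the conditions above (full column rank times full row rank) and one small counterexample where none of them applies; the matrices are illustrative choices, not taken from the cited sources.

```python
import numpy as np

pinv = np.linalg.pinv
rng = np.random.default_rng(0)

# A has full column rank (4 x 2) and B has full row rank (2 x 3): the rule holds.
A = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 3))
holds = np.allclose(pinv(A @ B), pinv(B) @ pinv(A))

# The case B = A*: (A A*)+ = (A+)* A+. An explicit cutoff is passed because
# A A* is rank deficient.
hermitian_case = np.allclose(pinv(A @ A.T, rcond=1e-10), pinv(A).T @ pinv(A))

# Counterexample: neither factor has full rank or orthonormal rows/columns.
C = np.array([[1.0, 1.0], [0.0, 0.0]])
D = np.array([[1.0, 0.0], [0.0, 0.0]])
fails = not np.allclose(pinv(C @ D), pinv(D) @ pinv(C))

print(holds, hermitian_case, fails)
```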
Projectors
{\displaystyle P=AA^{+}\,\!} and {\displaystyle Q=A^{+}A\,\!} are orthogonal projection operators – that is, they are Hermitian ({\displaystyle P=P^{*}\,\!}, {\displaystyle Q=Q^{*}\,\!}) and idempotent ({\displaystyle P^{2}=P\,\!} and {\displaystyle Q^{2}=Q\,\!}). The following hold:
· {\displaystyle PA=AQ=A\,\!} and {\displaystyle A^{+}P=QA^{+}=A^{+}\,\!}
· {\displaystyle P\,\!} is the orthogonal projector onto the range of {\displaystyle A\,\!} (which equals the orthogonal complement of the kernel of {\displaystyle A^{*}\,\!}).
· {\displaystyle Q\,\!} is the orthogonal projector onto the range of {\displaystyle A^{*}\,\!} (which equals the orthogonal complement of the kernel of {\displaystyle A\,\!}).
· {\displaystyle (I-Q)=(I-A^{+}A)\,\!} is the orthogonal projector onto the kernel of {\displaystyle A\,\!}.
· {\displaystyle (I-P)=(I-AA^{+})\,\!} is the orthogonal projector onto the kernel of {\displaystyle A^{*}\,\!}.[9]
The last two properties imply the following identities:
· {\displaystyle A\,\ (I-A^{+}A)=(I-AA^{+})A\ \ =0}
· {\displaystyle A^{*}(I-AA^{+})=(I-A^{+}A)A^{*}=0}
Another property is the following: if {\displaystyle A\in R^{n\times n}} is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix {\displaystyle B\in R^{m\times n}} the following equation holds:[11]
{\displaystyle A(BA)^{+}=(BA)^{+}}
This can be proven by defining matrices {\displaystyle C=BA}, {\displaystyle D=A(BA)^{+}}, and checking that {\displaystyle D} is indeed a pseudoinverse for {\displaystyle C} by verifying that the defining properties of the pseudoinverse hold, when {\displaystyle A} is Hermitian and idempotent.
From the last property it follows that, if {\displaystyle A\in R^{n\times n}} is Hermitian and idempotent, for any matrix {\displaystyle B\in R^{n\times m}}
{\displaystyle (AB)^{+}A=(AB)^{+}}
Finally, it should be noted that if {\displaystyle A} is an orthogonal projection matrix, then its pseudoinverse trivially coincides with the matrix itself, i.e. {\displaystyle A^{+}=A\,\!}.
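The projector properties can be verified on a small rank-deficient matrix. This is a minimal sketch assuming NumPy; the dictionary `checks` is only a convenient way to list the identities being tested.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 0.0]])   # rank 1
A_plus = np.linalg.pinv(A)

P = A @ A_plus        # orthogonal projector onto the range of A
Q = A_plus @ A        # orthogonal projector onto the range of A*
I = np.eye(2)

checks = {
    "P Hermitian":  np.allclose(P, P.conj().T),
    "Q Hermitian":  np.allclose(Q, Q.conj().T),
    "P idempotent": np.allclose(P @ P, P),
    "Q idempotent": np.allclose(Q @ Q, Q),
    "PA = AQ = A":  np.allclose(P @ A, A) and np.allclose(A @ Q, A),
    "A(I - Q) = 0": np.allclose(A @ (I - Q), 0),
    "(I - P)A = 0": np.allclose((I - P) @ A, 0),
    "P+ = P":       np.allclose(np.linalg.pinv(P), P),
}
for name, ok in checks.items():
    print(name, ok)
```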
Geometric construction
If we view the matrix as a linear map {\displaystyle A:K^{n}\to K^{m}} over a field {\displaystyle K} then {\displaystyle A^{+}:K^{m}\to K^{n}} can be decomposed as follows. We write {\displaystyle \oplus } for the direct sum, {\displaystyle \perp } for the orthogonal complement, {\displaystyle \operatorname {ker} } for the kernel of a map, and {\displaystyle \operatorname {ran} } for the image of a map. Notice that {\displaystyle K^{n}=(\operatorname {ker} A)^{\perp }\oplus \operatorname {ker} A} and {\displaystyle K^{m}=\operatorname {ran} A\oplus (\operatorname {ran} A)^{\perp }}. The restriction {\displaystyle A:(\operatorname {ker} A)^{\perp }\to \operatorname {ran} A} is then an isomorphism. These imply that {\displaystyle A^{+}} is defined on {\displaystyle \operatorname {ran} A} to be the inverse of this isomorphism, and on {\displaystyle (\operatorname {ran} A)^{\perp }} to be zero.
In other words: To find {\displaystyle A^{+}b} for given {\displaystyle b} in {\displaystyle K^{m}}, first project {\displaystyle b} orthogonally onto the range of {\displaystyle A}, finding a point {\displaystyle p(b)} in the range. Then form {\displaystyle A^{-1}(\{p(b)\})}, i.e. find those vectors in {\displaystyle K^{n}} that {\displaystyle A} sends to {\displaystyle p(b)}. This will be an affine subspace of {\displaystyle K^{n}} parallel to the kernel of {\displaystyle A}. The element of this subspace that has the smallest length (i.e. is closest to the origin) is the answer {\displaystyle A^{+}b} we are looking for. It can be found by taking an arbitrary member of {\displaystyle A^{-1}(\{p(b)\})} and projecting it orthogonally onto the orthogonal complement of the kernel of {\displaystyle A}.
This description is closely related to the minimum norm solution to a linear system.
Subspaces
{\displaystyle {\begin{aligned}\operatorname {ker} (A^{+})&=\operatorname {ker} (A^{*})\\\operatorname {ran} (A^{+})&=\operatorname {ran} (A^{*})\end{aligned}}}
Limit relations
· The pseudoinverse can be expressed as a limit:
{\displaystyle A^{+}=\lim _{\delta \searrow 0}(A^{*}A+\delta I)^{-1}A^{*}=\lim _{\delta \searrow 0}A^{*}(AA^{*}+\delta I)^{-1}}
(see Tikhonov regularization). These limits exist even if {\displaystyle (AA^{*})^{-1}\,\!} or {\displaystyle (A^{*}A)^{-1}\,\!} do not exist.[9]:263
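The first limit can be observed numerically on a singular matrix. The sketch below assumes NumPy; the function name `regularized` and the chosen values of {\displaystyle \delta } are illustrative.

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0]])   # singular, so (A* A)^{-1} does not exist
A_plus = np.linalg.pinv(A)

def regularized(A, delta):
    """(A* A + delta I)^{-1} A*, which is defined for every delta > 0."""
    n = A.shape[1]
    return np.linalg.solve(A.conj().T @ A + delta * np.eye(n), A.conj().T)

# Distance to the pseudoinverse for a decreasing sequence of delta values.
errors = [np.linalg.norm(regularized(A, d) - A_plus) for d in (1e-1, 1e-3, 1e-5)]
print(errors)
```

The distances shrink roughly in proportion to {\displaystyle \delta } for this matrix.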
Continuity
· In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous: if the sequence {\displaystyle (A_{n})} converges to the matrix {\displaystyle A} (in the maximum norm or Frobenius norm, say), then {\displaystyle (A_{n})^{+}} need not converge to {\displaystyle A^{+}}. However, if all the matrices {\displaystyle A_{n}} have the same rank as {\displaystyle A}, then {\displaystyle (A_{n})^{+}} will converge to {\displaystyle A^{+}}.[12]
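A small numerical illustration of both statements, assuming NumPy: one sequence approaches a rank-1 matrix through rank-2 matrices, the other through rank-1 matrices. The specific diagonal matrices are illustrative choices.

```python
import numpy as np

A = np.diag([1.0, 0.0])                       # limit matrix, rank 1
A_plus = np.linalg.pinv(A)

gaps_rank_change, gaps_same_rank = [], []
for eps in (1e-2, 1e-4, 1e-6):
    B = np.diag([1.0, eps])                   # rank 2, converges to A
    C = np.diag([1.0 + eps, 0.0])             # rank 1, converges to A
    gaps_rank_change.append(np.linalg.norm(np.linalg.pinv(B) - A_plus))
    gaps_same_rank.append(np.linalg.norm(np.linalg.pinv(C) - A_plus))

print(gaps_rank_change)   # grows like 1/eps
print(gaps_same_rank)     # shrinks like eps
```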
Derivative
The derivative of a real valued pseudoinverse matrix which has constant rank at a point {\displaystyle x} may be calculated in terms of the derivative of the original matrix:[13]
{\displaystyle {\frac {\mathrm {d} }{\mathrm {d} x}}A^{+}(x)=-A^{+}\left({\frac {\mathrm {d} }{\mathrm {d} x}}A\right)A^{+}~+~A^{+}A^{+{\text{T}}}\left({\frac {\mathrm {d} }{\mathrm {d} x}}A^{\text{T}}\right)\left(I-AA^{+}\right)~+~\left(I-A^{+}A\right)\left({\frac {\text{d}}{{\text{d}}x}}A^{\text{T}}\right)A^{+{\text{T}}}A^{+}}
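The derivative formula can be compared against a central finite difference. The sketch below assumes NumPy and uses a hypothetical rank-1 family {\displaystyle A(x)=u(x)v(x)^{\mathrm {T} }} (the functions `A_of` and `dA_of` are made up for this check), so that the rank stays constant near the evaluation point.

```python
import numpy as np

def A_of(x):
    """Hypothetical rank-1 family A(x) = u(x) v(x)^T of size 3 x 2."""
    u = np.array([1.0, x, x**2])
    v = np.array([1.0, 2.0 + x])
    return np.outer(u, v)

def dA_of(x):
    """Exact derivative of A_of with respect to x (product rule)."""
    u, du = np.array([1.0, x, x**2]), np.array([0.0, 1.0, 2.0 * x])
    v, dv = np.array([1.0, 2.0 + x]), np.array([0.0, 1.0])
    return np.outer(du, v) + np.outer(u, dv)

def pinv(M):
    # Explicit cutoff so that rounding noise is not mistaken for a second
    # nonzero singular value of the rank-1 matrix.
    return np.linalg.pinv(M, rcond=1e-10)

x0, h = 0.5, 1e-6
A, dA, Ap = A_of(x0), dA_of(x0), pinv(A_of(x0))
I_m, I_n = np.eye(3), np.eye(2)

formula = (-Ap @ dA @ Ap
           + Ap @ Ap.T @ dA.T @ (I_m - A @ Ap)
           + (I_n - Ap @ A) @ dA.T @ Ap.T @ Ap)
finite_diff = (pinv(A_of(x0 + h)) - pinv(A_of(x0 - h))) / (2 * h)

print(np.max(np.abs(formula - finite_diff)))   # small if the two agree
```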
Examples
Since for invertible matrices the pseudoinverse equals the usual inverse, only examples of non-invertible matrices are considered below.
· For {\displaystyle \mathbf {A} ={\begin{pmatrix}0&0\\0&0\end{pmatrix}},} the pseudoinverse is {\displaystyle \mathbf {A^{+}} ={\begin{pmatrix}0&0\\0&0\end{pmatrix}}.} (Generally, the pseudoinverse of a zero matrix is its transpose.) The uniqueness of this pseudoinverse can be seen from the requirement {\displaystyle A^{+}=A^{+}AA^{+}}, since multiplication by a zero matrix would always produce a zero matrix.
· For {\displaystyle \mathbf {A} ={\begin{pmatrix}1&0\\1&0\end{pmatrix}},} the pseudoinverse is {\displaystyle \mathbf {A^{+}} ={\begin{pmatrix}{\tfrac {1}{2}}&{\tfrac {1}{2}}\\0&0\end{pmatrix}}.}
Indeed, {\displaystyle \mathbf {A\,A^{+}} ={\begin{pmatrix}{\tfrac {1}{2}}&{\tfrac {1}{2}}\\{\tfrac {1}{2}}&{\tfrac {1}{2}}\end{pmatrix}},} and thus {\displaystyle \mathbf {A\,A^{+}A} ={\begin{pmatrix}1&0\\1&0\end{pmatrix}}=A.}
Similarly, {\displaystyle \mathbf {A^{+}A} ={\begin{pmatrix}1&0\\0&0\end{pmatrix}},} and thus {\displaystyle \mathbf {A^{+}A\,A^{+}} ={\begin{pmatrix}{\tfrac {1}{2}}&{\tfrac {1}{2}}\\0&0\end{pmatrix}}=A^{+}.}
· For {\displaystyle \mathbf {A} ={\begin{pmatrix}1&0\\-1&0\end{pmatrix}},} {\displaystyle \mathbf {A^{+}} ={\begin{pmatrix}{\tfrac {1}{2}}&-{\tfrac {1}{2}}\\0&0\end{pmatrix}}.}
· For {\displaystyle \mathbf {A} ={\begin{pmatrix}1&0\\2&0\end{pmatrix}},} {\displaystyle \mathbf {A^{+}} ={\begin{pmatrix}{\tfrac {1}{5}}&{\tfrac {2}{5}}\\0&0\end{pmatrix}}.} (The denominators are {\displaystyle 5=1^{2}+2^{2}}.)
· For {\displaystyle \mathbf {A} ={\begin{pmatrix}1&1\\1&1\end{pmatrix}},} {\displaystyle \mathbf {A^{+}} ={\begin{pmatrix}{\tfrac {1}{4}}&{\tfrac {1}{4}}\\{\tfrac {1}{4}}&{\tfrac {1}{4}}\end{pmatrix}}.}
· For {\displaystyle \mathbf {A} ={\begin{pmatrix}1&0\\0&1\\0&1\end{pmatrix}},} the pseudoinverse is {\displaystyle \mathbf {A^{+}} ={\begin{pmatrix}1&0&0\\0&{\tfrac {1}{2}}&{\tfrac {1}{2}}\end{pmatrix}}.}
Note that this matrix has linearly independent columns, so {\displaystyle \mathbf {A^{+}} } is a left inverse: indeed, {\displaystyle \mathbf {A^{+}A} ={\begin{pmatrix}1&0\\0&1\end{pmatrix}}.}
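The examples above can be reproduced with a numerical routine. A minimal sketch, assuming NumPy and its `numpy.linalg.pinv` function:

```python
import numpy as np

# Pairs (A, expected A+) taken from the examples in this section.
examples = [
    (np.array([[1.0, 0.0], [1.0, 0.0]]),  np.array([[0.5, 0.5], [0.0, 0.0]])),
    (np.array([[1.0, 0.0], [-1.0, 0.0]]), np.array([[0.5, -0.5], [0.0, 0.0]])),
    (np.array([[1.0, 0.0], [2.0, 0.0]]),  np.array([[0.2, 0.4], [0.0, 0.0]])),
    (np.array([[1.0, 1.0], [1.0, 1.0]]),  np.full((2, 2), 0.25)),
    (np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]),
     np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]])),
]

results = [np.allclose(np.linalg.pinv(A), expected) for A, expected in examples]
print(results)
```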
Special cases
Scalars
It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar x is zero if x is zero and the reciprocal of x otherwise:
{\displaystyle x^{+}=\left\{{\begin{matrix}0,&{\mbox{if }}x=0;\\x^{-1},&{\mbox{otherwise}}.\end{matrix}}\right.}
Vectors
The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:
{\displaystyle x^{+}=\left\{{\begin{matrix}0^{\mathrm {T} },&{\mbox{if }}x=0;\\{x^{*} \over x^{*}x},&{\mbox{otherwise}}.\end{matrix}}\right.}
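For a concrete non-null vector, the formula can be evaluated directly. The sketch assumes NumPy and treats the vector as a 2 × 1 complex matrix; the entries are arbitrary illustrative values.

```python
import numpy as np

x = np.array([[3.0], [4.0j]])                 # column vector in C^2, x* x = 25
x_plus = x.conj().T / (x.conj().T @ x)        # x* / (x* x), a 1 x 2 row vector

print(x_plus)
print(x_plus @ x)                             # 1 x 1 matrix containing 1
```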
Linearly independent columns
If the columns of {\displaystyle A\,\!} are linearly independent (so that {\displaystyle m\geq n}), then {\displaystyle A^{*}A\,\!} is invertible. In this case, an explicit formula is:[1]
{\displaystyle A^{+}=(A^{*}A)^{-1}A^{*}\,\!}.
It follows that {\displaystyle A^{+}\,\!} is then a left inverse of {\displaystyle A\,\!}: {\displaystyle A^{+}A=I_{n}\,\!}.
Linearly independent rows
If the rows of {\displaystyle A\,\!} are linearly independent (so that {\displaystyle m\leq n}), then {\displaystyle AA^{*}} is invertible. In this case, an explicit formula is:
{\displaystyle A^{+}=A^{*}(AA^{*})^{-1}\,\!}.
It follows that {\displaystyle A^{+}\,\!} is a right inverse of {\displaystyle A\,\!}: {\displaystyle AA^{+}=I_{m}\,\!}.
Orthonormal columns or rows
This is a special case of either full column rank or full row rank (treated above). If {\displaystyle A\,\!} has orthonormal columns ({\displaystyle A^{*}A=I_{n}\,\!}) or orthonormal rows ({\displaystyle AA^{*}=I_{m}\,\!}), then:
{\displaystyle A^{+}=A^{*}\,\!}.
Orthogonal projection matrices
If {\displaystyle A\,\!} is an orthogonal projection matrix, i.e. {\displaystyle A=A^{*}} and {\displaystyle A^{2}=A}, then the pseudoinverse trivially coincides with the matrix itself:
{\displaystyle A^{+}=A\,\!}.
Circulant matrices
For a circulant matrix {\displaystyle C\,\!}, the discrete Fourier transform diagonalizes the matrix: the diagonal entries of {\displaystyle \Sigma } below are the Fourier coefficients, which are in general complex and whose moduli are the singular values. Let {\displaystyle {\mathcal {F}}} be the Discrete Fourier Transform (DFT) matrix, then[14]
{\displaystyle {\begin{aligned}C&={\mathcal {F}}\cdot \Sigma \cdot {\mathcal {F}}^{*}\\C^{+}&={\mathcal {F}}\cdot \Sigma ^{+}\cdot {\mathcal {F}}^{*}\end{aligned}}}
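The Fourier-based formula can be sketched with the FFT. The example assumes NumPy and a circulant matrix built from its first column `c` (an illustrative choice); with NumPy's unnormalized FFT the diagonal entries are `np.fft.fft(c)`, so the correspondence with the formula above holds up to the convention chosen for the DFT matrix.

```python
import numpy as np

c = np.array([1.0, 2.0, 1.0, 0.0])                        # first column of C
n = len(c)
C = np.column_stack([np.roll(c, k) for k in range(n)])    # circulant matrix

eig = np.fft.fft(c)                                       # diagonal entries (one is zero here)
nonzero = np.abs(eig) > 1e-12
eig_plus = np.zeros_like(eig)
eig_plus[nonzero] = 1.0 / eig[nonzero]                    # invert only the nonzero entries

c_plus = np.fft.ifft(eig_plus)                            # first column of C+
C_plus = np.column_stack([np.roll(c_plus, k) for k in range(n)])

print(np.round(C_plus.real, 4))
```

The result is again circulant, and its imaginary part vanishes up to rounding because the input is real.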
Construction
Rank decomposition
Let {\displaystyle r\leq \min(m,n)} denote the rank of {\displaystyle A\in \mathrm {M} (m,n;K)\,\!}. Then {\displaystyle A\,\!} can be (rank) decomposed as {\displaystyle A=BC\,\!} where {\displaystyle B\in \mathrm {M} (m,r;K)\,\!} and {\displaystyle C\in \mathrm {M} (r,n;K)\,\!} are of rank {\displaystyle r}. Then {\displaystyle A^{+}=C^{+}B^{+}=C^{*}(CC^{*})^{-1}(B^{*}B)^{-1}B^{*}\,\!}.
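A short numerical check of the rank-decomposition formula, assuming NumPy. The factors `B` and `C` are hypothetical rank-2 matrices chosen for illustration, and `H` is a small helper for the Hermitian transpose.

```python
import numpy as np

def H(M):
    """Hermitian (conjugate) transpose."""
    return M.conj().T

# Rank factorization A = B C with B (4 x 2) and C (2 x 3), both of rank 2.
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
C = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
A = B @ C                                                  # 4 x 3, rank 2

# A+ = C* (C C*)^{-1} (B* B)^{-1} B*
A_plus = H(C) @ np.linalg.inv(C @ H(C)) @ np.linalg.inv(H(B) @ B) @ H(B)

print(np.allclose(A @ A_plus @ A, A))
```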
The QR method
For {\displaystyle K=\mathbb {R} \,\!} or {\displaystyle K=\mathbb {C} \,\!} computing the product {\displaystyle AA^{*}} or {\displaystyle A^{*}A} and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice. An alternative approach using the QR decomposition of {\displaystyle A\,\!} may be used instead.
Consider the case when {\displaystyle A\,\!} is of full column rank, so that {\displaystyle A^{+}=(A^{*}A)^{-1}A^{*}\,\!}. Then the Cholesky decomposition {\displaystyle A^{*}A=R^{*}R\,\!}, where {\displaystyle R\,\!} is an upper triangular matrix, may be used. Multiplication by the inverse is then done easily by solving a system with multiple right-hand sides,
{\displaystyle A^{+}=(A^{*}A)^{-1}A^{*}\quad \Leftrightarrow \quad (A^{*}A)A^{+}=A^{*}\quad \Leftrightarrow \quad R^{*}RA^{+}=A^{*}}
which may be solved by forward substitution followed by back substitution.
The Cholesky decomposition may be computed without forming {\displaystyle A^{*}A\,\!} explicitly, by alternatively using the QR decomposition of {\displaystyle A=QR\,\!}, where {\displaystyle Q\,\,\!} has orthonormal columns, {\displaystyle Q^{*}Q=I}, and {\displaystyle R\,\!} is upper triangular. Then
{\displaystyle A^{*}A\,=\,(QR)^{*}(QR)\,=\,R^{*}Q^{*}QR\,=\,R^{*}R},
so R is the Cholesky factor of {\displaystyle A^{*}A}.
The case of full row rank is treated similarly by using the formula {\displaystyle A^{+}=A^{*}(AA^{*})^{-1}\,\!} and using a similar argument, swapping the roles of {\displaystyle A} and {\displaystyle A^{*}}.
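For the full-column-rank case, the argument above reduces to one triangular system: since {\displaystyle A^{*}=R^{*}Q^{*}} and {\displaystyle R^{*}} is invertible, {\displaystyle R^{*}RA^{+}=A^{*}} is equivalent to {\displaystyle RA^{+}=Q^{*}}. A minimal sketch assuming NumPy follows; `np.linalg.solve` is a general solver and is used here only for brevity, whereas a dedicated triangular solver such as `scipy.linalg.solve_triangular` would exploit the structure of {\displaystyle R}.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # full column rank
Q, R = np.linalg.qr(A)     # reduced QR: Q is 3 x 2, R is 2 x 2 upper triangular

# Solve R A+ = Q* for A+ (multiple right-hand sides, one per column of Q*).
A_plus = np.linalg.solve(R, Q.conj().T)

print(np.allclose(A_plus @ A, np.eye(2)))            # left inverse
```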
Singular value decomposition (SVD)
A computationally simple and accurate way to compute the pseudoinverse is by using the singular value decomposition.[1][9][15] If {\displaystyle A=U\Sigma V^{*}} is the singular value decomposition of {\displaystyle A}, then {\displaystyle A^{+}=V\Sigma ^{+}U^{*}}. For a rectangular diagonal matrix such as {\displaystyle \Sigma }, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place, and then transposing the matrix. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the MATLAB or GNU Octave function pinv, the tolerance is taken to be t = ε⋅max(m,n)⋅max(Σ), where ε is the machine epsilon. NumPy's linalg.pinv likewise discards singular values below a cutoff relative to the largest singular value, with the cutoff exposed as a parameter.
The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix–matrix multiplication, even if a state-of-the-art implementation (such as that of LAPACK) is used.
The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix A has a singular value 0 (a diagonal entry of the matrix {\displaystyle \Sigma } above), then modifying A slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.
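The SVD-based procedure can be sketched in a few lines. The function below assumes NumPy; the name `pinv_svd` and its `rtol` parameter are hypothetical, and the default cutoff follows the ε⋅max(m,n)⋅max(Σ) rule quoted above rather than the default of any particular library.

```python
import numpy as np

def pinv_svd(A, rtol=None):
    """Pseudoinverse via the SVD, treating small singular values as zero."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    if rtol is None:
        rtol = np.finfo(s.dtype).eps * max(A.shape)   # eps * max(m, n)
    tol = rtol * s.max()                              # scaled by the largest singular value
    s_plus = np.zeros_like(s)
    keep = s > tol
    s_plus[keep] = 1.0 / s[keep]                      # reciprocal of the retained values
    # A+ = V Sigma+ U*; multiplying the columns of V by s_plus applies Sigma+.
    return (Vh.conj().T * s_plus) @ U.conj().T

A = np.array([[1.0, 1.0], [1.0, 1.0]])                # rank 1
print(pinv_svd(A))
```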
Block matrices
Optimized approaches exist for calculating the pseudoinverse of block structured matrices.
The iterative method of Ben-Israel and Cohen
Another method for computing the pseudoinverse (cf. Drazin inverse) uses the recursion
{\displaystyle A_{i+1}=2A_{i}-A_{i}AA_{i},\,}
which is sometimes referred to as a hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of {\displaystyle A} if it is started with an appropriate {\displaystyle A_{0}} satisfying {\displaystyle A_{0}A=(A_{0}A)^{*}}. The choice {\displaystyle A_{0}=\alpha A^{*}} (where {\displaystyle 0<\alpha <2/\sigma _{1}^{2}(A)}, with {\displaystyle \sigma _{1}(A)} denoting the largest singular value of {\displaystyle A})[16] has been argued not to be competitive with the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before {\displaystyle A_{i}} enters the region of quadratic convergence.[17] However, if started with {\displaystyle A_{0}} already close to the Moore–Penrose inverse and {\displaystyle A_{0}A=(A_{0}A)^{*}}, for example {\displaystyle A_{0}:=(A^{*}A+\delta I)^{-1}A^{*}}, convergence is fast (quadratic).
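The recursion can be sketched as follows, assuming NumPy. The function name `pinv_iterative`, the stopping rule, and the choice {\displaystyle \alpha =1/\sigma _{1}^{2}(A)} are assumptions made for this illustration; the example uses a well-conditioned matrix with full column rank, and the floating-point behaviour for rank-deficient input is not examined here.

```python
import numpy as np

def pinv_iterative(A, tol=1e-12, max_iter=100):
    """Hyper-power iteration A_{i+1} = 2 A_i - A_i A A_i, started at alpha A*."""
    sigma1 = np.linalg.norm(A, 2)                 # largest singular value of A
    X = A.conj().T / sigma1**2                    # alpha = 1/sigma1^2 < 2/sigma1^2
    for _ in range(max_iter):
        X_next = 2 * X - X @ A @ X
        # Stop once the relative change between iterates is negligible.
        if np.linalg.norm(X_next - X) <= tol * np.linalg.norm(X_next):
            return X_next
        X = X_next
    return X

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(np.allclose(pinv_iterative(A), np.linalg.pinv(A)))
```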
Updating the pseudoinverse
For the cases where {\displaystyle A} has full row or column rank, and the inverse of the correlation matrix ({\displaystyle AA^{*}} for {\displaystyle A} with full row rank or {\displaystyle A^{*}A} for full column rank) is already known, the pseudoinverse for matrices related to {\displaystyle A} can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms[18][19] exist that exploit the relationship.
Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.[20][21]
Software libraries
The Python package NumPy provides a pseudoinverse calculation through its functions matrix.I and linalg.pinv; its pinv uses the SVD-based algorithm. SciPy adds a function scipy.linalg.pinv, which in older releases used a least-squares solver. High quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.
The MASS package for R provides a calculation of the Moore–Penrose inverse through the ginv function.[22] The ginv function calculates a pseudoinverse using the singular value decomposition provided by the svd function in the base R package. An alternative is to employ the pinv function available in the pracma package.
The Octave programming language provides a pseudoinverse through the standard package function pinv as well as the pseudo_inverse() method.
Applications
Linear least-squares
See also: Linear least squares (mathematics)
The pseudoinverse provides a least squares solution to a system of linear equations.[23] For {\displaystyle A\in \mathrm {M} (m,n;K)\,\!}, given a system of linear equations
{\displaystyle Ax=b,\,}
in general, a vector {\displaystyle x} that solves the system may not exist, or if one does exist, it may not be unique. The pseudoinverse solves the "least-squares" problem as follows:
· {\displaystyle \forall x\in K^{n}\,\!}, we have {\displaystyle \|Ax-b\|_{2}\geq \|Az-b\|_{2}} where {\displaystyle z=A^{+}b} and {\displaystyle \|\cdot \|_{2}} denotes the Euclidean norm. This weak inequality holds with equality if and only if {\displaystyle x=A^{+}b+(I-A^{+}A)w} for any vector w; this provides an infinitude of minimizing solutions unless A has full column rank, in which case {\displaystyle (I-A^{+}A)} is a zero matrix.[24] The solution with minimum Euclidean norm is {\displaystyle z.}[24]
This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let {\displaystyle B\in \mathrm {M} (m,p;K)}.
· {\displaystyle \forall X\in \mathrm {M} (n,p;K)\,\!}, we have {\displaystyle \|AX-B\|_{\mathrm {F} }\geq \|AZ-B\|_{\mathrm {F} }} where {\displaystyle Z=A^{+}B} and {\displaystyle \|\cdot \|_{\mathrm {F} }} denotes the Frobenius norm.
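As an illustration of the single right-hand-side case, the sketch below fits a line through three points that do not lie on a line. It assumes NumPy; the data are made up, and `np.linalg.lstsq` is used only as an independent cross-check.

```python
import numpy as np

# Inconsistent overdetermined system: fit y = c0 + c1 * t at t = 0, 1, 2.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

z = np.linalg.pinv(A) @ b                          # z = A+ b
z_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # reference least-squares solver

best = np.linalg.norm(A @ z - b)
# Residuals at a few perturbed points are never smaller than at z.
perturbed = [np.linalg.norm(A @ (z + d) - b)
             for d in (np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([0.05, 0.05]))]

print(z, best, perturbed)
```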
Obtaining all solutions of a linear system
If the linear system
{\displaystyle Ax=b\,}
has any solutions, they are all given by[25]
{\displaystyle x=A^{+}b+[I-A^{+}A]w}
for arbitrary vector {\displaystyle w}. Solution(s) exist if and only if {\displaystyle AA^{+}b=b}.[25] If the latter holds, then the solution is unique if and only if {\displaystyle A} has full column rank, in which case {\displaystyle [I-A^{+}A]} is a zero matrix. If solutions exist but {\displaystyle A} does not have full column rank, then we have an indeterminate system, all of whose infinitude of solutions are given by this last equation. This solution is deeply connected to the Udwadia–Kalaba equation of classical mechanics, which applies to forces of constraint that do not obey D'Alembert's principle.
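The solvability test and the parametrization of all solutions can be sketched as follows, assuming NumPy. The matrices and the random choices of {\displaystyle w} are illustrative.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])   # 2 x 3, not full column rank
b = np.array([2.0, 3.0])
A_plus = np.linalg.pinv(A)

solvable = np.allclose(A @ A_plus @ b, b)          # criterion A A+ b = b
N = np.eye(3) - A_plus @ A                         # I - A+ A, projector onto ker(A)

rng = np.random.default_rng(1)
solutions = [A_plus @ b + N @ rng.standard_normal(3) for _ in range(3)]
print(solvable, [np.allclose(A @ x, b) for x in solutions])

# An inconsistent system fails the criterion.
A2 = np.array([[1.0, 0.0], [1.0, 0.0]])
b2 = np.array([1.0, 2.0])
solvable2 = np.allclose(A2 @ np.linalg.pinv(A2) @ b2, b2)
print(solvable2)
```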
Minimum norm solution to a linear system
For linear systems {\displaystyle Ax=b,\,} with non-unique solutions (such as under-determined systems), the pseudoinverse may be used to construct the solution of minimum Euclidean norm {\displaystyle \|x\|_{2}} among all solutions.
· If {\displaystyle Ax=b\,} is satisfiable, the vector {\displaystyle z=A^{+}b} is a solution, and satisfies {\displaystyle \|z\|_{2}\leq \|x\|_{2}} for all solutions.
This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let {\displaystyle B\in \mathrm {M} (m,p;K)\,\!}.
· If {\displaystyle AX=B\,} is satisfiable, the matrix {\displaystyle Z=A^{+}B} is a solution, and satisfies {\displaystyle \|Z\|_{\mathrm {F} }\leq \|X\|_{\mathrm {F} }} for all solutions.
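The minimum-norm property can be seen on a small under-determined system. The sketch assumes NumPy; the kernel vector `k` is computed by hand for this particular matrix and is an illustrative choice.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])   # under-determined, full row rank
b = np.array([2.0, 3.0])

z = np.linalg.pinv(A) @ b                          # candidate minimum-norm solution
k = np.array([1.0, -1.0, 1.0])                     # spans ker(A), since A @ k = 0

# Every other solution has the form z + t k and is strictly longer than z.
others = [z + t * k for t in (-1.0, 0.5, 2.0)]
norms = [np.linalg.norm(x) for x in others]
print(np.linalg.norm(z), norms)
```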
Condition number
Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix:
{\displaystyle {\mbox{cond}}(A)=\|A\|\|A^{+}\|.\ }
A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of A can lead to huge errors in the entries of the solution.[26]
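With the spectral norm, this definition reduces to the ratio of the largest to the smallest nonzero singular value. A minimal sketch assuming NumPy, using an illustrative rectangular matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1e-3], [0.0, 0.0]])   # 3 x 2, singular values 1 and 1e-3

# cond(A) = ||A|| ||A+|| with the spectral (2-)norm.
cond = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.pinv(A), 2)

s = np.linalg.svd(A, compute_uv=False)
ratio = s.max() / s[s > 0].min()                      # largest / smallest nonzero singular value

print(cond, ratio)
```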
Generalizations
In order to solve more general least-squares problems, one can define Moore–Penrose inverses for all continuous linear operators A : H1 → H2 between two Hilbert spaces H1 and H2, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudoinverse in this sense.[26] Those that do are precisely the ones whose range is closed in H2.
In abstract algebra, a Moore–Penrose inverse may be defined on a *-regular semigroup. This abstract definition coincides with the one in linear algebra.
See also
· Proofs involving the Moore–Penrose inverse
· Linear least squares (mathematics)
Notes
1. Ben-Israel & Greville 2003.
2. Ben-Israel & Greville 2003, p. 7.
3. Campbell & Meyer, Jr. 1991, p. 10.
4. Nakamura 1991, p. 42.
5. Rao & Mitra 1971, pp. 50–51.
6. Moore, E. H. (1920). "On the reciprocal of the general algebraic matrix". Bulletin of the American Mathematical Society. 26 (9): 394–95. doi:10.1090/S0002-9904-1920-03322-7.
7. Bjerhammar, Arne (1951). "Application of calculus of matrices to method of least squares; with special references to geodetic calculations". Trans. Roy. Inst. Tech. Stockholm. 49.
8. Penrose, Roger (1955). "A generalized inverse for matrices". Proceedings of the Cambridge Philosophical Society. 51: 406–13. doi:10.1017/S0305004100030401.
9. Golub, Gene H.; Charles F. Van Loan (1996). Matrix computations (3rd ed.). Baltimore: Johns Hopkins. pp. 257–258. ISBN 0-8018-5414-8.
10. Stoer, Josef; Bulirsch, Roland (2002). Introduction to Numerical Analysis (3rd ed.). Berlin, New York: Springer-Verlag. ISBN 978-0-387-95452-3.
11. Maciejewski, Anthony A.; Klein, Charles A. (1985). "Obstacle Avoidance for Kinematically Redundant Manipulators in Dynamically Varying Environments". International Journal of Robotics Research.
12. Rakočević, Vladimir (1997). "On continuity of the Moore–Penrose and Drazin inverses" (PDF). Matematički Vesnik. 49: 163–72.
13. Golub, G. H.; Pereyra, V. (April 1973). "The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate". SIAM Journal on Numerical Analysis. 10 (2): 413–32. JSTOR 2156365.
14. Stallings, W. T.; Boullion, T. L. (1972). "The Pseudoinverse of an r-Circulant Matrix". Proceedings of the American Mathematical Society. 34: 385–88. doi:10.2307/2038377.
15. Linear Systems & Pseudo-Inverse
16. Ben-Israel, Adi; Cohen, Dan (1966). "On Iterative Computation of Generalized Inverses and Associated Projections". SIAM Journal on Numerical Analysis. 3: 410–19. doi:10.1137/0703035. JSTOR 2949637.
17. Söderström, Torsten; Stewart, G. W. (1974). "On the Numerical Properties of an Iterative Method for Computing the Moore–Penrose Generalized Inverse". SIAM Journal on Numerical Analysis. 11: 61–74. doi:10.1137/0711008. JSTOR 2156431.
18. Tino Gramß (1992). "Worterkennung mit einem künstlichen neuronalen Netzwerk". Georg-August-Universität zu Göttingen.
19. Mohammad Emtiyaz, "Updating Inverse of a Matrix When a Column is Added/Removed".
20. Meyer, Jr., Carl D. (1973). "Generalized inverses and ranks of block matrices". SIAM J. Appl. Math. 25: 597–602.
21. Meyer, Jr., Carl D. (1973). "Generalized inversion of modified matrices". SIAM J. Appl. Math. 24: 315–23.
22. "R: Generalized Inverse of a Matrix".
23. Penrose, Roger (1956). "On best approximate solution of linear matrix equations". Proceedings of the Cambridge Philosophical Society. 52: 17–19. doi:10.1017/S0305004100030929.
24. Planitz, M. (October 1979). "Inconsistent systems of linear equations". Mathematical Gazette. 63: 181–85.
25. James, M. (June 1978). "The generalised inverse". Mathematical Gazette. 62: 109–14.
26. Hagen, Roland; Roch, Steffen; Silbermann, Bernd (2001). "Section 2.1.2". C*-algebras and Numerical Analysis. CRC Press.
References
· Ben-Israel, Adi; Greville, Thomas N.E. (2003). Generalized inverses: Theory and applications (2nd ed.). New York, NY: Springer. ISBN 0-387-00293-6.
· Campbell, S. L.; Meyer, Jr., C. D. (1991). Generalized Inverses of Linear Transformations. Dover. ISBN 978-0-486-66693-8.
· Nakamura, Yoshihiko (1991). Advanced Robotics: Redundancy and Optimization. Addison-Wesley. ISBN 0201151987.
· Rao, C. Radhakrishna; Mitra, Sujit Kumar (1971). Generalized Inverse of Matrices and its Applications. New York: John Wiley & Sons. p. 240. ISBN 0-471-70821-6.
External links
· Interactive program & tutorial of Moore–Penrose Pseudoinverse
· "Moore–Penrose inverse". PlanetMath.
· Weisstein, Eric W. "Pseudoinverse". MathWorld.
· Weisstein, Eric W. "Moore–Penrose Inverse". MathWorld.
· The Moore–Penrose Pseudoinverse. A Tutorial Review of the Theory
· Online Moore-Penrose Inverse calculator