Title: | Fast and Accurate Randomized Singular Value Decomposition Algorithms with 'PCAone' |
---|---|
Description: | Fast and Accurate Randomized Singular Value Decomposition (RSVD) methods proposed in the 'PCAone' paper by Li (2023) <https://genome.cshlp.org/content/33/9/1599>. |
Authors: | Zilong Li [aut, cre] |
Maintainer: | Zilong Li <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.0 |
Built: | 2025-01-23 05:10:28 UTC |
Source: | https://github.com/zilong-li/pcaoner |
The Randomized Singular Value Decomposition (RSVD) computes the near-optimal low-rank approximation of a rectangular matrix using a fast probablistic algorithm.
pcaone( A, k = NULL, p = 7, q = 10, sdist = "normal", method = "alg2", windows = 64, shuffle = FALSE )
pcaone( A, k = NULL, p = 7, q = 10, sdist = "normal", method = "alg2", windows = 64, shuffle = FALSE )
A |
array_like; |
k |
integer; |
p |
integer, optional; |
q |
integer, optional; |
sdist |
string |
method |
string |
windows |
integer, optional; |
shuffle |
logical, optional; |
The singular value decomposition (SVD) plays an important role in data analysis, and scientific computing.
Given a rectangular matrix
, and a target rank
,
the SVD factors the input matrix
as
The left singular vectors are the columns of the
real or complex unitary matrix
. The
right singular vectors are the columns
of the real or complex unitary matrix
. The
dominant singular values are the
entries of
, and non-negative and real numbers.
is an oversampling parameter to improve the approximation.
A value of at least 10 is recommended, and
is set by default.
The parameter specifies the number of power (subspace) iterations
to reduce the approximation error. The power scheme is recommended,
especially when the singular values decay slowly. However, computing power iterations increases the
computational costs. Even though most RSVD implementations recommend
power iterations by default,
it's always sufficient to run only few power iterations where our window-based power iterations (
)
come to play. We recommend using
and
for pcaone algorithm2. As it is designed for large dataset,
we recommend using
when
.
If , a deterministic partial or truncated
svd
algorithm might be faster.
pcaone
returns a list containing the following three components:
array_like;
singular values; vector of length .
array_like;
left singular vectors; or
dimensional array.
array_like;
right singular vectors; or
dimensional array.
The singular vectors are not unique and only defined up to sign. If a left singular vector has its sign changed, changing the sign of the corresponding right vector gives an equivalent decomposition.
Zilong Li [email protected]
Z. Li, J Meisner, A Albrechtsen. "PCAone: fast and accurate out-of-core PCA framework for large scale biobank data" (2022) doi:10.1101/2022.05.25.493261.
library('pcaone') mat <- matrix(rnorm(100*20000), 100, 20000) res <- pcaone(mat, k = 10, p = 7, method = "alg2") str(res) res <- pcaone(mat, k = 10, p = 7, method = "alg1") str(res)
library('pcaone') mat <- matrix(rnorm(100*20000), 100, 20000) res <- pcaone(mat, k = 10, p = 7, method = "alg2") str(res) res <- pcaone(mat, k = 10, p = 7, method = "alg1") str(res)