The filematrix package can be used for matrices of any size. The most convenient way of working with small and moderately sized matrices is to quickly save and load them via fm.create.from.matrix and fm.load, respectively.
However, the main purpose of the filematrix package is to allow users to work with matrices many times larger than the amount of computer memory. Such matrices can only be accessed by parts.
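For a matrix that fits in memory, the save and load round trip mentioned above can be sketched as follows. The matrix contents and the temporary file name are illustrative assumptions; only fm.create.from.matrix, fm.load, and close come from the package.

```r
library(filematrix)

# Small in-memory matrix used only for illustration
mat = matrix(rnorm(20), nrow = 4, ncol = 5)

# Base file name (assumption: a temporary location is acceptable here)
fnbase = tempfile()

# Save the matrix to disk as a filematrix, then close the file handle
fm_small = fm.create.from.matrix(filenamebase = fnbase, mat = mat)
close(fm_small)

# Load the whole filematrix back into memory as a regular R matrix
loaded = fm.load(fnbase)

# Largest absolute difference between the original and the reloaded values
max_diff = max(abs(loaded - mat))
```

This approach reads the entire matrix at once, so it suits only matrices that fit comfortably in memory.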
Let us set up a sufficiently large matrix for the code examples below:
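A minimal sketch of such a setup, assuming a 5000 x 10000 matrix of doubles backed by a temporary file. The column count agrees with the output shown in the examples; the row count and the use of tempfile() are assumptions made for illustration.

```r
library(filematrix)

# Create an empty file-backed matrix; it takes about 400 MB on disk
# (5000 * 10000 values, 8 bytes each). Dimensions are assumed.
fm = fm.create(
    filenamebase = tempfile(),
    nrow = 5000,
    ncol = 10000,
    type = "double")

dim(fm)
```

When the matrix is no longer needed, closeAndDeleteFiles(fm) closes it and removes its files from disk.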
The fastest way to read or write all elements of a filematrix is to work with columns sequentially, multiple columns at a time. It is much faster than accessing a filematrix by rows.
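The difference can be observed with a rough timing sketch. A filematrix keeps its values on disk column by column, as R does for in-memory matrices, so a block of columns is one contiguous region while a block of rows is scattered across the file. The matrix size and the names fmt, cols_block, and rows_block below are illustrative assumptions, and on a matrix this small the operating system's file cache may hide most of the gap.

```r
library(filematrix)

# Hypothetical test matrix, 2000 x 2000 doubles (about 32 MB on disk)
fmt = fm.create(filenamebase = tempfile(), nrow = 2000, ncol = 2000, type = "double")
fmt[, 1:2000] = runif(2000 * 2000)

# Read 100 full columns (contiguous on disk)
time_cols = system.time({ cols_block = fmt[, 1:100] })

# Read 100 full rows (the same number of values, spread over every column)
time_rows = system.time({ rows_block = fmt[1:100, ] })

print(time_cols)
print(time_rows)

# Remove the temporary files
closeAndDeleteFiles(fmt)
```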
The three examples below illustrate how this can be done for the following tasks:
- filling a filematrix with values,
- calculating column sums, and
- calculating row sums.
Let us fill in the matrix with random values 512 columns at a time.
step1 = 512                      # number of columns per block
runto = ncol(fm)
nsteps = ceiling(runto/step1)
for( part in seq_len(nsteps) ) {
    # First and last column of the current block
    fr = (part-1)*step1 + 1
    to = min(part*step1, runto)
    message( "Filling in columns ", fr, " to ", to)
    # One uniform random value per element of the block
    fm[,fr:to] = runif(nrow(fm) * (to-fr+1))
}
## Filling in columns 1 to 512
## Filling in columns 513 to 1024
## Filling in columns 1025 to 1536
## Filling in columns 1537 to 2048
## Filling in columns 2049 to 2560
## Filling in columns 2561 to 3072
## Filling in columns 3073 to 3584
## Filling in columns 3585 to 4096
## Filling in columns 4097 to 4608
## Filling in columns 4609 to 5120
## Filling in columns 5121 to 5632
## Filling in columns 5633 to 6144
## Filling in columns 6145 to 6656
## Filling in columns 6657 to 7168
## Filling in columns 7169 to 7680
## Filling in columns 7681 to 8192
## Filling in columns 8193 to 8704
## Filling in columns 8705 to 9216
## Filling in columns 9217 to 9728
## Filling in columns 9729 to 10000
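The block boundaries computed inside the loop above can also be produced up front. The helper below, block_ranges, is a hypothetical name and not part of the filematrix package; it is a small base R sketch that assumes n is at least 1.

```r
# Hypothetical helper: split the indices 1..n into consecutive blocks
# of at most `step` elements (assumes n >= 1 and step >= 1)
block_ranges = function(n, step) {
    fr = seq(1, n, by = step)          # first index of each block
    to = pmin(fr + step - 1, n)        # last index, capped at n
    data.frame(fr = fr, to = to)
}

blocks = block_ranges(10000, 512)
nrow(blocks)       # number of blocks
tail(blocks, 1)    # the last block, which is shorter than the others
```

With 10000 columns and a block size of 512 this gives 20 blocks, the last one covering columns 9729 to 10000, in agreement with the messages printed above.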
Let us calculate column sums of the filematrix, 512 columns at a time.
fmcolsums = double(ncol(fm))     # one sum per column
step1 = 512                      # number of columns per block
runto = ncol(fm)
nsteps = ceiling(runto/step1)
for( part in seq_len(nsteps) ) {
    # First and last column of the current block
    fr = (part-1)*step1 + 1
    to = min(part*step1, runto)
    message("Calculating column sums, processing columns ", fr, " to ", to)
    # Each block yields the final sums for its own columns
    fmcolsums[fr:to] = colSums(fm[,fr:to])
}
## Calculating column sums, processing columns 1 to 512
## Calculating column sums, processing columns 513 to 1024
## Calculating column sums, processing columns 1025 to 1536
## Calculating column sums, processing columns 1537 to 2048
## Calculating column sums, processing columns 2049 to 2560
## Calculating column sums, processing columns 2561 to 3072
## Calculating column sums, processing columns 3073 to 3584
## Calculating column sums, processing columns 3585 to 4096
## Calculating column sums, processing columns 4097 to 4608
## Calculating column sums, processing columns 4609 to 5120
## Calculating column sums, processing columns 5121 to 5632
## Calculating column sums, processing columns 5633 to 6144
## Calculating column sums, processing columns 6145 to 6656
## Calculating column sums, processing columns 6657 to 7168
## Calculating column sums, processing columns 7169 to 7680
## Calculating column sums, processing columns 7681 to 8192
## Calculating column sums, processing columns 8193 to 8704
## Calculating column sums, processing columns 8705 to 9216
## Calculating column sums, processing columns 9217 to 9728
## Calculating column sums, processing columns 9729 to 10000
rm(part, step1, runto, nsteps, fr, to)
message("Sums of first and last columns are ",
        fmcolsums[1], " and ", tail(fmcolsums,1))
## Sums of first and last columns are 2528.75866571884 and 2512.57311487338
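The blockwise calculation can be checked against an ordinary in-memory result on a matrix small enough to hold both. The sketch below uses an assumed 6 x 10 matrix and a block size of 4, so the last block is shorter than the others; the names fms, chunked, and max_diff are illustrative.

```r
library(filematrix)

# Small matrix kept both in memory and as a filematrix
mat = matrix(runif(6 * 10), nrow = 6, ncol = 10)
fms = fm.create.from.matrix(filenamebase = tempfile(), mat = mat)

# Blockwise column sums, 4 columns at a time (blocks 1-4, 5-8, 9-10)
step1 = 4
chunked = double(ncol(fms))
for( fr in seq(1, ncol(fms), by = step1) ) {
    to = min(fr + step1 - 1, ncol(fms))
    chunked[fr:to] = colSums(fms[, fr:to])
}

# Compare with the column sums of the in-memory matrix
max_diff = max(abs(chunked - colSums(mat)))
max_diff

# Remove the temporary files
closeAndDeleteFiles(fms)
```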
Let us calculate row sums of the filematrix, reading 512 columns at a time and accumulating the partial row sums of each block.
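fmrowsums = double(nrow(fm))     # running total, one sum per row
step1 = 512                      # number of columns per block
runto = ncol(fm)
nsteps = ceiling(runto/step1)
for( part in seq_len(nsteps) ) {
    # First and last column of the current block
    fr = (part-1)*step1 + 1
    to = min(part*step1, runto)
    message("Calculating row sums, processing columns ", fr, " to ", to)
    # Add the row sums of this block to the running total
    fmrowsums = fmrowsums + rowSums(fm[,fr:to])
}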
## Calculating row sums, processing columns 1 to 512
## Calculating row sums, processing columns 513 to 1024
## Calculating row sums, processing columns 1025 to 1536
## Calculating row sums, processing columns 1537 to 2048
## Calculating row sums, processing columns 2049 to 2560
## Calculating row sums, processing columns 2561 to 3072
## Calculating row sums, processing columns 3073 to 3584
## Calculating row sums, processing columns 3585 to 4096
## Calculating row sums, processing columns 4097 to 4608
## Calculating row sums, processing columns 4609 to 5120
## Calculating row sums, processing columns 5121 to 5632
## Calculating row sums, processing columns 5633 to 6144
## Calculating row sums, processing columns 6145 to 6656
## Calculating row sums, processing columns 6657 to 7168
## Calculating row sums, processing columns 7169 to 7680
## Calculating row sums, processing columns 7681 to 8192
## Calculating row sums, processing columns 8193 to 8704
## Calculating row sums, processing columns 8705 to 9216
## Calculating row sums, processing columns 9217 to 9728
## Calculating row sums, processing columns 9729 to 10000
rm(part, step1, runto, nsteps, fr, to)
message("Sums of first and last rows are ",
        fmrowsums[1], " and ", tail(fmrowsums,1))
## Sums of first and last rows are 4967.3736449331 and 4968.21027226094
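The reported sums can be checked for plausibility. A sum of n independent Uniform(0,1) values has mean n/2 and standard deviation sqrt(n/12). The sketch below assumes 5000 rows (an assumption consistent with column sums near 2500) and 10000 columns (from the output above), and expresses each reported sum as a distance from its expected value in standard deviations.

```r
# Assumed dimensions of the matrix used in the examples
n_rows = 5000     # assumption, inferred from column sums near 2500
n_cols = 10000    # taken from the printed column ranges

# Expected value and standard deviation of a column sum and of a row sum
col_mean = n_rows / 2;  col_sd = sqrt(n_rows / 12)
row_mean = n_cols / 2;  row_sd = sqrt(n_cols / 12)

# Reported sums, copied from the output above
col_sums = c(2528.75866571884, 2512.57311487338)
row_sums = c(4967.3736449331, 4968.21027226094)

# Distance from the expected value, in standard deviations
z_col = (col_sums - col_mean) / col_sd
z_row = (row_sums - row_mean) / row_sd
round(z_col, 2)
round(z_row, 2)
```

Under these assumptions all four values lie within two standard deviations of their expectation, which is what one would expect from correctly computed sums of uniform random values.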
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] filematrix_1.3 knitr_1.48
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.49
## [5] maketools_1.3.1 cachem_1.1.0 htmltools_0.5.8.1 rmarkdown_2.28
## [9] buildtools_1.0.0 lifecycle_1.0.4 cli_3.6.3 sass_0.4.9
## [13] jquerylib_0.1.4 compiler_4.4.1 sys_3.4.3 tools_4.4.1
## [17] evaluate_1.0.1 bslib_0.8.0 yaml_2.3.10 jsonlite_1.8.9
## [21] rlang_1.1.4