---
title: "Filematrix Overview"
author: "Andrey A Shabalin"
output:
  html_document:
    theme: readable
    toc: true # table of contents: true
vignette: >
  %\VignetteIndexEntry{1 Filematrix overview}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

# Introduction

The `filematrix` package provides functions to create and access large matrices stored in files, not computer memory. Filematrices can be as large as the storage allows. The package has been tested on matrices multiple terabytes in size.

# Creating a filematrix

File matrices can be created using functions `fm.create`, `fm.create.from.matrix`, or `fm.create.from.text.file`:

```{r setup, echo=FALSE}
# setwd("D:/")
library(knitr)
# opts_knit$set(root.dir=tempdir())
```

```{r message=FALSE}
library(filematrix)
fm = fm.create(
    filenamebase = "fmat",
    nrow = 200,
    ncol = 200,
    type = "double")
```

The code above creates two files: `fmat.bmat`, which stores the filematrix values, and `fmat.desc.txt`, which stores the filematrix description, such as dimensions, data type, and data type size.

Here is the content of the description file:

```{r comment=""}
cat(readLines("fmat.desc.txt"), sep = "\n")
```

# Reading from and writing to a filematrix

Elements of a filematrix can be read and written with the same syntax as regular R matrices.

Any changes to a filematrix are written to the `.bmat` file without extra buffering.

```{r}
fm[1:3, 1:2] = 1:6
fm[1:4, 1:3]
colSums(fm[,1:4])
```

Elements of a filematrix can also be accessed by a single linear index, as with a vector: elements are numbered down the first column, then down the second, and so on. Thus, as `fm` has `nrow(fm)` rows, `fm[1,2]` accesses the same element as `fm[nrow(fm)+1]`.

```{r}
fm[1:4]
fm[nrow(fm)+1:4]
```

# Row and column names

File matrices can also have row and column names, like regular R matrices.

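
For comparison, here is how names are assigned on an ordinary in-memory matrix. This is a minimal base R sketch; the small matrix `m` is hypothetical and exists only for illustration.

```r
# Hypothetical small in-memory matrix, used only for comparison.
m = matrix(0, nrow = 3, ncol = 2)

# Names are set with the usual replacement functions.
colnames(m) = paste0("Col", 1:ncol(m))
rownames(m) = paste0("Row", 1:nrow(m))

# dimnames() returns both sets of names as a list (rows first).
dimnames(m)
```

The same assignments work on the filematrix `fm`:
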
```{r}
colnames(fm) = paste0("Col", 1:ncol(fm))
rownames(fm) = paste0("Row", 1:nrow(fm))
```

The row and column names of the filematrix `fm` are stored in `fmat.nmsrow.txt` and `fmat.nmscol.txt` respectively.

# Closing filematrices

An open filematrix object can be closed with the `close` function. This closes the internal file handle (connection).

Closing filematrix objects is **optional**; changes **are not lost** if the object is not closed.

```{r}
close(fm)
```

# Open or load an existing filematrix

An existing filematrix can be opened with `fm.open`.

```{r}
fm = fm.open(filenamebase = "fmat", readonly = FALSE)
```

To prevent any changes to the values of the filematrix, set `readonly = TRUE`.

An existing filematrix that fits in memory can be loaded in full with `fm.load`.

```{r}
mat = fm.load("fmat")
```

# Access by columns is faster than by rows

The values of a filematrix are stored by columns, as with regular R matrices:

```{r}
matrix(1:12, nrow = 3, ncol = 4)
```

Thus, accessing filematrix values by columns is much faster than accessing them by rows:

```{r row-col-timing}
timerow = system.time( { sum(fm[1:10, ]) } )[3]
timecol = system.time( { sum(fm[ ,1:10]) } )[3]
cat("Reading ", nrow(fm)*10, " values from 10 columns takes ", timecol, " seconds", "\n",
    "Reading ", ncol(fm)*10, " values from 10 rows takes ", timerow, " seconds", "\n", sep = "")
```

The performance difference may not be observable on a matrix as small as the one in this example. Change the size from `r nrow(fm)` x `r ncol(fm)` to 10,000 x 10,000 to see the difference (at least a **hundredfold**).

# Appending columns

Unlike with regular R matrices, columns can be appended to the right side of a filematrix with very little computational overhead.

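
The low cost follows from the column-major layout described above: a new column occupies the bytes right after the last existing column, so nothing already on disk has to move. The sketch below illustrates the idea with base R only. It is not the package's implementation, and the temporary file `path` and the variables `m`, `newcol`, and `m2` are hypothetical.

```r
# Hypothetical demo file: raw doubles in column-major order, not a filematrix.
path = tempfile(fileext = ".bin")
nr = 4

# Write a 4 x 3 matrix; as.vector() flattens it column after column.
m = matrix(as.double(1:12), nrow = nr)
writeBin(as.vector(m), path, size = 8)

# Appending a column only adds nr * 8 bytes at the end of the file;
# the existing bytes are never rewritten.
newcol = as.double(101:104)
con = file(path, open = "ab")
writeBin(newcol, con, size = 8)
close(con)

# Read everything back and reshape to 4 rows: the new values form column 4.
vals = readBin(path, what = "double", n = file.size(path) / 8, size = 8)
m2 = matrix(vals, nrow = nr)
dim(m2)
```

Appending a row, by contrast, would require inserting one value after every column, which means rewriting the whole file. With a filematrix, the column append is a single call to `appendColumns`:
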
```{r}
dim(fm)
fm$appendColumns(nrow(fm):1)
fm$appendColumns(1:nrow(fm))
dim(fm)
fm[nrow(fm)+(-1:0), ncol(fm)+(-1:0)]
```

# Deleting filematrix files

If you no longer need a filematrix, `closeAndDeleteFiles()` closes the object and deletes its files from the hard drive:

```{r}
closeAndDeleteFiles(fm)
```

# Version information

```{r version}
sessionInfo()
```