| Title: | Rapid Manipulation of the Variant Call Format (VCF) |
|---|---|
| Description: | The 'vcfpp.h' (<https://github.com/Zilong-Li/vcfpp>) provides an easy-to-use 'C++' 'API' of 'htslib', offering full functionality for manipulating Variant Call Format (VCF) files. The 'vcfppR' package serves as the R bindings of the 'vcfpp.h' library, enabling rapid processing of both compressed and uncompressed VCF files. Explore a range of powerful features for efficient VCF data manipulation. |
| Authors: | Zilong Li [aut, cre] (ORCID: <https://orcid.org/0000-0001-5859-2078>), Bonfield, James K and Marshall, John and Danecek, Petr and Li, Heng and Ohan, Valeriu and Whitwham, Andrew and Keane, Thomas and Davies, Robert M [cph] (Authors of included htslib library) |
| Maintainer: | Zilong Li <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.8.3 |
| Built: | 2026-06-08 11:04:53 UTC |
| Source: | https://github.com/zilong-li/vcfppr |
Visualizes variant positions and alleles on both haplotypes for multiple samples. Each sample is represented by two horizontal tracks (one per haplotype), with variants colored according to their type (SNP, insertion, deletion) and allele (reference or alternate). Large gaps between variants can be automatically compressed for better visualization.
plot_variants_per_haplotype(
  vcffiles,
  region,
  types = c("SNP", "DEL", "INS"),
  shrink_threshold = 1000,
  xlab = "Genomic position",
  ylab = "Haplotypes of each sample",
  main = NULL,
  ...
)
vcffiles |
Character vector of VCF/BCF file paths or URLs. Each file represents one sample. |
region |
Character string specifying the genomic region to visualize (e.g., "chr1:1000-5000"). |
types |
Character vector of variant types to include in the plot. Valid options are "SNP" (single nucleotide polymorphisms), "DEL" (deletions), and "INS" (insertions). Default: c("SNP", "DEL", "INS"). |
shrink_threshold |
Numeric value specifying the minimum gap size (in base pairs) between variants that will trigger compression. Gaps larger than this threshold are shrunk to improve visualization density. Default: 1000. |
xlab |
Character string for the x-axis label. Default: "Genomic position". |
ylab |
Character string for the y-axis label. Default: "Haplotypes of each sample". |
main |
Character string for the plot title. Default: NULL (no title). |
... |
Additional graphical parameters passed to the base plot function. |
The function reads variant data from multiple VCF files using vcftable with
collapse=FALSE to preserve haplotype phasing information. Each sample is displayed
as two horizontal tracks representing the two haplotypes (h1 and h2).
Variant types are distinguished by color:
SNPs: Green (reference allele) or Yellow (alternate allele)
Deletions: Light blue (reference allele only)
Insertions: Dark blue (reference allele) or Dark orange (alternate allele)
When large gaps exist between variants (exceeding shrink_threshold), the function
compresses these regions and marks them with black dashed lines and "..." text to indicate
the compression. This feature helps visualize sparse variant distributions more effectively.
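The gap compression described above can be sketched in base R. This is only an illustration of the idea, not the code used by plot_variants_per_haplotype: the helper name compress_gaps and its shrunk_width argument are hypothetical, and the package may choose a different width for shrunk gaps.

```r
# Hypothetical helper (not part of vcfppR): map genomic positions to plot
# coordinates, replacing every gap larger than shrink_threshold by a short
# fixed-width gap.
compress_gaps <- function(pos, shrink_threshold = 1000, shrunk_width = 100) {
  pos <- sort(pos)
  gaps <- diff(pos)                 # distance between neighbouring variants
  big <- gaps > shrink_threshold    # gaps that trigger compression
  gaps[big] <- shrunk_width         # assumed display width of a shrunk gap
  list(
    x = cumsum(c(0, gaps)),         # compressed x coordinates, starting at 0
    compressed = which(big)         # index of the variant preceding each shrunk gap
  )
}

pos <- c(1000, 1200, 1500, 9000, 9100)
res <- compress_gaps(pos)
print(res$x)
print(res$compressed)
```

In a plot, the entries of compressed would be the places to draw the dashed lines and the "..." marker.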
Invisibly returns NULL. The function is called for its side effect of creating a plot.
## Not run:
# Plot variants from three samples in a specific region
vcf_files <- c("sample1.vcf.gz", "sample2.vcf.gz", "sample3.vcf.gz")
plot_variants_per_haplotype(vcf_files, region = "chr20:1000000-1100000")

# Plot only SNPs and insertions with custom threshold
plot_variants_per_haplotype(vcf_files,
                            region = "chr20:1000000-1100000",
                            types = c("SNP", "INS"),
                            shrink_threshold = 5000)

# Customize plot appearance
plot_variants_per_haplotype(vcf_files,
                            region = "chr20:1000000-1100000",
                            main = "Variant Distribution",
                            xlab = "Position (bp)",
                            cex.axis = 0.8)
## End(Not run)
S3 method for subsetting vcftable objects by rows (variants) and columns (fields). Allows filtering variants based on logical conditions and selecting specific fields.
## S3 method for class 'vcftable'
subset(x, subset, select, drop = FALSE, ...)
x |
a vcftable object returned by vcftable |
subset |
logical expression indicating variants (rows) to keep. The expression is evaluated in the context of the vcftable object, allowing direct reference to fields like chr, pos, ref, alt, qual, etc. Missing values are treated as FALSE. |
select |
expression indicating which fields (columns) to select. If omitted, all fields except samples are selected. Note: the samples field is always kept and cannot be selected/deselected. |
drop |
logical. If TRUE, the result is coerced to the lowest possible dimension. Passed to the [ operator when subsetting. Default FALSE. |
... |
Currently unused; kept for consistency with the S3 generic, which avoids consistency warnings. |
A vcftable object with the selected variants and fields.
Zilong Li [email protected]
library('vcfppR')
vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
res <- vcftable(vcffile, "chr21:1-5050000")
# Subset by quality score
high_qual <- subset(res, qual > 100)
# Subset by position and select specific fields
region_subset <- subset(res, pos >= 5000000 & pos <= 5010000, select = c(chr, pos, ref, alt))
region_subset <- subset(res, pos >= 5000000 & pos <= 5030400, select = c(chr, pos, ref, alt))
# Subset SNPs (REF and ALT are single nucleotides)
snps <- subset(res, nchar(ref) == 1 & nchar(alt) == 1)
Compare two VCF/BCF files reporting various statistics
vcfcomp(
  test,
  truth,
  formats = c("DS", "GT"),
  stats = "r2",
  by.sample = FALSE,
  by.variant = FALSE,
  flip = FALSE,
  names = NULL,
  bins = NULL,
  af = NULL,
  out = NULL,
  choose_random_start = FALSE,
  return_pse_sites = FALSE,
  ...
)
test |
path to the comparison file (test), which can be a VCF/BCF file, vcftable object or saved RDS file. |
truth |
path to the baseline file (truth), which can be a VCF/BCF file, vcftable object or saved RDS file. |
formats |
character vector. the FORMAT tags to extract for the test and truth respectively. default c("DS", "GT") extracts 'DS' of the test and 'GT' of the truth. |
stats |
character. the statistics to be calculated. Options referenced in this documentation include "r2" (the default), "pse", "gtgq" and "all". |
by.sample |
logical. calculate sample-wise concordance, which can be stratified by MAF bin. |
by.variant |
logical. calculate variant-wise concordance, which can be stratified by MAF bin. If both by.sample and by.variant are FALSE, then do calculations for all samples and variants together in a bin. |
flip |
logical. flip the ref and alt variants |
names |
character vector. reset samples' names in the test VCF. |
bins |
numeric vector. break statistics into allele frequency bins. If NULL (default), bins are automatically generated with fine resolution for rare variants and coarser resolution for common variants (ranging from 0 to 0.5). |
af |
file path with allele frequency or a RDS file with a saved object for af. Format of the text file: a space-separated text file with five columns and a header named 'chr' 'pos' 'ref' 'alt' 'af'. If NULL, allele frequencies are calculated from the truth genotypes. |
out |
output prefix for saving objects into RDS file. If provided, creates three files: out.af.rds, out.test.rds, and out.truth.rds |
choose_random_start |
logical. choose random start for stats="pse". Defaults to FALSE. |
return_pse_sites |
logical. return phasing switch error sites when stats="pse". Defaults to FALSE. |
... |
additional options passed to vcftable, such as region, samples and setid. |
vcfcomp implements various statistics to compare two VCF/BCF files,
e.g. report genotype concordance, correlation stratified by allele frequency.
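To make "correlation stratified by allele frequency" concrete, here is a minimal base-R sketch that computes the squared correlation between test dosages and true genotypes within MAF bins, pooling all samples and variants in a bin. The matrices are made-up toy values, and the binning and r2 calculation inside vcfcomp may differ in detail; treat this as an illustration only.

```r
# Toy data (made up for illustration): rows are variants, columns are samples.
truth <- matrix(c(0, 0, 0, 1,
                  0, 1, 0, 0,
                  1, 1, 0, 2,
                  0, 1, 2, 1), nrow = 4, byrow = TRUE)    # true genotypes coded 0/1/2
test <- matrix(c(0.1, 0.0, 0.2, 0.9,
                 0.0, 0.8, 0.1, 0.1,
                 1.0, 0.9, 0.2, 1.8,
                 0.1, 1.1, 1.9, 0.7), nrow = 4, byrow = TRUE)  # imputed dosages

# Allele frequency from the truth genotypes, folded to minor allele frequency.
af  <- rowMeans(truth) / 2
maf <- pmin(af, 1 - af)

# Assign each variant to a MAF bin, then compute r2 over all genotypes in the bin.
bins <- c(0, 0.25, 0.5)
bin  <- cut(maf, breaks = bins, include.lowest = TRUE)
r2 <- sapply(split(seq_len(nrow(truth)), bin), function(idx) {
  cor(as.vector(test[idx, ]), as.vector(truth[idx, ]))^2
})
print(r2)
```

Pooling all genotypes in a bin corresponds to the case where both by.sample and by.variant are FALSE.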
a list object of class "vcfcomp" containing:
character vector of sample names
the calculated statistics, named according to the 'stats' parameter. For stats="all", returns r2, f1, and nrc components.
Zilong Li [email protected]
library('vcfppR')
# site-wise comparison stratified by allele frequency
test <- system.file("extdata", "imputed.gt.vcf.gz", package="vcfppR")
truth <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
samples <- "HG00673,NA10840"
res <- vcfcomp(test, truth, stats="r2", bins=c(0,1), samples=samples, setid=TRUE)
str(res)
# sample-wise comparison stratified by sample-level metrics, e.g. GQ
test <- system.file("extdata", "svupp.call.vcf.gz", package="vcfppR")
truth <- system.file("extdata", "platinum.sv.vcf.gz", package="vcfppR")
res <- vcfcomp(test, truth, stats = "gtgq", region = "chr1")
str(res)
Read an INFO tag in the VCF/BCF into an R data structure
vcfinfo(
  vcffile,
  tag,
  region = "",
  vartype = "all",
  ids = NULL,
  qual = 0,
  pass = FALSE,
  setid = FALSE
)
vcffile |
path to the VCF/BCF file |
tag |
the INFO tag to extract. |
region |
region to subset in bcftools-like style: "chr1", "chr1:1-10000000" |
vartype |
restrict to specific type of variants. supports "snps","indels", "sv", "multisnps","multiallelics" |
ids |
character vector. restrict to sites with ID in the given vector. default NULL won't filter any sites. |
qual |
numeric. restrict to variants with QUAL > qual. |
pass |
logical. restrict to variants with FILTER = "PASS". |
setid |
logical. reset ID column as CHR_POS_REF_ALT. |
vcfinfo uses the C++ API of vcfpp, which is a wrapper of htslib, to read VCF/BCF files.
Thus, it has the full functionality of htslib, such as restricting to specific variant types,
samples and regions. For memory efficiency, vcfinfo is designed
to parse only one tag at a time in the INFO column of the VCF. Currently it does not support
parsing a vector of values for a given INFO tag.
Return a list containing the following components:
chr: character vector; the CHR column in the VCF file
pos: character vector; the POS column in the VCF file
id: character vector; the ID column in the VCF file
ref: character vector; the REF column in the VCF file
alt: character vector; the ALT column in the VCF file
qual: character vector; the QUAL column in the VCF file
filter: character vector; the FILTER column in the VCF file
the requested tag: vector of either integer, numeric or character values depending on the tag to extract; the specified tag in the INFO column
Zilong Li [email protected]
library('vcfppR')
vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
res <- vcfinfo(vcffile, "AF", region = "chr21:1-5050000", vartype = "snps", pass = TRUE)
str(res)
Make sensible and beautiful plots based on various objects in vcfppR
vcfplot(
  obj,
  which.sample = NULL,
  which.format = 10,
  variant = c("SNP", "INDEL"),
  pop = NULL,
  ...
)
obj |
object returned by vcftable, vcfcomp, vcfsummary |
which.sample |
which sample to be plotted. NULL will aggregate all samples. |
which.format |
which FORMAT field to plot. The default uses the 10th name. |
variant |
which types of variant are desired |
pop |
file containing population information |
... |
parameters passed to graphics |
Count the heterozygous sites per sample in the VCF/BCF
vcfpopgen(
  vcffile,
  region = "",
  samples = "-",
  pass = FALSE,
  qual = 0,
  fun = "heterozygosity"
)
vcffile |
path to the VCF/BCF file |
region |
region to subset like bcftools |
samples |
samples to subset like bcftools |
pass |
restrict to variants with FILTER==PASS |
qual |
restrict to variants with QUAL > qual. |
fun |
which popgen function to run. available functions are "heterozygosity". |
vcfpopgen returns a list containing the following components:
samples: character vector; the sample IDs in the VCF file after subsetting
an integer vector; the counts of heterozygous sites of each sample, in the same order as samples
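As a rough picture of what counting heterozygous sites per sample means, the following base-R sketch works on a small made-up genotype matrix. It assumes collapsed diploid genotypes coded 0/1/2 with 1 denoting a heterozygote and NA denoting a missing call; it is not the C++ routine behind vcfpopgen.

```r
# Toy collapsed genotype matrix (made up): rows are variants, columns are samples.
gt <- matrix(c(0, 1, 2,
               1, 1, NA,
               0, 1, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(NULL, c("s1", "s2", "s3")))

# A genotype of 1 is heterozygous; missing calls are ignored.
hets <- colSums(gt == 1, na.rm = TRUE)
print(hets)
```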
Zilong Li [email protected]
library('vcfppR')
vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
res <- vcfpopgen(vcffile)
str(res)
Calculate INFO score from GP after genotype imputation
vcfpp_calc_info_persite(GP)
GP |
vector of length a multiple of 3 |
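For orientation, here is a base-R sketch of an imputation INFO score computed from genotype probabilities. It assumes that GP stores one triple P(0/0), P(0/1), P(1/1) per sample, and it uses the IMPUTE-style information measure; the function name info_score is hypothetical, and the formula implemented by vcfpp_calc_info_persite may differ.

```r
# Hypothetical base-R illustration; not the package's implementation.
info_score <- function(GP) {
  stopifnot(length(GP) %% 3 == 0)
  gp <- matrix(GP, ncol = 3, byrow = TRUE)   # one row per sample: P(0), P(1), P(2)
  n <- nrow(gp)
  e <- gp[, 2] + 2 * gp[, 3]                 # expected dosage per sample
  f <- gp[, 2] + 4 * gp[, 3]                 # expected squared dosage per sample
  theta <- sum(e) / (2 * n)                  # estimated alternate allele frequency
  if (theta <= 0 || theta >= 1) return(1)    # monomorphic site: conventionally 1
  1 - sum(f - e^2) / (2 * n * theta * (1 - theta))
}

certain   <- info_score(c(1, 0, 0,  0, 1, 0,  0, 0, 1))   # genotypes known exactly
uncertain <- info_score(rep(c(0.25, 0.5, 0.25), 3))       # no more informative than the frequency
print(c(certain = certain, uncertain = uncertain))
```

Under this measure a site with fully certain genotypes scores 1, and a site whose probabilities carry no information beyond the allele frequency scores 0.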
Type the name of the class to see the details and methods
A C++ class with the following fields/methods for manipulating the VCF/BCF
new: Constructor given a vcf file
Parameter: vcffile - The path of a vcf file
new: Constructor given a vcf file and the region
Parameter: vcffile - The path of a vcf file
Parameter: region - The region to be constrained
new: Constructor given a vcf file, the region and the samples
Parameter: vcffile - The path of a vcf file
Parameter: region - The region to be constrained
Parameter: samples - The samples to be constrained. Comma separated list of samples to include (or exclude with "^" prefix).
setRegion: Try to set a specific region to work with. Throws an error if no index or region is found. Use getStatus to check whether the region is valid or empty.
getStatus: Return 1 if the region is valid and not empty, 0 if the region is valid but empty, -1 if there is no index file, and -2 if the region is not found or has an invalid form.
variant: Try to get the next variant record. Returns FALSE if there are no more variants or the end of file is reached, otherwise TRUE.
chr: Return the CHROM field of current variant
pos: Return the POS field of current variant
id: Return the ID field of current variant
ref: Return the REF field of current variant
alt: Return the ALT field of current variant
qual: Return the QUAL field of current variant
filter: Return the FILTER field of current variant
info: Return the INFO field of current variant
infoInt: Return the tag value of integer type in INFO field of current variant
Parameter: tag - The tag name to retrieve in INFO
infoFloat: Return the tag value of float type in INFO field of current variant
Parameter: tag - The tag name to retrieve in INFO
infoStr: Return the tag value of string type in INFO field of current variant
Parameter: tag - The tag name to retrieve in INFO
infoIntVec: Return the tag value in a vector of integer type in INFO field of current variant
Parameter: tag - The tag name to retrieve in INFO
infoFloatVec: Return the tag value in a vector of float type in INFO field of current variant
Parameter: tag - The tag name to retrieve in INFO
genotypes: Return the genotype values in a vector of integers
Parameter: collapse - Boolean value indicating whether to collapse the size of genotypes, e.g. return diploid genotypes.
formatInt: Return the tag value of integer type for each sample in FORMAT field of current variant
Parameter: tag - The tag name to retrieve in FORMAT
formatFloat: Return the tag value of float type for each sample in FORMAT field of current variant
Parameter: tag - The tag name to retrieve in FORMAT
formatStr: Return the tag value of string type for each sample in FORMAT field of current variant
Parameter: tag - The tag name to retrieve in FORMAT
isSNP: Test if current variant is exclusively a SNP or not
isIndel: Test if current variant is exclusively an INDEL or not
isSV: Test if current variant is exclusively an SV or not
isMultiAllelics: Test if current variant is exclusively multiallelic or not
isMultiAllelicSNP: Test if current variant is exclusively a multiallelic SNP or not
hasSNP: Test if current variant has a SNP or not
hasINDEL: Test if current variant has an INDEL or not
hasINS: Test if current variant has an INS or not
hasDEL: Test if current variant has a DEL or not
hasMNP: Test if current variant has an MNP or not
hasBND: Test if current variant has a BND or not
hasOTHER: Test if current variant has an OTHER or not
hasOVERLAP: Test if current variant has an OVERLAP or not
nsamples: Return the number of samples
samples: Return a vector of sample IDs
header: Return the raw string of the vcf header
string: Return the raw string of current variant including newline
line: Return the raw string of current variant without newline
output: Init an output object for streaming out the variants to another vcf
updateSamples: Update sample names in the output VCF
Parameter: s - A comma-separated string for new sample names
write: Stream out the current variant to the output vcf
close: Close the connection to the output vcf
setCHR: Modify the CHR of current variant
Parameter: s - A string for CHR
setID: Modify the ID of current variant
Parameter: s - A string for ID
setPOS: Modify the POS of current variant
Parameter: pos - An integer for POS
setRefAlt: Modify the REF and ALT of current variant
Parameter: s - A string separated by comma
setInfoInt: Modify the given tag of INT type in the INFO of current variant
Parameter: tag - A string for the tag name
Parameter: v - An integer for the tag value
setInfoFloat: Modify the given tag of FLOAT type in the INFO of current variant
Parameter: tag - A string for the tag name
Parameter: v - A double for the tag value
setInfoStr: Modify the given tag of STRING type in the INFO of current variant
Parameter: tag - A string for the tag name
Parameter: s - A string for the tag value
setPhasing: Modify the phasing status of each sample
Parameter: v - An integer vector with size of the number of samples. Only 1s and 0s are valid.
setGenotypes: Modify the genotypes of current variant
Parameter: v - An integer vector for genotypes. Use NA or -9 for missing value.
setFormatInt: Modify the given tag of INT type in the FORMAT of current variant
Parameter: tag - A string for the tag name
Parameter: v - An integer for the tag value
setFormatFloat: Modify the given tag of FLOAT type in the FORMAT of current variant
Parameter: tag - A string for the tag name
Parameter: v - A double for the tag value
setFormatStr: Modify the given tag of STRING type in the FORMAT of current variant
Parameter: tag - A string for the tag name
Parameter: s - A string for the tag value
rmInfoTag: Remove the given tag from the INFO of current variant
Parameter: s - A string for the tag name
clearInfo: Remove all INFO tags from the current variant, making INFO column empty
rmFormatTag: Remove the given tag from the FORMAT of current variant
Parameter: s - A string for the tag name
setVariant: Modify current variant by adding a vcf line
Parameter: s - A string for one line in the VCF
addINFO: Add an INFO in the header of the vcf
Parameter: id - A string for the tag name
Parameter: number - A string for the number
Parameter: type - A string for the type
Parameter: desc - A string for description of what it means
addFORMAT: Add a FORMAT in the header of the vcf
Parameter: id - A string for the tag name
Parameter: number - A string for the number
Parameter: type - A string for the type
Parameter: desc - A string for description of what it means
vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
br <- vcfreader$new(vcffile)
res <- rep(0L, br$nsamples())
while(br$variant()) {
  if(br$isSNP()) {
    gt <- br$genotypes(TRUE) == 1
    gt[is.na(gt)] <- FALSE
    res <- res + gt
  }
}
Summarize the various variant types at both variant level and sample level.
vcfsummary(
  vcffile,
  region = "",
  samples = "-",
  pass = FALSE,
  qual = 0,
  svtype = FALSE
)
vcffile |
path to the VCF/BCF file |
region |
region to subset like bcftools |
samples |
samples to subset like bcftools |
pass |
restrict to variants with FILTER==PASS |
qual |
restrict to variants with QUAL > qual. |
svtype |
summarize the variants with SVTYPE |
The region and samples arguments follow bcftools conventions, as in: bcftools view -s "id01,id02" input.bcf.gz chr1:100000-200000
vcfsummary returns a list containing the following components:
a named integer vector; the counts of each variant type
samples: character vector; the sample IDs in the VCF file after subsetting
an integer vector; the counts of the variant type at sample level, in the same order as samples
Zilong Li [email protected]
library('vcfppR')
svfile <- system.file("extdata", "sv.vcf.gz", package="vcfppR")
res <- vcfsummary(svfile, region = "chr21:1-10000000", svtype = TRUE)
str(res)
The swiss army knife for reading VCF/BCF into R data types rapidly and easily.
vcftable(
  vcffile,
  region = "",
  samples = "-",
  vartype = "all",
  format = "GT",
  ids = NULL,
  qual = 0,
  pass = FALSE,
  info = TRUE,
  collapse = TRUE,
  setid = FALSE,
  mac = 0,
  rmdup = FALSE
)
vcffile |
path to the VCF/BCF file |
region |
region to subset in bcftools-like style: "chr1", "chr1:1-10000000" |
samples |
samples to subset in bcftools-like style. comma separated list of samples to include (or exclude with "^" prefix). e.g. "id01,id02", "^id01,id02". |
vartype |
restrict to specific type of variants. supports "snps","indels", "sv", "multisnps","multiallelics" |
format |
the FORMAT tag to extract. default "GT" is extracted. |
ids |
character vector. restrict to sites with ID in the given vector. default NULL won't filter any sites. |
qual |
numeric. restrict to variants with QUAL > qual. |
pass |
logical. restrict to variants with FILTER = "PASS". |
info |
logical. drop INFO column in the returned list. |
collapse |
logical. It acts on the FORMAT. If the FORMAT to extract is "GT", the dimension of the raw genotype matrix for diploid samples is (M, 2 * N), where M is the number of markers and N is the number of samples. The default TRUE collapses the genotypes for each sample so that the matrix is (M, N). Set this to FALSE to maintain the phasing order, e.g. "1|0" is parsed as c(1, 0) with collapse=FALSE. If the FORMAT to extract is not "GT", then with collapse=TRUE it will try to turn the list of extracted vectors into a matrix. However, this raises issues when a variant is multiallelic and therefore yields more values than the others. |
setid |
logical. reset ID column as CHR_POS_REF_ALT. |
mac |
integer. restrict to variants with minor allele count higher than the value. |
rmdup |
logical. remove duplicated sites by keeping the first occurrence of POS. (default: FALSE) |
vcftable uses the C++ API of vcfpp, which is a wrapper of htslib, to read VCF/BCF files.
Thus, it has the full functionality of htslib, such as restricting to specific variant types,
samples and regions. For memory efficiency, vcftable is designed
to parse only one tag at a time in the FORMAT column of the VCF. By default, only the matrix of genotypes,
i.e. the "GT" tag, is returned by vcftable, but many other tags are supported through the format option.
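The effect of the collapse option on genotype dimensions can be pictured with a small base-R sketch. It assumes that the uncollapsed matrix stores the two haplotypes of each sample in adjacent columns and that the collapsed genotype of a biallelic site is the sum of the two alleles; the values are made up, and the package performs this step in C++.

```r
# Toy phased genotypes (made up): M = 2 markers, N = 2 samples, so dim is (M, 2 * N).
# Row 1 encodes "1|0" and "0|0"; row 2 encodes "1|1" and "0|1".
hap <- matrix(c(1, 0, 0, 0,
                1, 1, 0, 1), nrow = 2, byrow = TRUE)

n <- ncol(hap) / 2
first  <- hap[, seq(1, 2 * n, by = 2), drop = FALSE]   # first haplotype of each sample
second <- hap[, seq(2, 2 * n, by = 2), drop = FALSE]   # second haplotype of each sample

# Collapsing drops the phase and leaves one genotype (0, 1 or 2) per sample: dim is (M, N).
collapsed <- first + second
print(dim(hap))
print(collapsed)
```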
Return a list containing the following components:
samples: character vector; the sample IDs in the VCF file after subsetting
chr: character vector; the CHR column in the VCF file
pos: character vector; the POS column in the VCF file
id: character vector; the ID column in the VCF file
ref: character vector; the REF column in the VCF file
alt: character vector; the ALT column in the VCF file
qual: character vector; the QUAL column in the VCF file
filter: character vector; the FILTER column in the VCF file
info: character vector; the INFO column in the VCF file
the requested tag: matrix of either integer or numeric values depending on the tag to extract; the specified tag in the FORMAT column
Zilong Li [email protected]
library('vcfppR')
vcffile <- system.file("extdata", "raw.gt.vcf.gz", package="vcfppR")
res <- vcftable(vcffile, "chr21:1-5050000", vartype = "snps")
str(res)
Type the name of the class to see the details and methods
A C++ class with the following fields/methods for writing the VCF/BCF
new: Constructor given a vcf file
Parameter: vcffile - The path of a vcf file. Must not start with "~"
Parameter: version - The version of VCF specification
addContig: Add a Contig in the header of the vcf
Parameter: str - A string for the CONTIG name
addFILTER: Add a FILTER in the header of the vcf
Parameter: id - A string for the FILTER name
Parameter: desc - A string for description of what it means
addINFO: Add an INFO in the header of the vcf
Parameter: id - A string for the tag name
Parameter: number - A string for the number
Parameter: type - A string for the type
Parameter: desc - A string for description of what it means
addFORMAT: Add a FORMAT in the header of the vcf
Parameter: id - A string for the tag name
Parameter: number - A string for the number
Parameter: type - A string for the type
Parameter: desc - A string for description of what it means
addSample: Add a SAMPLE in the header of the vcf
Parameter: str - A string for a SAMPLE name
addLine: Add a line in the header of the vcf
Parameter: str - A string for a line in the header of VCF
writeline: Write a variant record given a line
Parameter: line - A string for a variant line of the VCF, without a trailing newline
close: Close and save the vcf file
outvcf <- file.path(paste0(tempfile(), ".vcf.gz"))
bw <- vcfwriter$new(outvcf, "VCF4.1")
bw$addContig("chr20")
bw$addFORMAT("GT", "1", "String", "Genotype")
bw$addSample("NA12878")
s1 <- "chr20\t2006060\t.\tG\tC\t100\tPASS\t.\tGT\t1|0"
bw$writeline(s1)
bw$close()