JVARKIT

Java utilities for Bioinformatics


Warning

Since 2015-12-10, I'm slowly moving to an XML-based description of my tools, and I'm now using Java 8. I'm doing my best to update all of these tools, but the documentation in the wiki might be out of date. If you think you have found an error, leave a message at https://github.com/lindenb/jvarkit/issues . Furthermore, I'm now using the "Apache Commons CLI library" to parse the command line. For arguments that take more than one parameter, you might have to add a double dash '--' to separate them from the input files.
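The need for '--' can be illustrated with a small, self-contained Java sketch. It is a hypothetical parser written only for this explanation, not jvarkit's code and not the Commons CLI API: the option name `-x` and the class `DoubleDashDemo` are made up. It assumes, in simplified form, that an option taking several values keeps consuming arguments until it meets one that starts with '-', and that '--' ends option parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of the "--" convention; this is NOT jvarkit or
 * Apache Commons CLI code. A multi-valued option keeps consuming arguments
 * until it meets something that looks like an option, so input files placed
 * right after it are swallowed unless "--" marks the end of the options.
 */
public final class DoubleDashDemo {

    /** Values collected by the multi-valued option, plus the input files. */
    public static final class Parsed {
        public final List<String> optionValues = new ArrayList<>();
        public final List<String> inputFiles = new ArrayList<>();

        @Override
        public String toString() {
            return "values=" + optionValues + " files=" + inputFiles;
        }
    }

    /** Parses args where the hypothetical option "-x" takes one or more values. */
    public static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            if (arg.equals("--")) {
                // End of options: every remaining token is an input file.
                parsed.inputFiles.addAll(Arrays.asList(args).subList(i + 1, args.length));
                break;
            } else if (arg.equals("-x")) {
                i++;
                // Greedy: take every following token that does not start with '-'.
                while (i < args.length && !args[i].startsWith("-")) {
                    parsed.optionValues.add(args[i]);
                    i++;
                }
            } else {
                // Any other token is treated as an input file.
                parsed.inputFiles.add(arg);
                i++;
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Without "--": input.vcf is taken as a value of -x.
        System.out.println(parse(new String[] {"-x", "a", "b", "input.vcf"}));
        // prints values=[a, b, input.vcf] files=[]

        // With "--": input.vcf is kept as an input file.
        System.out.println(parse(new String[] {"-x", "a", "b", "--", "input.vcf"}));
        // prints values=[a, b] files=[input.vcf]
    }
}
```

Putting the input files before the multi-valued option would also avoid the problem in this sketch; '--' is simply the explicit way to say "no more options".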

Author

Pierre Lindenbaum PhD

http://plindenbaum.blogspot.com

@yokofakun

Cite

see Cite

Download and install

See Download and Install

Tools

| Tool | Description |
|------|-------------|
| SplitBam | Split a BAM by chromosome group. Creates EMPTY BAMs if no reads were found for a given group. |
| SamJS | Filter a SAM/BAM with JavaScript (Rhino). |
| VCFFilterJS | Filter a VCF with JavaScript (Rhino). |
| SortVCFOnRef | Sort a VCF using the order of the chromosomes in a REFerence index. |
| Illuminadir | Create a structured (**JSON** or **XML**) representation of a directory containing some Illumina FASTQs. |
| BamStats04 | Coverage statistics for a BED file. It uses the CIGAR string instead of the start/end to compute the coverage. |
| BamStats05 | Same as BamStats04 but grouped by gene. |
| BamStats01 | Statistics about the reads in a BAM. |
| VCFBed | Annotate a VCF with the content of a BED file indexed with tabix. |
| VCFPolyX | Number of repeated REF bases around POS. |
| VCFBigWig | Annotate a VCF with the data of a bigWig file. |
| VCFTabixml | Annotate a VCF with values from a BED file indexed with tabix whose 4th column is an XML string. |
| GroupByGene | Group VCF data by gene/transcript. |
| VCFPredictions | Basic variant prediction using UCSC knownGenes. |
| FindCorruptedFiles | Reads filenames from stdin and prints corrupted NGS files (VCF/BAM/FASTQ). |
| VCF2XML | Transforms a VCF to XML. |
| VCFAnnoBam | Annotate a VCF with the coverage statistics of a BAM file + BED file of capture. It uses the CIGAR string instead of the start/end to get the coverage. |
| VCFTrio | Check for Mendelian incompatibilities in a VCF. |
| SamGrep | Search reads in a BAM. |
| VCFFixIndels | Fix samtools INDELS for @SolenaLS. |
| NgsFilesSummary | Scan folders and generate a summary of the files (SAMPLE/BAM, SAMPLE/VCF, etc.). |
| NoZeroVariationVCF | Creates a VCF containing one fake variation if the input is empty. |
| HowManyBamDict | For @abinouze: quickly find the number of distinct BAM dictionaries in a set of BAM files. |
| ExtendBed | Extends a BED file by 'X' bases. |
| CmpBams | Compare two or more BAMs. |
| IlluminaFastqStats | Statistics on Illumina FASTQs. |
| Bam2Raster | Save a BAM alignment as a PNG image. |
| VcfRebase | Finds restriction sites overlapping variants in a VCF file. |
| FastqRevComp | Reverse complement a FASTQ file for mate-pair alignment. |
| PicardMetricsToXML | Convert Picard metrics files to XML. |
| Bam2Wig | BAM to Wiggle converter. |
| TViewWeb | CGI/Web-based version of samtools tview. |
| VcfRegistryWeb | CGI/Web tool printing all variants at a given position for a collection of VCFs. |
| BlastMapAnnots | Maps UniProt/GenBank annotations on a BLAST result. See http://www.biostars.org/p/76056 |
| VcfViewGui | Simple Java Swing-based VCF viewer. |
| BamViewGui | Simple Java Swing-based BAM viewer. |
| Biostar81455 | Defining precisely the genomic context based on a position. See http://www.biostars.org/p/81455/ |
| MapUniProtFeatures | Map UniProt features on the reference genome. |
| Biostar86363 | Set the genotype of specific sample/genotype combinations to unknown in a multisample VCF file. |
| FixVCF | Fix a VCF HEADER when I forgot to declare a FILTER or an INFO field in the HEADER. |
| Biostar78400 | Add the read group info to the SAM file on a per-lane basis. |
| Biostar78285 | Extract regions of the genome that have 0 coverage. See http://www.biostars.org/p/78285/ |
| Biostar77288 | Low-resolution sequence alignment visualization. See http://www.biostars.org/p/77288/ |
| Biostar77828 | Divide the human genome among X cores, taking gaps into account. See http://www.biostars.org/p/77828/ |
| Biostar76892 | Fix the strand of two paired reads that are close but on the same strand. See http://www.biostars.org/p/76892/ |
| VCFCompareGT | VCF: compare genotypes of two or more callers for the same samples. |
| SAM4WebLogo | Creates an input file for BAM + WebLogo. |
| SAM2Tsv | Tabular view of each base of the reads vs the reference. |
| Biostar84786 | Table transposition. |
| VCF2SQL | Generate the SQL code to insert a VCF into a database. |
| VCFStripAnnotations | Removes one or more fields from the INFO column of a VCF. |
| VCFGeneOntology | Finds and filters the GO terms for a VCF annotated with SnpEff or VEP. |
| Biostar86480 | Genomic restriction finder. See http://www.biostars.org/p/86480/ |
| BamToFastq | Shrink your FASTQ.bz2 files by 40+% using this one weird tip: order them by alignment to the reference. |
| PadEmptyFastq | Pad empty FASTQ sequence/qual with N/#. |
| SamFixCigar | Replace 'M' (match) in the SAM CIGAR by 'X' or '='. |
| FixVcfFormat | Fix PL format in VCF. The problem is described in http://gatkforums.broadinstitute.org/discussion/3453 |
| VcfToRdf | Convert a VCF to RDF. |
| VcfShuffle | Shuffle a VCF. |
| DownSampleVcf | Down-sample a VCF. |
| VcfHead | Print the first variants of a VCF. |
| VcfTail | Print the last variants of a VCF. |
| VcfCutSamples | Select/exclude some samples from a VCF. |
| VcfStats | Generate some statistics from a VCF. |
| VcfSampleRename | Rename samples in a VCF. |
| VcffilterSequenceOntology | Filter a VCF on Sequence Ontology (SO). |
| Biostar59647 | Position of mismatches per read from a SAM/BAM file (XML). See http://www.biostars.org/p/59647/ |
| VcfRenameChromosomes | Rename chromosomes in a VCF (e.g. convert hg19/ucsc to grch37/ensembl). |
| BamRenameChromosomes | Rename chromosomes in a BAM (e.g. convert hg19/ucsc to grch37/ensembl). |
| BedRenameChromosomes | Rename chromosomes in a BED (e.g. convert hg19/ucsc to grch37/ensembl). |
| BlastnToSnp | Map variations from a BLASTN-XML file. |
| Blast2Sam | Convert a BLASTN-XML input to SAM. |
| VcfMapUniprot | Map UniProt features on a VCF annotated with VEP or SnpEff. |
| VcfCompare | Compare two VCF files. |
| VcfBiomart | Annotate a VCF with data from BioMart. |
| VcfLiftOver | LiftOver a VCF file. |
| BedLiftOver | LiftOver a BED file. |
| VcfConcat | Concatenate VCF files. |
| MergeSplittedBlast | Merge BLAST hits from a split database. |
| FindMyVirus | Virus + host cell: split a BAM into categories. |
| Biostar90204 | Equivalent of the Linux split command for BAM files. |
| VcfJaspar | Finds JASPAR profiles in a VCF. |
| GenomicJaspar | Finds JASPAR profiles in a FASTA. |
| VcfTreePack | Create a TreeMap from one or more VCFs. |
| BamTreePack | Create a TreeMap from one or more BAMs. |
| FastqRecordTreePack | Create a TreeMap from one or more FASTQ files. |
| WorldMapGenome | Map a BED file to genome + geographic data. |
| AddLinearIndexToBed | Use a sequence dictionary to create a linear index for a BED file. Can be used as an X-axis for a chart. |
| VCFComm | Compare multiple VCF files and output a new VCF file. |
| VcfIn | Prints variants that are contained/not contained in another VCF. |
| Biostar92368 | Binary interactions depth. See also http://www.biostars.org/p/92368 |
| VCFStopCodon | TODO |
| FastqGrep | Finds reads in FASTQ files. |
| VcfCadd | Annotate a VCF with Combined Annotation Dependent Depletion (CADD) data. |
| SortVCFOnInfo | Sort a VCF using a field in the INFO column. |
| SamChangeReference | TODO |
| SamExtractClip | TODO |
| GCAndDepth | Extracts GC% and depth for multiple BAMs using a sliding window. |
| Biostar94573 | Getting a VCF file from a CLUSTALW or FASTA alignment. |
| CompareBamAndBuild | Compare two BAM files mapped on two different builds. Requires a liftover chain file. |
| KnownGenesToBed | Convert UCSC KnownGene to BED. |
| Biostar95652 | Drawing a schematic genomic context tree. See also http://www.biostars.org/p/95652/ |
| SamToPsl | Convert SAM/BAM to PSL or BED12. |
| BWAMemNOp | Merge the SA:Z:* attributes of a read mapped with bwa-mem and print a read containing a CIGAR string with 'N' (skipped region from the REF). |
| FastqEntropy | Compute the entropy of a FASTQ file (distribution of the length(gzipped(sequence))). |
| NgsFilesScanner | Build a persistent database of NGS files. Dump as XML. |
| SigFrame | GUI displaying CGH data. |
| Biostar103303 | Calculate Percent Spliced In (PSI). |
| VCFComparePredictions | Compare the variant predictions of VCFs. |
| BackLocate | Map a position in a protein back to the genomic coordinates. |
| FindAVariation | Search for variations in a set of VCF files. |
| AlleleFrequencyCalculator | VCF: allele frequency calculator. |
| BuildWikipediaOntology | Build a simple RDFS/XML ontology from Wikipedia categories. |
| AlmostSortedVcf | Sort an 'almost' sorted VCF using an in-memory buffer. |
| Biostar105754 | bigWig: peak distance from a specific genomic BED region. |
| VcfRegulomeDB | Annotate a VCF with the RegulomeDB data (http://regulome.stanford.edu/). |
| Biostar106668 | Unmark duplicates (deprecated). |
| BatchIGVPictures | GUI: batch pictures with IGV. |
| PubmedDump | Dump PubMed data as XML. |
| BamIndexReadNames | Build a dictionary of read names to be searched with BamQueryReadNames. |
| BamQueryReadNames | Query a BAM file indexed with BamIndexReadNames. |
| FastqShuffle | Shuffle FASTQ files. |
| FastqSplitInterleaved | Split interleaved FASTQ files. |
| PubmedFilterJS | Filters PubMed XML using JavaScript. |
| ReferenceToVCF | Creates a VCF containing all possible substitutions in a reference genome. |
| VcfEnsemblReg | Annotate a VCF with the UCSC genome hub tracks for Ensembl Regulation. |
| FastqJS | Filters a FASTQ file using JavaScript. |
| Bam2SVG | Convert a BAM to SVG. |
| LiftOverToSVG | Convert UCSC LiftOver chain files to animated SVG. |
| VCFMerge | Combines VCF files. |
| FixVcfMissingGenotypes | Use BAMs to fill missing genotypes in merged VCFs. |
| NcbiTaxonomyToXml | Dump the NCBI taxonomy tree as a hierarchical XML document. |
| BamCmpCoverage | Creates the figure of a comparative view of the depths, sample vs sample. |
| FindAllCoveragesAtPosition | Find the depth at a specific position in a list of BAM files. |
| VcfMultiToOne | Convert a VCF with multiple samples to a VCF with one sample. |
| Evs2Xml | Download data from the Exome Variant Server as XML. |
| VcfRemoveGenotypeIfInVcf | Reset genotypes in a VCF if they've been found in another VCF indexed with tabix. |
| Biostar130456 | Generate one VCF file for each sample from a multi-sample VCF. |
| UniprotFilterJS | Filter UniProt XML with a JavaScript expression. |
| SkipXmlElements | Filter XML elements with a JavaScript expression. |
| MiniCaller | Simple and stupid variant caller designed for @AdrienLeger2. |
| VcfCompareCallersOneSample | For my colleague Julien. Compare VCF callers in a VCF with one sample. |
| SamRetrieveSeqAndQual | "Is there a tool to add SEQ and QUAL to a BAM?" for @sjackman. |
| VcfEnsemblVepRest | Annotate a VCF with the Ensembl REST API. |
| VcfCompareCallers | Compare two VCFs and print common/exclusive information for each sample/genotype. |
| BamStats02 | Generate and explore statistics about the reads in a BAM (sample/file/flags/chroms/MAPQ). |
| BamTile | BAM tiling path. |
| XContaminations | For @AdrienLeger2: test for cross-contamination between samples in the same flowcell/run/lane. |
| VCFJoinVcfJS | Join two VCF files using JavaScript. |
| Biostar139647 | Convert a Clustal/FASTA alignment to SAM/BAM. |
| BioAlcidae | Reformat bioinformatics files using JavaScript/Rhino (~ awk). |
| VCFBedSetFilter | Set the FILTER of VCF variants intersecting a BED. |
| VCFReplaceTag | Replace the key in INFO/FORMAT/FILTER. |
| VcfIndexTabix | Sort, compress (bgzip) a VCF and create a tabix index on the fly. |
| VcfPeekVcf | Peek INFO tags and ID from another VCF. |
| VcfGetVariantByIndex | Access a (plain or tabix-indexed) VCF file by the i-th index. |
| VcfMultiToOneAllele | VCF: "one variant with N ALT alleles" to "N variants with one ALT". |
| BedIndexTabix | Index and sort a BED on the fly with tabix. |
| VcfToHilbert | Plot a Hilbert curve from a VCF file. |
| Biostar145820 | Shuffle BAM/subsample BAM to a fixed number of alignments. |
| PcrClipReads | Soft-clip BAM files based on PCR target regions. https://www.biostars.org/p/147136/ |
| ExtendReferenceWithReads | Extend the ends of the REF sequence with the help of reads in a BAM. https://www.biostars.org/p/148089/ |
| PcrSliceReads | Assign PCR reads to their PCR amplicon. https://www.biostars.org/p/149687/ |
| SamJmx | Monitor/interrupt/break a BAM/SAM stream with Java JMX. |
| VcfJmx | Monitor/interrupt/break a VCF stream with Java JMX. |
| Gtf2Xml | Convert GFF to XML so that it can be processed with XSLT. |
| SortSamRefName | Sort a SAM/BAM on REF/contig and then on read/query name. |
| Biostar154220 | Cap a BAM to a given coverage. See https://www.biostars.org/p/154220 |
| VcfToBam | Create a BAM from a VCF. |
| Biostar165777 | Split an XML file (e.g. BLAST). |
| BlastFilterJS | Filters an XML BLAST output with a JavaScript expression. |
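BamStats04 and VCFAnnoBam compute coverage from the CIGAR string rather than from the alignment start/end. The difference can be shown with a stdlib-only Java sketch. It is not jvarkit's implementation (the class `CigarCoverage` and method `addRead` are hypothetical), and it assumes that only M, = and X operations count as covering a reference base, while D and N advance along the reference without adding depth.

```java
import java.util.Arrays;

/**
 * Sketch (not jvarkit's implementation): per-base depth computed from the
 * CIGAR string rather than from the alignment start/end.
 * Assumption: only M, = and X cover a reference base; D and N move along the
 * reference without adding depth; I, S, H and P do not move along the reference.
 */
public final class CigarCoverage {

    /**
     * Adds one read to depth[], where depth[0] is the 1-based reference position
     * regionStart. alignmentStart is the 1-based SAM POS of the read, i.e. the
     * first aligned (non-clipped) base.
     */
    public static void addRead(int[] depth, int regionStart, int alignmentStart, String cigar) {
        int refPos = alignmentStart;
        int len = 0;
        for (int i = 0; i < cigar.length(); i++) {
            char c = cigar.charAt(i);
            if (Character.isDigit(c)) {
                // Accumulate the operation length digit by digit.
                len = len * 10 + (c - '0');
                continue;
            }
            switch (c) {
                case 'M': case '=': case 'X':
                    // Aligned bases: each one covers a reference position.
                    for (int k = 0; k < len; k++) {
                        int idx = refPos + k - regionStart;
                        if (idx >= 0 && idx < depth.length) depth[idx]++;
                    }
                    refPos += len;
                    break;
                case 'D': case 'N':
                    // Deletion or skipped region: advance on the reference, no coverage.
                    refPos += len;
                    break;
                case 'I': case 'S': case 'H': case 'P':
                    // Read-only (or no-op) operations: reference position unchanged.
                    break;
                default:
                    throw new IllegalArgumentException("bad CIGAR operator: " + c);
            }
            len = 0;
        }
    }

    public static void main(String[] args) {
        int[] depth = new int[10]; // reference positions 100..109
        // 3 aligned bases, a 4-base skipped region (e.g. an intron), 3 aligned bases.
        addRead(depth, 100, 100, "3M4N3M");
        System.out.println(Arrays.toString(depth));
        // prints [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
    }
}
```

With the start/end approach, the same read would add depth to all ten positions, including the four bases of the skipped region.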
