Skip to content

doricke/BioTools

Repository files navigation

BioTools

The BioTools are complete Bioinformatics programs.

Database Searching and Alignment Tools:

ProteinSearch - searches a list of protein FASTA databases with for protein sequences. - this program can identify evolutionarily related sequences deep into the twilight zone (as low as 5% sequence identity).

AlignHits - generates protein multiple sequence alignments from ProteinSearch results - this program can align more than 20,000 protein sequences!

Parser Tools:

GenBankParser - GenBank source file parser; generates FASTA files

SwissProtParser - SwissProt source file parser; generates FASTA file

java -jar SwissProtParser.jar uniprot_sprot.dat

Example output

>A1BG_HUMAN /organism="Homo sapiens (Human)" /accession="P04217" /gene="A1BG" /taxon_id="9606" /refseq_id="NP_570602" /description="RecName: Full=Alpha-1B-glycoprotein; AltName: Full=Alpha-1-B glycoprotein; Flags: Precursor;" /id="A1BG_HUMAN" /taxonomy="Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo"
MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQ
AHLETPDFQLFKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTG
WTQLSKLLELTGPKSLPAPWLSMAPVSWITPGLKTTAVCRGVLRGVTFLL
RREGDHEFLEVPEAQEDVEATFPVHQPGNYSCSYRTDGEGALSEPSATVT
IEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVDFQLRRGEKEL
LVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELI
LSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQ
SPAGTEALFELHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPP
PRPQLRATWSGAVLAGRDAVLRCEGPIPDVTFELLREGETKAVKTVRTPG
AAANLELIFVGPQHAGNYRCRYRSWVPHTFESELSDPVELLVAES

Utilities:

lengths.pl - summarizes the lengths of sequences in a FASTA sequences file.

Motifs.pl - identifies Prosite motifs in a protein FASTA sequences file.