Variant quality filtering is operated in Variant Manager.
Use SNVs tab of the Variant Manager to apply basic filtering for SNVs and indels. These values are read from the opened VCF files and can be used interactively with the variant visualization.
The basic filters include:
To use separate quality filters for indels, go to Indels tab of Variant Manager and check "Use indel specific filters".
In somatic studies you can use the filters, tracks and comparison panel as in familial cases.
See example video of a whole-genome somatic variant analysis at our Youtube channel: "Somatic cluster analysis demonstration with BasePlayer"
Read analysis requires aligned BAM or CRAM files with index.
Variant quality metrics in VCF files are assessed by the used variant caller. Although the false positive rates of variant calls have decreased as methods have improved, read level inspection of variant calls and flanking regions are required in most cases. The integrated variant analysis and visualization in BasePlayer enables rapid workflow for variant validation - double-clicking the specific sample in the result table zooms in to the variant locus and expands the sample track. Reads appear on the sample track. Read level zoom enables user to inspect qualities, orientation, and mismatch rates of the overlapping and surrounding reads. In addition, the reference sequence and annotation tracks, such as repeat masker and mappability track, provide additional information about the variant calls.
Structural variants break points can be analyzed by inspecting discordant and splitted reads at the break point locus. BasePlayer groups discordant reads at the bottom to facilitate the analysis. The colors of the reads corresponds to either different kind of structural variant or chromosome color of the discordant read pair (in case read pairs have been mapped in different chromosomes). Concordantly mapped read pairs are colored as light green and red (forward and reverse strand).
On the above image, there is a LINE1 element at the left side of the left panel. Reads appear as dark, as the mapping quality is lower than set in the settings.
Colorful reads at adjacent to dark reads represents discordant reads, whose pair is aligned to different chromosome (color is determined by the pair's chromosome).
Double click discordant read to split the screen between the different chromosomes. You can split the screen as many times as
the monitor width allows.
At the right panel, there is a visible break point (mismatches in reads and in the coverage track).
Click read to view more information about the read.
Thousands of base pairs long reads are non-trivial to visualize, as a single read can span multiple genomic breakpoints, or multiple exons in the case of RNA data. BasePlayer recognizes long-read data, and makes a reasonable effort to present it properly. For instance, the researcher can divide the main screen between distinct genomic loci by double clicking a read that has been mapped to multiple locations (figure below). A demonstration video on how to visualize long-read data in BasePlayer can be viewed at the BasePlayer Youtube channel: "Third-generation sequencing data visualization with BasePlayer"
Three split views of the long read are shown, tracking the split mappings for a single long read. An inset info panel shows information on the selected read and a schematic illustration of split read orientations.
BasePlayer has a built in set of transcription factor position specific scoring matrices (PSSMs), obtained from Jaspar database and
https://www.nature.com/articles/nature15518.
If you are analysin the data with GRCh37, you can download aligned motifs from https://baseplayer.fi/tracks/ (TFbinding_sites_SELEX_GRCh37.bed.gz).
The BED file has the TF name in the "Name" field, which allows BasePlayer to recognize the correct PSSM for correct alignment and visualize it in place at the region track.
Option 1: download aligned motifs from Ensembl BioMart
Go to https://www.ensembl.org/ (make sure you go to correct version of the reference genome) and click BioMart from the header of the page.
Choose database -> Ensembl Regulation -> Choose dataset -> Human binding motifs.
Click "Attributes" and check the "Strand" checkbox.
Export the file as compresssed file and leave the TSV selected. Press "GO".
Open biomart file in BED converter and select "Name" for the Binding matrix column, and
chromosome, start,end and strand as in the original file header. Press convert.
Uncheck "zero based" from the track settings.
Option 2 (requires linux environment, python, bgzip and tabix): align and add motifs manually
You can add your own motifs to BasePlayer/additions/motifs folder. Create a .txt file and add the desired PSSMs in the file. Use the jaspar_pfm_add.txt as a template.
Motifs will be loaded to memory as the BasePlayer starts.
Use for example MOODS algorithm to align your motifs to the reference genome.
Conver MOODS output to BED file with this python script
Use following bash command:
cat moods_output.txt | python moods2bed.py | bgzip -c > output.bed.gz
After conversion create the tabix index for the file:
tabix -p bed output_bed.gz
The file can now be opened as a BasePlayer track and the motifs are visualized as a sequence logos.