BasePlayer manual

Main window

The main window of BasePlayer displaying three samples, a genomic region track and a population control data track (variant view of chromosome 10). (a) The toolbar contains tools for managing the data and navigating the genome. (b) The genome bar visualizes chromosome bands and genes. The reference sequence and gene annotation can be changed from the dropdown menus on the left. Memory usage shows the used and allocated memory for BasePlayer. (c) Region tracks are used to exclude and annotate variants. In addition, various types of regional or base-specific scores can be visualized as histograms, and TF binding motifs as sequence logos. Pressing the red play button applies the track, which excludes any variants outside the regions. (d) Control tracks can be used to exclude common polymorphisms by setting an allele frequency threshold. (e) Sample tracks visualize the VCF and BAM files of a sample overlaid on the same track. The sample name and statistics for the currently visible variants are shown on the left. Vertical lines represent variant calls in a sample, colored red (SNV in a coding region), green (indel in a coding region) or grey (variant in a non-coding region). Line height is relative to the sequencing coverage at the variant locus.

Toolbar

  1. File, Tools, Variant Manager and Help menus
  2. Change chromosome
  3. Go to the previous or next location (back/forward buttons)
  4. Search bar. Search by position (e.g. 4:5306600), gene name or ENSG code
  5. Zoom back to chromosomal level
  6. Current view region (start and end position of the screen)
  7. Current view length

File menu

  • Add VCFs

    Open VCF files (vcf.gz, preferably with a tabix index file ".vcf.gz.tbi"). Variants will open in sample tracks. If a BAM file with the same name is located in the same directory, it will be opened in the same track. Reads will appear when zoomed in (view length less than 60 kbp).

  • Add BAMs

    Open BAM and CRAM files; index files are required (.bam.bai or .cram.crai). Reads will open in sample tracks and appear when zoomed in (view length less than 60 kbp). Coverage will be drawn at the zoom distance set in the settings (default 1 Mbp).

  • Add tracks

    Open track files (BED, BigWig, BigBed, GFF, TSV, txt). Read more about track files from the Region/annotation tracks section

  • Add controls

    Open control files (single or multi VCF). Control file will open to Control track.

  • Add track from URL

    Open track files from external server. Paste ftp or http address to the URL field (see image below) and press "Fetch".


  • Open/save project

    Open and save BasePlayer projects (.ses).

  • Genomes

    Change and add new genomes (opens the Genome selector)

  • Update

    Update BasePlayer. The button is visible only when updates are available. It is also possible to update BasePlayer manually by downloading BasePlayer.jar from https://baseplayer.fi/update.

  • Clear data

    Clears all opened samples, tracks and controls.

  • Exit

    Closes BasePlayer.
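
The file naming rules described for "Add VCFs" and "Add BAMs" can be checked before opening files. The following Python sketch is not part of BasePlayer; the function names are hypothetical, and it assumes that "the same name" means the VCF file name with its ".vcf.gz" or ".vcf" ending replaced by ".bam".

```python
from pathlib import Path

# Index endings listed in this manual for each data file type.
INDEX_SUFFIX = {".vcf.gz": ".tbi", ".bam": ".bai", ".cram": ".crai"}


def expected_index(data_file):
    """Return the index file path expected next to a VCF/BAM/CRAM file."""
    name = str(data_file)
    for ext, idx in INDEX_SUFFIX.items():
        if name.endswith(ext):
            return Path(name + idx)
    return None  # not a recognized data file type


def companion_bam(vcf_file):
    """Return the BAM path assumed to be paired with a VCF file."""
    path = Path(vcf_file)
    for ext in (".vcf.gz", ".vcf"):
        if path.name.endswith(ext):
            return path.with_name(path.name[: -len(ext)] + ".bam")
    return None
```

For example, expected_index("sample1.bam") gives "sample1.bam.bai", and companion_bam("data/sample1.vcf.gz") gives "data/sample1.bam". Use Path.exists() on the results to verify that the files are really there.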


Tools

Table browser

Table Browser is a tool for fast browsing through the genome. It is particularly useful when validating breakpoints of structural variants or the quality of variant calls. You can open any tab-separated file (gzipped or plain text) that contains chromosomal position information. Double-clicking a position takes you to the selected locus and visualizes the data.

In the example above, BasePlayer output is opened for visual validation of candidate variants. Using the dropdown menus in the table header, you can select which column corresponds to which information. Options for header values are:

  • Position

    Use this selector when the chromosome and position are in the same column (e.g. BasePlayer output above; 3:24,446,246).

  • Chromosome

    Use this selector when the chromosome is in its own column.

  • Start

    If the file has separate columns for chromosome and position(s), use this selector for the start position (Figure below).

  • End

    If the file has separate columns for chromosome and regional (start/end) positions, use this selector for the end position (Figure below).

  • Sample

    If the file contains positions for multiple samples, select this header value for the column containing sample names. BasePlayer will expand the selected sample when the position is double-clicked (Figure above).

  • Gene

    If the file contains gene names, double-clicking the "Gene" column will zoom in to the selected gene (Figure above).

  • Editable

    Marks the column as "Editable", which allows you to edit the table values. The last column is automatically set as Editable, so you can write, for example, the validation status of the variant there.

If you have edited the table, use the "Write" button to save the changes.
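
If you prepare tables for Table Browser with your own scripts, the combined "Position" format (chromosome and position in one column, e.g. 3:24,446,246) can be split as in the Python sketch below. This is only an illustration of the format; parse_locus is a hypothetical helper and not a BasePlayer function.

```python
def parse_locus(text):
    """Split 'chrom:position' into (chromosome, integer position).

    Thousands separators (commas) in the position are ignored.
    """
    chrom, sep, pos = text.strip().partition(":")
    if not sep or not chrom or not pos:
        raise ValueError(f"not a chrom:position value: {text!r}")
    return chrom, int(pos.replace(",", ""))
```

For example, parse_locus("3:24,446,246") returns the chromosome "3" and the position 24446246. The same format without separators (4:5306600) is parsed the same way.
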
See demonstration video of the usage of "Table Browser" at:

Coverage calculator

Use this tool to get coverage information from the opened BAM files. First, open BAM files as sample tracks with "File -> Add BAMs". Then open "Coverage Calculator" and open a BED file containing, e.g., the targeted regions to be used in the coverage calculation. If the BED file has only a few regions, check the "Small region" checkbox for faster processing. Press "Execute" to start the calculation.

Column explanations:

  • Sample

    Sample name

  • Average coverage

    Average coverage of the BAM file in the opened BED file regions

  • Average mapping quality

    Average mapping quality of the reads in the BAM file

  • Soft clip rate

    Proportion of the reads having soft clips (https://www.biostars.org/p/119537/)

  • Zero quality rate

    Proportion of reads having a mapping quality of zero

  • Covered (%) (Coverage : Percentage)

    Indicates the proportion of the regions covered at different coverage levels. For example, "5 : 63" means that if the regions in the opened BED file have a total length of 100,000 bp, 63% of the total region (63,000 bp) is covered by at least 5 reads. The last value, e.g. "10+ : 48%", means that 48% of the regions are covered by at least ten overlapping reads.

  • Status

    Shows the progress of the coverage calculation
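
The "Covered (%)" column can be illustrated with a small Python sketch. It assumes that you already have a per-base read depth for every position of the BED regions (the depths list below is made-up example data); how BasePlayer computes the depths internally is not described here.

```python
def covered_percent(depths, levels=(1, 5, 10)):
    """Percentage of positions covered by at least `level` reads.

    depths: per-base read depths over all BED regions.
    Returns a dict {coverage level: percentage of positions}.
    """
    total = len(depths)
    if total == 0:
        return {level: 0.0 for level in levels}
    return {
        level: 100.0 * sum(depth >= level for depth in depths) / total
        for level in levels
    }


# Ten example positions; 7 of them have depth >= 5 and 4 have depth >= 10.
depths = [0, 3, 5, 7, 12, 12, 20, 1, 5, 10]
print(covered_percent(depths))
```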

Variant caller

This is the test version of a simple variant caller, which you can use for BAM files in case you do not have VCF files available.

  • Min. alt-read count

    Minimum number of reads calling the alternative base, i.e. how many reads are required for a variant call

  • Min. coverage

    Minimum total coverage of the locus to be used in variant calling

  • Min. mapping quality

    Minimum mapping quality of the reads that are used in variant calling

  • Min. base quality

    Minimum base quality of the mismatched base in the read to be used in the calling

  • Require both strands

    A base is called only if both forward and reverse strand reads call the same variant

  • Require multiple runs

    If the BAM file contains reads from different runs, require that the variant is present in all runs (possibly eliminates run-specific errors)

  • Calc only for selected

    Calculate variants only for the selected sample (click the sidebar of the sample track to select a sample)

  • Before annotation

    Start the variant calling after "Annotation" is pressed in the "Variant Manager"
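
To show how the thresholds above interact, here is a minimal Python sketch of a threshold-based call at a single locus. It is not BasePlayer's implementation: the AltRead record, the function name and all default values are assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class AltRead:
    """A read supporting the alternative base (hypothetical record)."""
    strand: str  # "+" for forward, "-" for reverse
    mapq: int    # mapping quality of the read
    baseq: int   # base quality of the mismatched base


def call_variant(alt_reads, coverage, min_alt=4, min_cov=10,
                 min_mapq=10, min_baseq=5, both_strands=False):
    """Return True if the locus passes all thresholds."""
    # Reads below the quality thresholds do not count as support.
    usable = [r for r in alt_reads
              if r.mapq >= min_mapq and r.baseq >= min_baseq]
    if coverage < min_cov or len(usable) < min_alt:
        return False
    # "Require both strands": support must come from + and - reads.
    if both_strands and {r.strand for r in usable} != {"+", "-"}:
        return False
    return True
```

With four good forward reads at a coverage of 20, call_variant returns True by default but False when both_strands=True, because no reverse strand read supports the variant.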

BED converter

BED converter is a tool for importing external annotation data that is not in a standard format (BED, BigWig, GFF etc.) into BasePlayer. It converts any tab-separated file (containing at least chromosome, start and end position fields) to a sorted, compressed and indexed BED file. The file is then usable as an annotation, filtering and visualization track. See the demonstration video of BED converter usage with UCSC data at the link below.

Open a tab-separated text file (gzipped or plain text) in BED converter. Several rows of the file appear in the table. Select the correct header description for each column using the dropdown menus in the header (red circle). Some columns may have been selected automatically. You must select at least the "Chromosome", "Start" and "End" columns for BED conversion. Optional fields are "Name", "Score" and "Strand". Only the selected columns are added to the processed BED file.
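
The column selection and sorting step can be sketched in Python as follows. This is only an illustration of the idea, not the converter's code: to_bed and its parameters are hypothetical, chromosomes are sorted as plain text (so "10" comes before "2"), and the compression and indexing steps are left out.

```python
def to_bed(rows, chrom_col, start_col, end_col, name_col=None):
    """Pick columns from tab-separated rows and return sorted BED lines.

    Column arguments are zero-based column numbers of the input file.
    """
    records = []
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip empty lines and comment/header lines
        fields = row.rstrip("\n").split("\t")
        record = [fields[chrom_col],
                  int(fields[start_col]),
                  int(fields[end_col])]
        if name_col is not None:
            record.append(fields[name_col])
        records.append(record)
    # Sort by chromosome, then start, then end position.
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    return ["\t".join(str(value) for value in r) for r in records]
```

For an input with the columns name, chromosome, start and end, call to_bed(rows, 1, 2, 3, name_col=0).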


Genome bar

The genome bar visualizes the opened reference sequence, cytobands (at least for humans and mice) and genes. At the whole-chromosome zoom level, genes whose on-screen length exceeds 1 pixel are named and placed on different vertical levels. In other words, the further you zoom in, the more genes are visible.

The left sidebar contains a label for the selected chromosome, dropdown menus for selecting the reference genome / gene annotation, and the current memory usage out of the total memory allocated for BasePlayer. You can also use the dropdown menus to add new annotations and reference sequences.

The current location of the screen is shown on top of the cytoband with a red border. You can jump to the desired region by dragging the mouse on top of the cytoband. The centromere is indicated by red triangles.

In the figure above, the last exon of the gene APC is shown. The reference sequence and amino acids are visible at this zoom level. In addition, variants of all opened samples are collapsed into this view (red and green bars), which also shows the amino acid change or the indel effect in place.

Tip: drag the mouse on top of the reference genome to copy the sequence to the clipboard. You can then paste (Ctrl+V) the sequence into external software.
Tip 2: right-click the genome bar screen to copy the position to the clipboard.

In the figure above, the DCP2 gene and its isoforms are shown. Right-click any exon of the gene to expand the isoforms (if available). By default, the longest transcript is shown. Click an exon for more information about the gene and exon. Click the ENSG or ENST code to open the Ensembl page for the gene or the specific transcript. Click "View in GeneCards" to see the GeneCards webpage for the selected gene.

Region/annotation tracks

The region track visualizes opened region files (BED, BigWig etc.). The track is also used to annotate or filter opened variants. Open region files with "File -> Add tracks". The "Play" button appears in the left sidebar when there are opened VCFs on the sample tracks. If the region file contains values, the "Histogram" button appears in the sidebar. When the Histogram view is selected, the height of each region is determined by its value. If the file has both negative and positive values, the region track is split vertically, with zero in the middle and negative values drawn in red. The scale of the view is shown at the bottom left corner of the track.

If the region track file is larger than the size set in "Settings" -> "General" -> "Big file size for tracks (MB)", the track is visualized only when zoomed in closer than the distance set in "Settings" -> "General" -> "Processing window size (bp)".

Right click the track or left click the cogwheel symbol to open the track settings:

  • Intersect

    If the "Play" button is pressed (i.e. the track is applied) and Intersect is selected, variants outside the regions are excluded / hidden. If it is not checked, no variants are excluded, but they are still annotated with the overlapping regions.

  • Subtract

    If the track is applied and Subtract is selected, variants inside the regions are excluded / hidden.

  • Zero based

    If start and end positions in the file are not reported starting from zero, uncheck this checkbox and the positions will be shifted accordingly.

  • Log scale

    Change the histogram view to log scale

  • Auto scale (in histogram view)

    Scale the peak heights by the highest/lowest value on the screen

  • Auto collapse (in region view)

    Collapse and expand track regions when zooming in/out

  • Report affinity changes

    If the track file has a name field recognized as a transcription factor, affinity changes will be reported in the results when this checkbox is selected

  • Apply in annotation

    If the track file is big ("Settings -> General"), it is annotated using the window size set in the settings and may take a while to complete. Use this checkbox to apply the track automatically after "Variant Manager" -> "Annotation" is pressed.

  • Value limit

    Use only regions of the track file that have a value higher than the set limit

  • Left vertical slider

    When not in histogram view, use the slider to set the region bar height
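
The difference between the Intersect and Subtract options can be sketched in Python as below. The sketch is not BasePlayer code; it assumes zero-based, half-open regions (start included, end excluded, as in BED files) and variant positions given in the same coordinate system.

```python
def filter_by_regions(positions, regions, mode="intersect"):
    """Keep variant positions inside (intersect) or outside (subtract) regions.

    regions: list of (start, end) pairs, zero-based and half-open.
    """
    def inside(pos):
        return any(start <= pos < end for start, end in regions)

    if mode == "intersect":
        return [pos for pos in positions if inside(pos)]
    if mode == "subtract":
        return [pos for pos in positions if not inside(pos)]
    raise ValueError(f"unknown mode: {mode!r}")


regions = [(100, 200), (500, 600)]
positions = [50, 150, 200, 550]
print(filter_by_regions(positions, regions, "intersect"))
print(filter_by_regions(positions, regions, "subtract"))
```

Position 200 is treated as outside the first region because the end coordinate is excluded under the assumption above.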

If the track file is recognized to include transcription factor binding sites by the "Name" column, binding motifs are visualized at the sequence level zoom.

Control tracks

The control track allows you to apply control files and set options for them. Open VCFs for this track with "File -> Add controls". The track will appear above the sample tracks. Set the allele frequency limit for the control file. When the track is applied, variants with an allele frequency higher than the limit will be excluded / hidden. Set the limit value to "1" to only annotate variants. Right-click the control track to select specialized handling for indels, "Overlap indels": indels are excluded if the control file has an overlapping indel at that locus. Otherwise an exact match is required for the indel to be filtered out.
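
The allele frequency limit can be illustrated with the following Python sketch. It is an assumption-based example, not BasePlayer's code: variants are represented as (chromosome, position, ref, alt) tuples, control_af is a made-up lookup table, and variants missing from the control data are treated as having a frequency of 0.

```python
def apply_control(variants, control_af, max_af=0.01):
    """Annotate variants with the control allele frequency and drop
    those whose frequency is higher than the limit."""
    kept = []
    for variant in variants:
        af = control_af.get(variant, 0.0)  # absent from controls -> 0
        if af <= max_af:
            kept.append((variant, af))
    return kept


control_af = {("1", 100, "A", "G"): 0.25, ("2", 50, "C", "T"): 0.001}
variants = [("1", 100, "A", "G"), ("2", 50, "C", "T"), ("3", 7, "G", "A")]
print(apply_control(variants, control_af, max_af=0.01))
```

With max_af=1 no variant can exceed the limit, so all variants are kept and only annotated, which corresponds to setting the limit value to "1" in the control track.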

Open control file as a sample for the visualization of the file. In upcoming versions, variants will be visualized in the control track when zoomed in.

Sample tracks

Sample tracks visualize VCF and BAM files. If a BAM file has the same name as the VCF file and both are placed in the same folder, both files will be visualized on the same track.

The left sidebar shows information about the visible variants on the screen:

  • Sample name

    The sample name is the name of the VCF or BAM file by default. You can change it in the sample data dialog (see below).

  • Variants

    The number of visible variants (SNVs + indels) on the screen, with the indel count in parentheses

  • Number of heterozygotes

  • Number of homozygotes

  • Transition / Transversion rate

  • Filetypes on the track

    If both VCF and BAM/CRAM are colored green, both files are visualized on the track

  • Var height by...

    Variant height on the screen indicates the coverage (by default) at the variant locus. You can change the height determinant in "Settings -> Variants". The maximum height value of the visible variants is shown in parentheses.
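
As a reminder of what the transition / transversion rate means, here is a short Python sketch. Transitions are the changes A<->G and C<->T; all other single-base changes are transversions. The function is a hypothetical illustration and assumes that every input pair is an SNV with different ref and alt bases.

```python
# Purine<->purine and pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """Transition/transversion ratio for a list of (ref, alt) SNVs."""
    ts = sum((ref, alt) in TRANSITIONS for ref, alt in snvs)
    tv = len(snvs) - ts
    return ts / tv if tv else float("inf")


# Three transitions and one transversion.
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]))
```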

Every vertical line on the sample track represents a line of the VCF file (variant call). Red and green lines indicate coding SNVs and indels, respectively. Intronic variants are light gray and intergenic dark gray. Variants are visible, if they pass all the filters set in the "Variant Manager" and have not been filtered out by applied control or region tracks.

On the above screenshot, the bottom track visualizes ClinVar data and is used as an annotation (yellow indicator on the sidebar).

If you zoom in to the sequence level (image above), the reads are visualized from the corresponding BAM file. At the top of each track is the coverage track, which shows position-specific coverage and possible mismatches from the read data. The variant call from the VCF is shown behind the reads (red line in the image).

To get more information about the individual reads or variants, you can click them at the sequence level zoom (above image). Click the variant to view the VCF information popup, which shows the VCF line information as is.
Click the read one time to view the BAM information popup:

  1. Read name
  2. Chromosomal position and strand (in parentheses)
  3. Mapping quality (Phred scale; 60 is typically the maximum value)
  4. Insert size. Insert (or fragment) size is inferred from the paired-end reads. It is the distance between the start position of the first read and the end position of the second read. The number is negative for the second read of the pair.
  5. Cigar string, which contains information about the indels and clipped parts of the read
  6. Read length
  7. Chromosomal position of the mate (or pair) and its orientation
  8. Indicator of whether the read is a primary alignment
See more read visualization features at 3rd generation sequencing data analysis

Right click the sidebar, or click the sample name on the sidebar to open "Sample data" dialog (image below):

  • Sample name

    Change the track/sample name for the project

  • Annotation

    Set the variant track as an annotation track (e.g. ClinVar data VCF). Filters will not apply to annotation track

  • Gender

    Select gender for the sample. If the sample is set as another sample's father or mother, gender is selected automatically.

  • Affected

    Mark the sample as affected. This information is used when applying inheritance patterns in variant annotation.

  • Mother / Father dropdowns

    Select mother and/or father for the sample from the other opened samples.

  • Color dropdown

    Select a distinct color for the family members (sidebar color will change)

  • VCF and BAM path

    Full path to the directory of the opened VCF and BAM files

Genome selector

Genome selector appears the first time BasePlayer is started. This tool allows you to add new reference genomes and gene annotations. Revisit this window by clicking File -> Genomes -> Add new genome... or by selecting "Add new reference..." from the reference dropdown menu on the left sidebar of the Genome bar.


1. Fetch new reference genomes and gene annotations from Ensembl if available.
2. Select Ensembl genome.
3. Download selected Ensembl reference genome and gene annotation. Files will be automatically installed to your genome directory (9.).
4. Get download links for the reference genome and gene annotation files. If you cannot connect to Ensembl via BasePlayer, download the files using a web browser and add the genome manually using "Add new reference..." (6.) and "Add new annotation..." (5.) in this window.
5. Add new annotation file (GFF3 or GTF) to the reference genome.
6. Add new reference genome (FASTA).
7. Remove selected genome or annotation.
8. Check, whether selected Ensembl genome has newer gene annotation version available.
9. Your genome directory, where genomes will be installed.

See instruction video, how to add genomes to BasePlayer:

Variant Manager

Variant Manager user interface and functions.

  1. Panels for variant quality filtering, variant hiding and sample-wise comparison
  2. SNV and Indel quality filters. Separate filters for indels can be activated in the "Indels" tab - same filtering is used by default for SNVs and indels
    1. Min. quality score: filter variants by the "QUAL" field of the VCF file
    2. Min. genotype quality score: filter variants by the GQ field of the VCF file
    3. Min. coverage: filter variants by the total coverage at the variant position
    4. Max. coverage: filter variants if coverage exceeds this value
    5. Min/max allelic fraction: filter variants by the allelic fraction. Use the green handle for the minimum and the red handle for the maximum value. For example, remove homozygous variants by setting the max value to less than 100%.

  3. Variant annotation panel including gene effect and output writing options
  4. Panels for annotation results for genes, clusters, statistics and annotation tracks
  5. Result table shows a list of genes and variants which pass the filters and options set in the Variant Manager as well as any activated track. Red, yellow, green and gray indicate nonsense, missense, synonymous and non-coding variants, respectively.

    Click the gene row to expand the variant view. If multiple samples share the variant, "Multiple" is shown in the Sample column. Click the "Multiple" row to expand the samples.

    If controls are applied, there is a column for the allele frequency in the control data, including the odds ratio with its p-value (Fisher's exact test, right-tailed). Applied annotation tracks are also shown in separate columns.
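
For readers who want to reproduce the odds ratio and the right-tailed Fisher's exact p-value outside BasePlayer, the Python sketch below shows the standard calculation for a 2x2 table. How BasePlayer fills the table is not documented here, so the layout (a = variant alleles in samples, b = other alleles in samples, c = variant alleles in controls, d = other alleles in controls) is an assumption.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c); infinite if b*c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def fisher_right_tail(a, b, c, d):
    """Right-tailed Fisher's exact p-value: P(X >= a) with fixed margins."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)  # largest value the top-left cell can take
    favourable = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, col1)


# Example table: a=3, b=1, c=1, d=3.
print(odds_ratio(3, 1, 1, 3), fisher_right_tail(3, 1, 1, 3))
```

For the example table the odds ratio is 9 and the p-value is 17/70 (about 0.243).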

Indel tab is identical to the SNV filtering panel with the exception of the "Use indel specific filters" checkbox. Select the checkbox to activate the indel specific filters.

Hide panel contains checkboxes to hide / filter out variants with the given characteristics. If you have set the filters for your project and do not need to adjust them, you can check the "Freeze filters" checkbox for faster loading times and lower memory consumption. Variants that do not pass the variant filters will then not be read from the VCF file or loaded into memory.

Use this panel to compare variants or mutated genes in multiple samples.
The upper slider sets the minimum number of cases which must harbor a variant in the same gene for these variants to be shown. It can thus be used if the sample set consists of cases with a common disease, which might not share the same specific variant even though the same gene is mutated (e.g., sporadic cases or somatic mutations in cancer). Lower slider compares individual variants. For instance, the value 2/3 means that only variants shared by two or three (all) samples are included and visible on the screen. "Window size for variant cluster" sets variant clustering behavior.
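
The variant-level comparison (lower slider) can be sketched in Python as follows. This is an illustration only: the sample names and variant keys are made up, and BasePlayer's own data structures are not shown.

```python
from collections import Counter


def shared_variants(sample_variants, min_samples):
    """Return variants that occur in at least `min_samples` samples.

    sample_variants: dict {sample name: list of variant keys}.
    """
    counts = Counter()
    for variants in sample_variants.values():
        counts.update(set(variants))  # count each variant once per sample
    return sorted(v for v, n in counts.items() if n >= min_samples)


samples = {
    "s1": ["1:100:A>G", "2:50:C>T"],
    "s2": ["1:100:A>G"],
    "s3": ["1:100:A>G", "2:50:C>T", "3:7:G>A"],
}
print(shared_variants(samples, 2))
```

With three samples, min_samples=2 corresponds to the slider value 2/3: variants shared by two or all three samples are kept, and the variant seen only in s3 is dropped.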

Inheritance pattern selectors.
If you have set affected samples in the "Sample data" dialog, you can use inheritance patterns when annotating variants.

Gene annotation options. These settings affect the annotation of variants with genes.
Only for non-synonymous: do not output synonymous variants to the results
Only truncs: output only nonsense mutations
Show intronic: output intronic variants
Show intergenic: output intergenic variants
Show UTR: output UTR variants (including non-coding genes and RNAs)
All chromosomes: annotate automatically all chromosomes. Otherwise only visible variants on the screen are annotated
From this chr?: continue annotation from current chromosome till the end
Only autosomes?: annotate variants only for chromosomes 1-22 (in humans)
Only selected sample?: output only variants of the selected sample
Only stats?: do not output anything on the "Gene" result table. Calculate only statistics to the "Stats" table.
Window calculation: annotate variants in batches. (use settings to set the window size)
Write directly to a file: do not output anything on the result table. Pressing "Annotation" will open save file dialog for the output file.


If variants have been called e.g. with GATK HaplotypeCaller, click the "Min. quality score" text on the Variant Manager to see the "Hard filters" popup. The suggested parameters can be found at https://gatkforums.broadinstitute.org/gatk/discussion/2806/howto-apply-hard-filters-to-a-call-set
Press "Apply" to apply the hard filters.

Note: FS parameter has "greater than" symbol set before the value.


Set output file format.
TSV: creates a tab-separated output file for Excel or other software. Each variant and sample combination is on a separate row.
Compact TSV: same as TSV above, but samples are collapsed on the same row.
VCF: creates a VCF file from the outputted variants.
Oncodrive: creates a tab-separated file recognized by the Oncodrive software.


Variant statistics table.
Includes statistics for all visible variants after annotation.

Settings

Open BasePlayer settings with "Tools -> Settings" or use the cogwheel symbol at the far right of the toolbar.

General

  • Coverage draw distance (bp)

    Set the view length up to which the BAM file is visualized as a coverage histogram. Set it, for example, to 1 billion to view coverages at the chromosomal level (high-coverage whole-genome data will be slow with this setting).

  • Processing window size (bp)

    When handling a massive dataset without enough available memory, you can calculate and annotate variants in smaller batches (see "Variant Manager"). This value determines the window size for a batch. Also, if a track file larger than the size set in the field below is opened, it becomes visible at this distance.

  • Big file size for tracks (MB)

    Set the file size for big track files to prevent the visualization of the data at the chromosomal level. When handling, e.g., an 80 GB file, it is not feasible to open and visualize the track data as a whole.

Variants

Select the characteristic of the variant, which will determine its height on the screen (default: coverage).
Coverage: total coverage in variant position
Allelic Fraction: a fraction of reads calling the alternative base out of total coverage (values between 0 and 1)
Quality: value based on the QUAL field of the VCF
GQ: Genotype quality: value based on the GQ value of VCF file (not necessarily set)
Calls: the number of reads calling the variant
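
The relation between "Calls", "Coverage" and "Allelic Fraction" can be written out as a short Python sketch. The function names are hypothetical; the min/max check mirrors the allelic fraction filter of the Variant Manager, with made-up threshold values.

```python
def allelic_fraction(calls, coverage):
    """Fraction of reads calling the alternative base (0 to 1)."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return calls / coverage


def passes_af_filter(calls, coverage, min_af=0.0, max_af=1.0):
    """True if the allelic fraction lies within [min_af, max_af]."""
    return min_af <= allelic_fraction(calls, coverage) <= max_af


print(allelic_fraction(12, 40))                 # 12 of 40 reads call the variant
print(passes_af_filter(40, 40, max_af=0.9))     # homozygous-like call is removed
```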

Reads

Select options for read visualization.
Maximum insert size: a read pair with an insert size greater than the set value is considered to indicate a deletion, and the read is colored green
Mapping quality: reads with a mapping quality below the set value are colored black
Base quality: mismatched bases with a base quality below the set value are drawn in a dimmer color
Read depth limit: reads are drawn only up to the depth set here
Show bases in softclips: the BAM file may not have mismatch information for the soft-clipped parts of the read. Select this checkbox to show them anyway

Appearance

Change the appearance of BasePlayer.
Font size: set the preferred font size and weight used in the software
Background color: set the background lightness of the region, control and sample tracks
Sidebar color: set the sidebar colors using RGB values

Proxy

Set up your proxy settings. Check "Use proxy", select the protocol, host address and port. Finally press "Save".

See instruction video at