from QC to gene prediction and phylogenomics
We are pleased to announce the release of new BUSCO datasets! Based on OrthoDBv12 (https://orthodb.org), the new datasets represent a significant increase in coverage over all domains. The new odb12 dataset release contains 36 datasets for archaea, up from 16, and 334 datasets for bacteria, up from 83. The eukaryota dataset release is being finalised and will be released in the coming weeks.
BUSCO v5.8.2 is the current stable version!
The source code on GitLab, a Conda package, and a Docker container are also available.
Based on evolutionarily-informed expectations of gene content of near-universal single-copy orthologs, the BUSCO metric is complementary to technical metrics like N50.
BUSCO was selected as one of the SIB Remarkable Outputs of 2021!
Cite us
The latest BUSCO paper describes the novelties introduced in BUSCO v4 and v5 and the new BUSCO datasets (*_odb10). If you have used these versions or datasets, the correct citation is:
Mosè Manni, Matthew R Berkeley, Mathieu Seppey, Felipe A Simão, Evgeny M Zdobnov, BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution, Volume 38, Issue 10, October 2021, Pages 4647–4654
The following protocol covers the various BUSCO running modes and workflows, BUSCO setup, guidelines to interpret the results, and additional analyses, e.g., for building phylogenomic trees and visualizing syntenies using BUSCO results:
Manni, M., Berkeley, M. R., Seppey, M., & Zdobnov, E. M. (2021). BUSCO: Assessing genomic data quality and beyond. Current Protocols, 1, e323. doi: 10.1002/cpz1.323
License
The BUSCO software is licensed under the MIT License.
The BUSCO datasets are licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
Any use of these datasets for analyses in a publication or product must include the citation of the corresponding paper: https://doi.org/10.1093/molbev/msab199.
Obtain BUSCO
Full installation instructions are provided in the user guide and protocols.
BUSCO is available as a conda package and as a Docker image. Both of these versions are ready to run out of the box. Alternatively, it is also possible to manually install BUSCO.
The BUSCO software directly downloads the necessary datasets, whether they are specified by the user or automatically selected by BUSCO.
To display all available datasets
busco --list-datasets
You can also download them manually.
Earlier versions: v4, v3, v2, v1
Documentation and support: User guide, Issues board
"Core" genes
Mandatory Options
-i # Input sequence file or folder
-m # BUSCO analysis mode to run. Can be 'genome', 'protein' or 'transcriptome'.
Recommended Options
-c # Number of threads/cores to use
-l # Specify the BUSCO lineage dataset to be used for scoring
-o # Specify the name of the output folder
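As a minimal sketch of how the mandatory and recommended options combine, the snippet below assembles a genome-mode run. The input file name, output folder name, thread count, and the choice of bacteria_odb12 as the lineage are illustrative assumptions; substitute your own values. The command is printed first and executed only if busco is on the PATH and the input file exists.

```shell
#!/bin/sh
# Illustrative values (assumptions, not taken from the BUSCO documentation).
input="assembly.fna"        # hypothetical genome assembly in FASTA format
lineage="bacteria_odb12"    # assumed lineage name; confirm with `busco --list-datasets`
outdir="busco_assembly"     # hypothetical output folder name
threads=8                   # adjust to the cores available

# Store the command in the positional parameters so each argument stays intact.
set -- busco -i "$input" -m genome -l "$lineage" -o "$outdir" -c "$threads"

# Show the command for review.
echo "$*"

# Run it only when busco is installed and the input file is present.
if command -v busco >/dev/null 2>&1 && [ -f "$input" ]; then
    "$@"
fi
```

Printing before running makes it easy to check the lineage and output names before committing to a long analysis.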
Pipeline-specific options
Augustus pipeline
--augustus # Invoke the BLAST/Augustus pipeline
-e # E-value cutoff for BLAST searches.
--limit # How many BLAST candidate regions to consider per BUSCO (default: 3)
--long # Enable Augustus optimization mode for self-training (default: off)
--augustus_parameters # "--PARAM1=VALUE1,--PARAM2=VALUE2"
--augustus_species AUGUSTUS_SPECIES # Specify a species for Augustus training
Metaeuk pipeline
--metaeuk # Invoke the Metaeuk pipeline
--metaeuk_parameters # "--PARAM1=VALUE1,--PARAM2=VALUE2"
--metaeuk_rerun_parameters # "--PARAM1=VALUE1,--PARAM2=VALUE2"
Miniprot pipeline
--miniprot # Invoke the miniprot pipeline (default for eukaryota, option for prokaryota)
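The three gene predictor pipelines above are each selected by a single flag. The sketch below shows one way to choose among them from a script; the helper name pipeline_flag, the file names, and the eukaryota_odb10 lineage are hypothetical choices made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: map a predictor name to the BUSCO flag listed above.
pipeline_flag() {
    case "$1" in
        augustus) echo "--augustus" ;;
        metaeuk)  echo "--metaeuk" ;;
        miniprot) echo "--miniprot" ;;
        *) echo "unknown pipeline: $1" >&2; return 1 ;;
    esac
}

predictor="metaeuk"                    # illustrative choice
input="genome.fna"                     # hypothetical eukaryotic assembly
flag=$(pipeline_flag "$predictor") || exit 1

# Assemble a genome-mode command that uses the selected pipeline.
set -- busco -i "$input" -m genome -l eukaryota_odb10 -o "busco_$predictor" "$flag"
echo "$*"

# Run it only when busco is installed and the input file is present.
if command -v busco >/dev/null 2>&1 && [ -f "$input" ]; then
    "$@"
fi
```

Naming the output folder after the predictor keeps results from different pipelines side by side for comparison.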
Genome Mode (all)
--skip_bbtools # Skip BBTools for assembly statistics
--scaffold_composition # Writes ACGTN content per scaffold to a file scaffold_composition.txt
--contig_break n # Number of contiguous Ns to signify a break between contigs. Default is n=10.
Other options
--config CONFIG_FILE # Provide a config file
--download [dataset ...] # Can be a dataset name, "all", "prokaryota", "eukaryota" or "virus"
--download_path DOWNLOAD_PATH # Set a custom local location for downloaded files
-f # Force overwrite an output directory
-r # Restart a previous incomplete run
--list-datasets # List all available BUSCO datasets
--offline # Indicate BUSCO should not attempt to download files
-q # Quiet mode
--tar # Compress some output subdirectories
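The --download, --download_path, and --offline options can be combined for machines without network access during analysis. The following is a sketch under stated assumptions: the directory ./busco_downloads, the file and folder names, and the bacteria_odb12 lineage are illustrative, and it assumes that a dataset fetched into the download path in step 1 is found there by the offline run in step 2.

```shell
#!/bin/sh
# Illustrative values (assumptions).
download_dir="./busco_downloads"   # hypothetical local dataset location
lineage="bacteria_odb12"           # assumed lineage name
input="assembly.fna"               # hypothetical genome assembly

# Step 1: fetch the dataset in advance, on a machine with network access.
set -- busco --download "$lineage" --download_path "$download_dir"
step1="$*"
echo "$step1"
if command -v busco >/dev/null 2>&1; then
    "$@"
fi

# Step 2: run without attempting any download, reusing the same location.
set -- busco -i "$input" -m genome -l "$lineage" -o busco_offline \
    --offline --download_path "$download_dir"
step2="$*"
echo "$step2"
if command -v busco >/dev/null 2>&1 && [ -f "$input" ]; then
    "$@"
fi
```

If a run like step 2 is interrupted, the -r option listed above restarts it, and -f overwrites an existing output directory instead.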