Download testing datasets

The assembled prokaryotic genome (’PROK1’) can be downloaded from https://zenodo.org/uploads/12699159. The unassembled prokaryotic genome (’PROK1’) can be downloaded from https://zenodo.org/records/12730077. Or use your own data, here we suppose your genome path is ‘PROK1’. Or directly take a genome file ‘PROK1/GCF_001228905.1_7038_3_46_genomic.fna.gz’ as input. Note: ‘PROK1’ should be a folder, and the folder should contain only genome files (*.fasta or *.fastq).

Multi-step implementation

import kssdtree

# Step1. Sketch the query genomes 
kssdtree.sketch(shuf_file='L3K9.shuf', genome_files='PROK1', output='PROK1_sketch', set_opt=True)

# Step2. Retrieve N closest genomes from the remote gtdbr214 database for each of the query genomes and compute the distance matrix and construct the phylogenetic tree. It will create 'output.newick' and 'output_accession_taxonomy.txt' in the output folder (e.g., ‘PROK31’ here).
kssdtree.retrieve(database='gtdbr214', genome_sketch='PROK1_sketch', output='PROK31', N=30, method='nj')

# Step3. Visualize tree 
kssdtree.visualize(newick='./PROK31/output.newick', taxonomy='./PROK31/output_accession_taxonomy.txt', mode='r')