DNA

Molecular methods

Please contact the authors to request pre-publication genetic data.

DNA sequences for this project were generated by Sanger sequencing. Genomic DNA was extracted with the Qiagen DNeasy Blood & Tissue Kit. The standard tissue source was a leg for adults and ventral abdominal segments for larvae; for small species, the whole body was often extracted. Specimens were identified or confirmed by the lead author of this tool or, in a few cases, by the cerambycid taxonomists who donated tissue: Petr Svacha (Czech Academy of Sciences) and Masaru Nonaka (Tokyo Metropolitan University).

A PCR step was performed before sequencing to amplify the regions of interest (COI barcode and 3’ end, ArgK, and CAD). For the two nuclear genes, a hemi-nested PCR was used to better isolate the target region. Primers are listed in Table 1. For ArgK, the first-round primer pair was ForB2/RevB1, followed by ForB4/RevB1; for CAD, CD338F/CD688R was followed by CD338F/CD668R. The PCR protocol followed McKenna et al. (2015):
McKenna DD, Wild AL, Kanda K, Bellamy CL, Beutel RG, Caterino MS, Farnum CW, Hawks DC, Ivie MA, Jameson ML, Leschen RAB, Marvaldi AE, Mchugh JV, Newton AF, Robertson JA, Thayer MK, Whiting MF, Lawrence JF, Slipinski A, Maddison DR, and Farrell BD. 2015. The beetle tree of life reveals that Coleoptera survived end-Permian mass extinction to diversify during the Cretaceous terrestrial revolution. Systematic Entomology 40: 835–880.

Table 1. PCR primers (F = forward, R = reverse; sequences use IUPAC ambiguity codes).

Gene         Primer      Direction  Sequence                             Reference
COI barcode  bycF        F          TTTCAACWAACCAYAAAGATATTGG            Karpinski, Gorring & Cognato 2023
             bycR        R          TAAACTTCWGGATGWCCAAAAAATC            Karpinski, Gorring & Cognato 2023
COI 3’ end   C1-J-1718   F          GGAGGATTTGGAAATTGATTAGTTCC           Simon et al. 1994
             TL2-N-3014  R          TCCAATGCACTAATCTGCCATATTA            Simon et al. 1994
CAD          CD338F      F          ATGAARTAYGGYAATCGTGGHCAYAA           Moulton & Wiegmann 2004
             CD668R      R          ACGACTTCATAYTCNACYTCYTTCCA           Wild & Maddison 2008
             CD688R      R          TGTATACCTAGAGGATCDACRTTYTCCATRTTRCA  Wild & Maddison 2008
ArgK         ForB2       F          GAYTCCGGWATYGGWATCTAYGCTCC           Danforth, Lin & Fang 2005
             RevB1       R          TCNGTRAGRCCCATWCGTCTC                Danforth, Lin & Fang 2005
             ForB4       F          GAYCCCATCATCGARGACTACC               Jordal 2007
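Several primers in Table 1 are degenerate: letters such as W, Y, R, H, D and N are IUPAC ambiguity codes, so each such primer is really a mixture of oligonucleotides. The Python sketch below is only an illustration, not part of the published workflow. It counts how many distinct sequences a primer represents and computes a reverse complement that keeps ambiguity codes, which helps when locating a reverse-primer site on a sense-strand reference. It assumes sequences are written 5' to 3'; the helper names `degeneracy` and `reverse_complement` are hypothetical.

```python
from math import prod

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def reverse_complement(seq: str) -> str:
    """Reverse complement that keeps ambiguity codes (assumes 5' to 3' input)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# A few primers copied from Table 1.
PRIMERS = {
    "bycF": "TTTCAACWAACCAYAAAGATATTGG",
    "bycR": "TAAACTTCWGGATGWCCAAAAAATC",
    "CD338F": "ATGAARTAYGGYAATCGTGGHCAYAA",
    "RevB1": "TCNGTRAGRCCCATWCGTCTC",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {degeneracy(seq)}-fold degenerate")
```

For these four primers the loop reports 4, 4, 48 and 32 variants respectively, which shows why the CAD and ArgK primers tolerate more sequence variation across genera than the COI barcode primers.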

The genes represented in the conifer-feeding longhorn dataset are ~783 bp of nuclear arginine kinase (ArgK), ~940 bp of the nuclear carbamoyl-phosphate synthetase domain of rudimentary (CAD), and two regions of mitochondrial cytochrome c oxidase subunit I (COI): the universal barcode region and a ~1200 bp, partially overlapping 3’ segment that has also shown diagnostic utility in cerambycids.

How to use this genetic data

The data are provided here as an aligned NEXUS file and an unaligned FASTA text file. These sequences have also been submitted to GenBank; see the fact sheets for direct links to individual sequences. If you have many specimens and want to query them against these data for a best match, it helps to build a local BLAST database that you can search offline; instructions are available at https://www.ncbi.nlm.nih.gov/books/NBK569841/. In cerambycids, a COI match above 97% likely indicates the same species, although there are exceptions. Species in the same genus often differ by less than 10% in COI, 7% in CAD, and 5% in ArgK. Genus-level distance thresholds are still being evaluated empirically.
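The rules of thumb above can be applied to a pair of sequences taken from the provided alignments. The Python sketch below is an illustration under stated assumptions: it computes an uncorrected p-distance (the proportion of differing sites, skipping gaps and ambiguity codes), which is not the same quantity as the percent identity BLAST reports over a local alignment, and it hard-codes the approximate thresholds from the text. The function names are hypothetical, and the thresholds are rough guides, not cutoffs.

```python
# Gap and IUPAC ambiguity characters that are excluded from the comparison.
SKIP = set("-?NRYSWKMBDHV")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two sequences from the same alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in SKIP or b in SKIP:
            continue  # skip gaps and ambiguous bases in either sequence
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Approximate guides from the text, in percent divergence (not hard cutoffs).
SPECIES_COI = 3.0  # >97% COI identity: likely the same species
CONGENERIC = {"COI": 10.0, "CAD": 7.0, "ArgK": 5.0}  # congeners often fall below these


def interpret(gene: str, seq_a: str, seq_b: str) -> str:
    """Compare the divergence of two aligned sequences with the rough guides above."""
    pct = 100 * p_distance(seq_a, seq_b)
    if gene == "COI" and pct < SPECIES_COI:
        return f"{pct:.1f}% divergent: likely the same species"
    if pct < CONGENERIC[gene]:
        return f"{pct:.1f}% divergent: within the typical congeneric range"
    return f"{pct:.1f}% divergent: outside the typical congeneric range"
```

Because gaps are skipped rather than penalized, this measure can look more similar than a BLAST hit for sequences with indels, such as ArgK with its intron; treat any result near a threshold with caution.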

Beyond percent similarity (BLAST searches), it can help to place the unknown specimen in a tree, which distinguishes synapomorphic (shared derived) characters from autapomorphic (unique to one taxon) ones. To do this:

  1. Open the FASTA file for the relevant gene in a text editor and add the unknown sequence.
  2. Align the file, either locally or with an online server such as MAFFT. If desired, select a substitution model at this point with jModelTest (Posada 2008) or PartitionFinder 2 (Lanfear et al. 2016). For barcode data, a GTR model is usually a good default.
    Posada D. 2008. jModelTest: Phylogenetic model averaging. Molecular Biology and Evolution 25: 1253–1256. https://doi.org/10.1093/molbev/msn083
    Lanfear R, Frandsen PB, Wright AM, Senfeld T, and Calcott B. 2016. PartitionFinder 2: New methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Molecular Biology and Evolution 34: 772–773. https://doi.org/10.1093/molbev/msw260
  3. Convert the alignment into the input format required by your phylogeny program of choice; RAxML and MrBayes both run quickly on small datasets. Your specimen should group with other species of its genus unless there is a taxonomic issue.
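When there are many unknowns, steps 1 and 2 can be scripted instead of done by hand. This minimal Python sketch appends a record to a FASTA file and, assuming a local MAFFT installation on the PATH, runs `mafft --auto`, which writes the alignment to standard output. The helper names are hypothetical, and model selection and tree building are left to the tools named above.

```python
import subprocess
from pathlib import Path


def append_to_fasta(fasta_path: Path, name: str, sequence: str, width: int = 60) -> None:
    """Append one record to a FASTA file, wrapping the sequence at `width` characters."""
    seq = "".join(sequence.split()).upper()  # drop spaces and line breaks
    existing = fasta_path.read_text() if fasta_path.exists() else ""
    with fasta_path.open("a") as handle:
        if existing and not existing.endswith("\n"):
            handle.write("\n")  # make sure the new header starts on its own line
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def align_with_mafft(fasta_path: Path, out_path: Path) -> None:
    """Align with a local MAFFT install (assumed on PATH); MAFFT prints the alignment to stdout."""
    result = subprocess.run(
        ["mafft", "--auto", str(fasta_path)],
        capture_output=True, text=True, check=True,
    )
    out_path.write_text(result.stdout)
```

A typical call might be `append_to_fasta(Path("ArgK.fasta"), "unknown_1", my_sequence)` followed by `align_with_mafft(Path("ArgK.fasta"), Path("ArgK_aligned.fasta"))`; these file and variable names are placeholders, not the names of the files provided here. Working on a copy of the provided FASTA file keeps the original intact.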

For a parsimony analysis in PAUP*, add the unknown sequence to the NEXUS file for the relevant gene. ArgK often contains an intron, so if you use this gene, minor manual adjustment of the alignment or a global realignment may be necessary. Under Analysis, choose a heuristic search with random stepwise addition and 100 replicates; this should be sufficient to find the most parsimonious trees. The search will likely return multiple equally parsimonious trees, so compute a strict consensus of them under Trees -> Compute consensus. The clade in which the unknown sequence groups with known species provides its genus-level identification.
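A strict consensus keeps only the clades found in every one of the most parsimonious trees. The Python sketch below illustrates that idea on rooted trees written as nested tuples. It is not how PAUP* computes consensus internally, it assumes the trees are already rooted (for example on an outgroup), and the taxon labels and function names are hypothetical.

```python
from functools import reduce


def clusters(tree) -> set[frozenset]:
    """Collect every clade (as a set of leaf names) in a rooted tree of nested tuples."""
    found = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def strict_consensus(trees) -> set[frozenset]:
    """Clades present in every input tree."""
    return reduce(set.intersection, (clusters(t) for t in trees))


def placement(consensus: set[frozenset], query: str) -> frozenset:
    """Smallest consensus clade containing the query and at least one other taxon."""
    return min((c for c in consensus if query in c and len(c) > 1), key=len)


# Two equally parsimonious trees that disagree only on relationships inside genus A.
tree1 = ((("unknown", "A1"), "A2"), ("B1", "B2"))
tree2 = ((("unknown", "A2"), "A1"), ("B1", "B2"))
consensus = strict_consensus([tree1, tree2])
print(sorted(placement(consensus, "unknown")))  # ['A1', 'A2', 'unknown']
```

In this toy case the within-genus resolution collapses in the consensus, but the unknown still falls in a clade with both species of genus A, which is the kind of genus-level placement described above.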

Vouchers

The voucher specimens for the genetic data are stored in Patrick Gorring’s research collection, in the A.J. Cook Arthropod Research Collection at Michigan State University, or with collaborators around the world who donated tissue or DNA extracts. Photos of many vouchers can be found by searching for "voucher" in the image gallery.