When the sequencing of the human genome was announced two decades ago by the Human Genome Project and biotech firm Celera Genomics, the sequence was not truly complete. About 15% was missing: technological limitations left researchers unable to work out how certain stretches of DNA fitted together, especially those where there were many repeating letters (or base pairs). Scientists solved some of the puzzle over time, but the most recent human genome, which geneticists have used as a reference since 2013, still lacks 8% of the full sequence.

Now, researchers in the Telomere-to-Telomere (T2T) Consortium, an international collaboration that comprises around 30 institutions, have filled in those gaps. In a 27 May preprint [1] entitled ‘The complete sequence of a human genome’, genomics researcher Karen Miga at the University of California, Santa Cruz, and her colleagues report that they’ve sequenced the remainder, in the process discovering about 115 new genes that code for proteins, for a total of 19,969.

“It’s exciting to have some resolution to the problem areas,” says Kim Pruitt, a bioinformatician at the US National Center...