We have previously reported on a system for large-scale molecular virus screening of clinical samples. As part of an effort to systematically search for unrecognized human pathogens, the technology was applied to virus screening of human respiratory tract samples. This resulted in the identification of a previously unknown polyomavirus provisionally named KI polyomavirus. The virus is phylogenetically related to other primate polyomaviruses in the early region of the genome but has very little homology (<30% amino acid identity) to known polyomaviruses in the late region. The virus was found by PCR in 6 (1%) of 637 nasopharyngeal aspirates and in 1 (0.5%) of 192 fecal samples but was not detected in sets of urine and blood samples. Since polyomaviruses have oncogenic potential and may produce severe disease in immunosuppressed individuals, continued searching for the virus in different medical contexts is important. This finding further illustrates how the unbiased screening of respiratory tract samples can be used for the discovery of diverse virus types.
Persistent virus infections are an integral part of human life. Most humans are persistently infected with one or more herpesviruses, papillomaviruses, polyomaviruses, and anelloviruses and remain healthy. Nevertheless, many of these viruses may occasionally produce severe disease. Because the identification of previously unrecognized viral species is technically difficult, many potentially medically important persisting human viruses most likely remain undetected.
Polyomaviruses are small DNA viruses capable of persistent infection and having oncogenic potential. They have been found in many mammals and birds worldwide. Two polyomaviruses are known to normally infect humans, JC virus (JCV) and BK virus (BKV), both discovered in 1971. They are genetically closely related to each other, and both viruses show 70 to 80% seroprevalence in adults. The routes of acquisition and sites of the primary infection are largely unknown, but both viruses can establish a latent infection in the kidneys and, in the case of JCV, also in the central nervous system. Persistent replication in the kidneys is evidenced by the fact that JCV, and occasionally also BKV, can be detected in the urine of healthy adults. BKV has also been detected in the feces of children. JCV and BKV are highly oncogenic in experimental animals, but a role in the development of human tumors has not been established. Disease caused by human polyomaviruses has been observed in immunosuppressed individuals. JCV is the causative agent of progressive multifocal leukoencephalopathy, a demyelinating disease of the brain and a feared complication of AIDS. This disorder has recently received renewed attention after the occurrence of fatal cases among patients treated with natalizumab for multiple sclerosis. BKV has been associated with posttransplantation nephropathy and hemorrhagic cystitis in hematopoietic stem cell transplant (HSCT) recipients. In addition to JCV and BKV, there are reports on the presence of the primate polyomavirus simian virus 40 (SV40) in humans, possibly introduced by contaminated poliovirus vaccine produced in monkey cells, although other ways of transmission have also been suggested. SV40 genomic sequences have been detected in human malignant mesothelioma tumors, but its role in human tumor development remains debated.
We have developed a system for large-scale molecular screening of human diagnostic samples for unknown viruses (2). With this technology, we have initiated a systematic search for previously unrecognized viruses infecting humans in order to identify agents that are potentially involved in human disease. We describe here the identification and molecular characterization of a hitherto unknown human polyomavirus, which is only distantly related to the other known primate polyomaviruses. By analogy with the nomenclature of the other human polyomaviruses, we propose the name KI polyomavirus (KIPyV) for the newly discovered virus.
MATERIALS AND METHODS
Molecular virus screening.
As part of a systematic search for unknown viruses in clinical respiratory tract samples, a screening library was constructed from cell-free supernatants of 20 randomly selected, anonymized nasopharyngeal aspirates that had been submitted to the Karolinska University Laboratory, Stockholm, Sweden, for the diagnosis of respiratory tract infections. The samples were collected from March to June of 2004 and stored at −80°C until analyzed. This study was approved by the Karolinska Institutet local ethics committee. The procedure used for the identification of virus nucleic acid sequences, molecular virus screening, has been described previously. In brief, samples were pooled and the pool was divided into two aliquots, which were filtered through 0.22- and 0.45-μm-pore-size disc filters (Millex GV/HV; Millipore), respectively. Both aliquots were ultracentrifuged at 41,000 rpm in an SW41 rotor (Beckman) for 90 min. The resulting pellet was recovered, resuspended, and treated with DNase before DNA and RNA were extracted. Extracted DNA and RNA were amplified separately by "random PCR." The amplification products were separated on an agarose gel, and fragments between approximately 600 and 1,500 bp in length were cloned. Four libraries were generated in total, one for each combination of nucleic acid type (DNA or RNA) and filter pore size (0.22 or 0.45 μm). Ninety-six clones from each library were sequenced bidirectionally, i.e., a total of 384 clones. A set of specially designed C++ and Perl programs was used for automated quality trimming, clustering, BLAST searches, sorting, and formatting of the sequence reads. The output was a sorted list of the best database hits for nucleotide and translated sequences.
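The quality-trimming rule used by the in-house programs is not described. As an illustration only, the Python sketch below applies a Mott-style trimming rule, a common choice for Sanger reads that is assumed here rather than taken from the authors' C++/Perl code: each base scores cutoff − p_error, where p_error = 10^(−Q/10) is derived from its Phred quality Q, and the read is cut back to its highest-scoring contiguous segment (a maximum-subarray search). The cutoff of 0.05 and the helper names are assumptions.

```python
def mott_trim(quals, cutoff=0.05):
    """Return the half-open interval (start, end) of the highest-scoring segment.

    Each base contributes cutoff - p_error, where p_error = 10 ** (-Q / 10)
    is the error probability implied by its Phred quality Q. A Kadane-style
    maximum-subarray scan finds the best contiguous segment.
    """
    best_start = best_end = 0
    best_score = 0.0
    run_start = 0
    run_score = 0.0
    for i, q in enumerate(quals):
        run_score += cutoff - 10 ** (-q / 10)
        if run_score <= 0:
            # The running segment no longer helps; restart after this base.
            run_score = 0.0
            run_start = i + 1
        elif run_score > best_score:
            best_score = run_score
            best_start, best_end = run_start, i + 1
    return best_start, best_end


def trim_read(seq, quals, cutoff=0.05):
    """Hypothetical helper: trim a read to its highest-scoring segment."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    start, end = mott_trim(quals, cutoff)
    return seq[start:end]


print(trim_read("ACGTAC", [5, 5, 30, 30, 30, 5]))  # GTA
```

Low-quality bases at both ends (Q5 here) carry negative scores, so the scan keeps only the high-quality middle of the read.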
Genomic analysis of the KIPyV genome.
A 4,808-bp-long PCR product reaching around the circular DNA genome was generated by primers directed “outward” from the first cloned fragment (Pol-82R [TTGACTTCTTGGCCTTGTTAG] and Pol-315F [AGATGCTGACACAACTGTATG]) and by using a long-range enzyme mixture (Platinum Taq High Fidelity; Invitrogen). A second PCR product of 500 bp overlapping both ends of the long product and closing the circle was generated by primers PolconF (GGATTTTGTATGTGCTAGAAC) and PolconR (TTAACTAGAGGTACAACAAGC). Both PCR products were directly sequenced in order to obtain a consensus sequence for the complete genome. The same procedure was applied for determining the full-length sequences of three isolates. Putative open reading frames (ORFs) were identified, and sequences were aligned with Clone Manager Suite 6 (version 6.00) and Align Plus (version 4.10) (Scientific and Educational Software, Durham, NC). Prediction of putative binding sites for transcription factors was performed by comparison with consensus sequences and with the help of the Alibaba software.
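Closing the circle from the two products amounts to joining the long product to the short product at both of their end overlaps. The Python sketch below illustrates this under stated assumptions: both sequences are error-free consensus reads on the same strand, and the overlaps are exact and at least 20 nt long. The function names and the overlap threshold are hypothetical, and real data would need an alignment step that tolerates sequencing errors. The KIPyV numbers fit this picture: the 5,040-nt genome reported in the Results, minus the 4,808-bp long product, leaves 232 nt to be supplied by the closing product, so the 500-bp product overlaps the two ends of the long product by 268 nt in total.

```python
import random


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def close_circle(long_product, closing_product, min_overlap=20):
    """Join a near-complete linear product and a short closing product into
    one circular sequence, written starting at the first base of long_product.

    Assumes error-free sequences on the same strand, with the closing product
    overlapping the 3' end and the 5' end of long_product exactly.
    """
    left = suffix_prefix_overlap(long_product, closing_product, min_overlap)
    right = suffix_prefix_overlap(closing_product, long_product, min_overlap)
    if left == 0 or right == 0:
        raise ValueError("closing product does not overlap both ends")
    if left + right > len(closing_product):
        raise ValueError("end overlaps are inconsistent with the closing product")
    # Bases of the circle missing from long_product lie between the two overlaps.
    gap = closing_product[left:len(closing_product) - right]
    return long_product + gap


# Toy example on a random 300-nt circle (not KIPyV data).
rng = random.Random(7)
circle = "".join(rng.choice("ACGT") for _ in range(300))
long_product = circle[100:] + circle[:40]   # misses circle[40:100]
closing_product = circle[10:130]            # spans the missing stretch
genome = close_circle(long_product, closing_product)
print(len(genome), genome == circle[100:] + circle[:100])
```

The result is the same circle rotated to start where the long product starts; the final choice of position 1 is a separate convention.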
All sequences were downloaded from GenBank, except those of murine pneumotropic virus, which were based on a corrected sequence (T. Ramqvist, unpublished data). Accession numbers are available upon request. The complete genomes and the amino acid sequences of the early and late proteins were aligned, and neighbor-joining trees were generated with ClustalX version 1.83. The data were bootstrapped with 1,000 replicates, and trees were viewed with NJplot. For whole-genome analysis, the noncoding control regions were removed in accordance with established conventions, and the first nucleotide of the T antigens was designated nucleotide 1.
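Bootstrapping with 1,000 replicates means rebuilding the tree from alignments whose columns have been resampled with replacement. ClustalX performs this internally; the Python sketch below only illustrates the resampling step, with hypothetical function names and a fixed seed for reproducibility. Each replicate would then be passed to a neighbor-joining routine, and the bootstrap support for a clade is the fraction of replicate trees that contain it.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column indices are applied to every sequence, so the
    # replicate is still a valid alignment of the same taxa.
    return ["".join(seq[c] for c in cols) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates resampled alignments (hypothetical helper)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


aln = ["ACGTAC", "ACGAAC", "TCGAAG"]  # toy alignment, not study data
reps = bootstrap_replicates(aln, n_replicates=1000, seed=1)
print(len(reps), reps[0])
```

Resampling whole columns, rather than individual characters, keeps the homology of each aligned site intact across all sequences.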
PCR for detection of KIPyV.
PCR experiments for detection of KIPyV were performed in a diagnostic laboratory setting, ensuring that the necessary precautions to avoid contamination were taken. Positive and negative controls were included in each experiment. DNA was extracted by commercially available kits as described under the respective sample type. Five microliters of extracted DNA was used as the template for the nested PCR. The 50-μl reaction mixtures used for the first and second PCRs consisted of 1× GeneAmp PCR buffer II (10 mM Tris-HCl [pH 8.3], 50 mM KCl; Applied Biosystems), 2.5 mM MgCl2, 0.2 mM each deoxynucleoside triphosphate, 2.5 U of AmpliTaq Gold DNA polymerase (Applied Biosystems), and 20 pmol of each of the primers. The first-PCR primers were POLVP1-39F (AAG GCC AAG AAG TCA AGT TC) and POLVP1-363R (ACA CTC ACT AAC TTG ATT TGG). The second-PCR primers were POLVP1-118F (GTA CCA CTG TCA GAA GAA AC) and POLVP1-324R (TTC TGC CAG GCT GTA ACA TAC). The cycling conditions for the first and second PCRs were 10 min at 94°C, followed by 35 cycles of amplification (94°C for 1 min, 54°C for 1 min, and 72°C for 2 min). Products were visualized on an agarose gel. The product size after the second PCR was 207 bp. All PCR products were sequenced in order to confirm that they were specific for KIPyV.
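The stated 207-bp product size can be checked against the primer names. Assuming (our reading, not stated in the text) that the number in each primer name gives the position of the primer's 5′ end in a shared VP1-based coordinate system, the inclusive amplicon length is reverse position − forward position + 1. This reproduces 324 − 118 + 1 = 207 bp for the second PCR and gives 363 − 39 + 1 = 325 bp for the first PCR; the latter value is derived under the same assumption and is not reported in the text. The helper names are hypothetical.

```python
import re

PRIMER_NAME = re.compile(r"[A-Z0-9]+-(\d+)([FR])")


def primer_position(name):
    """Parse (position, direction) from a name such as 'POLVP1-118F'.

    Assumption: the number is the primer's 5' end position in a shared
    VP1-based coordinate system; F = forward, R = reverse.
    """
    match = PRIMER_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized primer name: {name}")
    return int(match.group(1)), match.group(2)


def amplicon_length(forward, reverse):
    """Inclusive product length spanned by a forward/reverse primer pair."""
    f_pos, f_dir = primer_position(forward)
    r_pos, r_dir = primer_position(reverse)
    if (f_dir, r_dir) != ("F", "R") or r_pos <= f_pos:
        raise ValueError("need a forward primer upstream of a reverse primer")
    return r_pos - f_pos + 1


def is_nested(outer, inner):
    """True if the inner primer pair lies within the outer product."""
    outer_f, _ = primer_position(outer[0])
    outer_r, _ = primer_position(outer[1])
    inner_f, _ = primer_position(inner[0])
    inner_r, _ = primer_position(inner[1])
    return outer_f <= inner_f and inner_r <= outer_r


first = ("POLVP1-39F", "POLVP1-363R")
second = ("POLVP1-118F", "POLVP1-324R")
print(amplicon_length(*first), amplicon_length(*second), is_nested(first, second))
# 325 207 True
```

That the computed second-round size matches the reported 207 bp supports this reading of the primer names, but it remains an inference.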
Prevalence study populations.
(i) Nasopharyngeal aspirates. Six hundred thirty-seven stored nasopharyngeal aspirates submitted to the Karolinska University Laboratory for diagnosis of respiratory virus infections from July 2004 to June 2005 were studied. Sampling month, patient’s age and sex, and routine diagnostic (immunofluorescence and virus culture) findings were recorded before samples were made anonymous. The median age of the sampled patients was 7 years (range, 0 months to 90 years). Two hundred seventy-one samples came from children <2 years old. Total nucleic acids were extracted from 200-μl samples by the MagAttract Virus Mini M48 kit (QIAGEN), and nucleic acids were eluted in 100 μl. Eluted nucleic acids were initially analyzed in pools of 10 samples, and 5 μl of the pool was used as the template for the PCR. Single samples from PCR-positive pools were analyzed.
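The two-stage pooling scheme can be sketched in Python. With 637 samples and pools of 10, the first stage needs 64 pooled reactions (63 full pools and one pool of 7), after which only the members of positive pools are retested individually. The sketch assumes that a pool tests positive whenever any member is positive, i.e., no loss of sensitivity from dilution; the function name and the positives in the usage example are hypothetical and are not study data.

```python
def pooled_screen(samples, is_positive, pool_size=10):
    """Two-stage pooled PCR screen (hypothetical sketch).

    Stage 1 tests each pool; stage 2 retests every member of a positive pool.
    Assumes a pool is positive whenever any member is (no dilution losses).
    Returns (positive samples, total number of PCR reactions).
    """
    pools = [samples[i:i + pool_size] for i in range(0, len(samples), pool_size)]
    reactions = len(pools)
    positives = []
    for pool in pools:
        if any(is_positive(s) for s in pool):
            reactions += len(pool)
            positives.extend(s for s in pool if is_positive(s))
    return positives, reactions


hypothetical_positives = {3, 5, 100, 250, 400, 636}
found, n_reactions = pooled_screen(list(range(637)), lambda s: s in hypothetical_positives)
print(found, n_reactions)  # [3, 5, 100, 250, 400, 636] 111
```

When positives are rare, as here, pooling cuts the number of reactions several-fold compared with testing all 637 samples individually.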
(ii) Feces. One hundred ninety-two fecal samples submitted to the Karolinska University Laboratory for diagnosis of virus infections from 1 July 2005 to 30 November 2005 were studied. Samples were mainly submitted for diagnosis of gastroenteritis. Basic sampling data were recorded before samples were made anonymous. The median age of the sampled patients was 1 year (range, 0 months to 17 years). One hundred nineteen samples came from children <2 years old. Nucleic acids were extracted from 400 μl of a frozen 20% feces suspension with the MagAttract Virus Mini M48 Kit and the BioRobot M48 instrument (QIAGEN) and eluted in 100 μl, and 5-μl aliquots were used for subsequent individual PCR assays.
(iii) The urine of HSCT recipients. One hundred fifty urine samples collected from HSCT recipients for the study of BKV and JCV were analyzed. Fifty of the samples were selected on the basis of previous analysis results; 20 were previously shown to be positive for BKV, 8 were positive for JCV, 2 were positive for both BKV and JCV, and 20 were negative for both viruses. JCV and BKV status was unknown for the remaining 100 samples. As described previously, samples were analyzed by PCR without preceding DNA extraction.
(iv) Serum of HSCT recipients. Thirty-three serum samples drawn from 17 HSCT recipients 2 to 6 weeks after transplantation were studied. Total nucleic acids were extracted from 200 μl of serum by QIAamp Virus Spin Kit (QIAGEN) and eluted in 50 μl.
(v) Whole blood. Whole EDTA blood from 192 healthy volunteer blood donors in Stockholm was analyzed. DNA was extracted from 200-μl samples with the MagAttract DNA Mini M48 Kit and the BioRobot M48 instrument (QIAGEN) and eluted in 50 μl.
(vi) Leukocytes. Ninety-six frozen preparations of Ficoll-separated leukocytes were studied. Samples were originally sent to the laboratory for diagnosis of cytomegalovirus by PCR and virus culture and therefore mainly originated from immunosuppressed patients. DNA was extracted from 10⁵ cells with the MagAttract DNA Mini M48 Kit and the BioRobot M48 instrument (QIAGEN) and eluted in 100 μl.
Nucleotide sequence accession numbers.
The sequences reported in this paper have been deposited in the GenBank database under accession no. EF127906 (KIPyV isolate 60), EF127907 (KIPyV isolate 350), and EF127908 (KIPyV isolate 380).
RESULTS
Molecular virus screening of 20 respiratory tract samples.
A virus-enriched DNA-cDNA library was constructed from 20 randomly selected nasopharyngeal aspirate samples by a previously published protocol. After vector and low-quality sequences were automatically discarded, sequence reads from 374 (97%) clones remained for database searches. By automated nucleotide and translated BLAST searches, the sequences were categorized as likely (expected value, <10⁻⁴) human (73%), bacterial (5%), phage (1%), unknown (2%), and virus (20%) sequences. Sixty-nine of the 74 clones with viral sequences matched human rhinovirus or enterovirus species. Reliable discrimination of rhinovirus from enterovirus sequences, or type determination, could not be performed on the basis of the unassembled sequence reads. Five clones closely matched respiratory syncytial virus. In addition to these virus-like clones, a single clone of 363 bp showed weak amino acid similarity (30% identity, expected value = 0.011) to VP1 of SV40 and was selected for further studies.
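The categorization by best database hit can be sketched in Python. The sketch assumes that each clone's BLAST results have already been parsed into (category, expected value) pairs; it assigns the category of the lowest-E-value hit below 10⁻⁴ and labels clones without such a hit as "unknown". This reading of the "unknown" category and the function names are assumptions, not the published pipeline. Under this assumed rule, a clone whose best hit has an expected value of 0.011, like the SV40-like clone, would fall into the unknown category.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-4  # "likely" threshold stated in the text


def classify_clone(hits, cutoff=E_VALUE_CUTOFF):
    """hits: list of (category, e_value) pairs for one clone."""
    significant = [hit for hit in hits if hit[1] < cutoff]
    if not significant:
        return "unknown"  # assumption: no significant hit -> unknown
    return min(significant, key=lambda hit: hit[1])[0]


def tally(clones, cutoff=E_VALUE_CUTOFF):
    """Count clones per category and report rounded percentages."""
    counts = Counter(classify_clone(hits, cutoff) for hits in clones)
    total = sum(counts.values())
    percentages = {cat: round(100 * n / total) for cat, n in counts.items()}
    return counts, percentages


# Toy input, not study data.
clones = [
    [("human", 1e-60)],
    [("virus", 1e-12), ("human", 1e-5)],
    [("virus", 0.011)],  # weak hit, like the SV40-like clone
    [],
]
counts, percentages = tally(clones)
print(percentages)
```

Rounding each category separately is also why reported percentages, such as those above, need not sum exactly to 100.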
Genome analysis of KIPyV.
The source nasopharyngeal aspirate sample containing the SV40-like sequence was identified by PCR analysis of aliquots saved before pooling, and this positive sample was named Stockholm 60. The long-range PCR product reaching around the circular DNA genome, together with the short product closing the circle, was used as the template for determining the complete consensus viral genome sequence. The genome was confirmed to be circular and 5,040 nucleotides in length (accession number EF127906). Two additional isolates identified during the subsequent prevalence study (see below) were sequenced by the same approach (Stockholm 350, accession number EF127907; Stockholm 380, accession number EF127908). The three genomes were highly similar. Isolates Stockholm 350 and Stockholm 380 each differed from the prototype isolate by 10 nucleotide substitutions and differed from each other at seven single-nucleotide positions. The variable positions were somewhat clustered in the regulatory region, but there were also a few isolate-specific amino acid substitutions in the putative proteins.
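The pairwise comparison of isolates amounts to counting mismatched positions between aligned genomes. The Python sketch below assumes the sequences are already aligned without gaps (insertions or deletions would require a proper alignment first); the function names and the toy sequences are hypothetical and are not the KIPyV isolates.

```python
from itertools import combinations


def count_substitutions(a, b):
    """Number of mismatched positions between two gap-free aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def pairwise_differences(isolates):
    """Substitution counts for every pair of named isolates."""
    return {
        (p, q): count_substitutions(isolates[p], isolates[q])
        for p, q in combinations(sorted(isolates), 2)
    }


toy = {
    "isolate_x": "AAAAAAAAAA",
    "isolate_y": "AAAAAAAACC",
    "isolate_z": "CCCAAAAAAA",
}
print(pairwise_differences(toy))
# {('isolate_x', 'isolate_y'): 2, ('isolate_x', 'isolate_z'): 3, ('isolate_y', 'isolate_z'): 5}
```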
Overall genome organization.
The genomic organization of KIPyV is typical for a member of the family Polyomaviridae, with an early region encoding regulatory proteins (small t [ST] and large T [LT] antigens) and a late region coding for structural proteins separated by a noncoding regulatory region. The genome size is within the range of other polyomaviruses. Properties of the deduced proteins and their similarities to those of JCV, BKV, and SV40 are shown in Table. While the nonstructural proteins have substantial amino acid sequence similarity to those of the other primate polyomaviruses, the structural proteins have a very low degree of similarity to those of other known polyomaviruses.
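Locating early- and late-region coding sequences begins with a scan for start-to-stop reading frames on both strands. Because the genome is circular, an ORF may run across the arbitrary position 1, so the Python sketch below scans a doubled copy of each strand. It is a hypothetical illustration, not the Clone Manager procedure used in the study: it uses the standard genetic code, keeps only the longest ORF ending at each stop codon, applies an arbitrary minimum length (min_codons), and does not model splicing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(genome, min_codons=100):
    """Find ATG-to-stop ORFs on both strands of a circular genome.

    Returns sorted (strand, start, length_nt) tuples; start is 0-based on
    that strand's sequence and length_nt includes the stop codon. Only the
    longest ORF ending at each stop codon is kept (nested ATGs are dropped).
    """
    genome = genome.upper()
    n = len(genome)
    best = {}  # (strand, stop position modulo n) -> (start, length_nt)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        doubled = seq + seq  # lets an ORF run across position 1 of the circle
        for start in range(n):
            if doubled[start:start + 3] != "ATG":
                continue
            for pos in range(start, start + n, 3):
                if doubled[pos:pos + 3] in STOP_CODONS:
                    length = pos + 3 - start
                    key = (strand, pos % n)
                    # Codons excluding the stop must reach min_codons.
                    if length // 3 - 1 >= min_codons and length > best.get(key, (0, 0))[1]:
                        best[key] = (start, length)
                    break
    return sorted((strand, start, length) for (strand, _), (start, length) in best.items())


# Toy circle whose only ORF crosses the origin: ATG at index 10, stop TAA at 3-5.
print(find_orfs("GCTTAACCCCATGGCT", min_codons=3))  # [('+', 10, 12)]
```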