+ reads_prime.seq
+ reads_closure.seq

The sequences produced by the random (whole genome shotgun) phase and the closure (finishing) phase of the sequencing project. These files are in multi-FastA format, with whitespace-delimited sequence information placed in the FastA headers. The 6 fields in the header are: ID, MINL, MAXL, MEANL, CLEARL, CLEARR

  ID     - A unique sequence identifier
  MINL   - Estimated minimum insert size
  MAXL   - Estimated maximum insert size
  MEANL  - Estimated mean insert size
  CLEARL - The leftmost position of the trimmed sequence
  CLEARR - The rightmost position of the trimmed sequence

We have already determined trim points for all sequences to remove vector and low-quality basecalls, but the sequence files still contain the entire read. To get the trimmed data, use the range from CLEARL through CLEARR. CLEARL and CLEARR are inclusive range bounds and use a 1-based coordinate system.

+ reads_prime.xml
+ reads_closure.xml

The ancillary information for each read, in trace archive XML format. For each sequencing read, there is a record that describes the following fields:

  - A unique sequence identifier. Same as the ID field in the seq files.
  - The insert ID this read was sequenced from. Reads from the same insert can be grouped to form mate-pair information for the assembly process.
  - Direction of the sequencing reaction. Useful for determining the orientation of the mate sequences.
  - The library ID this insert was taken from. Reads from the same library share the same insert size distribution.
  - Estimated size of this insert.
  - Standard deviation of the estimated insert size.
  - Type of read, either "closure" or "paired_production", meaning the read is a closure walk or an end-paired sequence, respectively.

+ reads_prime.qual
+ reads_closure.qual

The quality values for each of the above sequence files, as two-digit integers separated by a single whitespace character. Each quality sequence is headed by the same FastA ID found in the seq files.
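A minimal sketch of reading a seq file and extracting the trimmed (clear-range) sequence follows. It assumes each header line looks like `>ID MINL MAXL MEANL CLEARL CLEARR` with the ID as the first token. It also assumes the insert sizes may be non-integers, so they are parsed as floats. The class and function names (`Read`, `parse_seq`) are hypothetical. The key step is converting the 1-based inclusive range [CLEARL, CLEARR] into Python's 0-based half-open slice `[CLEARL - 1 : CLEARR]`.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, Optional, TextIO


@dataclass
class Read:
    # Hypothetical container for one record of a .seq file.
    id: str
    minl: float     # estimated minimum insert size
    maxl: float     # estimated maximum insert size
    meanl: float    # estimated mean insert size
    clearl: int     # leftmost trimmed position, 1-based, inclusive
    clearr: int     # rightmost trimmed position, 1-based, inclusive
    seq: str        # the entire, untrimmed read

    def trimmed(self) -> str:
        # 1-based inclusive [CLEARL, CLEARR] -> 0-based half-open [CLEARL-1, CLEARR)
        return self.seq[self.clearl - 1 : self.clearr]


def _make_read(header: str, chunks: List[str]) -> Read:
    # Assumed header layout: ID MINL MAXL MEANL CLEARL CLEARR
    fields = header.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 header fields, got {len(fields)}: {header!r}")
    read_id, minl, maxl, meanl, clearl, clearr = fields
    return Read(read_id, float(minl), float(maxl), float(meanl),
                int(clearl), int(clearr), "".join(chunks))


def parse_seq(handle: TextIO) -> Iterator[Read]:
    """Yield one Read per multi-FastA record."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_read(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield _make_read(header, chunks)


if __name__ == "__main__":
    example = ">read1 2000 4000 3000 3 8\nACGTACGTAC\nGT\n"
    for read in parse_seq(io.StringIO(example)):
        print(read.id, read.trimmed())  # prints "read1 GTACGT"
```

Keeping the full read and slicing on demand mirrors how the files are distributed: the untrimmed bases stay available, and the clear range is applied only when needed.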
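The mate-pair grouping described for the XML records can be sketched as below. This sketch does not parse the XML itself. It assumes the read ID, insert ID, and direction have already been extracted from each record. The field names in `ReadInfo` and the direction labels `"F"`/`"R"` are hypothetical stand-ins, not the actual element names or values used in the XML files. Reads are grouped by insert ID. An insert yields a mate pair only when it has exactly two reads sequenced in opposite directions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ReadInfo(NamedTuple):
    # Hypothetical field names; map them from the actual XML record fields.
    read_id: str    # same as the ID field in the seq files
    insert_id: str  # insert the read was sequenced from
    direction: str  # direction of the sequencing reaction (labels assumed)


def group_by_insert(reads: Iterable[ReadInfo]) -> Dict[str, List[ReadInfo]]:
    """Collect all reads that were sequenced from the same insert."""
    by_insert: Dict[str, List[ReadInfo]] = defaultdict(list)
    for r in reads:
        by_insert[r.insert_id].append(r)
    return dict(by_insert)


def mate_pairs(reads: Iterable[ReadInfo]) -> List[Tuple[ReadInfo, ReadInfo]]:
    """Return (mate, mate) tuples for inserts with one read in each of two directions.

    Inserts with a single read, or with more than two reads, are skipped.
    """
    pairs = []
    for group in group_by_insert(reads).values():
        if len(group) == 2 and group[0].direction != group[1].direction:
            # Sort by direction label so the pair order is deterministic.
            a, b = sorted(group, key=lambda r: r.direction)
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    demo = [
        ReadInfo("r1", "ins1", "F"),
        ReadInfo("r2", "ins1", "R"),
        ReadInfo("r3", "ins2", "F"),  # no partner, so no pair is formed
    ]
    for a, b in mate_pairs(demo):
        print(a.read_id, b.read_id)  # prints "r1 r2"
```

The library ID and insert size statistics from the same records could then be attached to each pair as distance constraints, but how an assembler consumes them is outside this sketch.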
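A minimal sketch of reading a qual file follows. It assumes each record starts with a `>` header whose first token is the read ID from the seq files. It also assumes one or more lines of whitespace-separated integer quality values follow the header. The names `parse_qual` and `trim_quals` are hypothetical. `trim_quals` additionally assumes one quality value per base, as is conventional. Under that assumption, the same 1-based inclusive CLEARL..CLEARR range from the seq header selects the trimmed qualities.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def parse_qual(handle: TextIO) -> Iterator[Tuple[str, List[int]]]:
    """Yield (read_id, quality_values) for each record in a .qual file."""
    read_id: Optional[str] = None
    quals: List[int] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, quals
            # Keep only the first header token: the ID shared with the seq files.
            read_id, quals = line[1:].split()[0], []
        else:
            # Values are two-digit integers; int() accepts leading zeros like "08".
            quals.extend(int(tok) for tok in line.split())
    if read_id is not None:
        yield read_id, quals


def trim_quals(quals: List[int], clearl: int, clearr: int) -> List[int]:
    # Assumes one value per base: 1-based inclusive range -> Python slice.
    return quals[clearl - 1 : clearr]


if __name__ == "__main__":
    example = ">r1\n10 20 30 40\n35 12\n>r2\n08 09\n"
    for rid, q in parse_qual(io.StringIO(example)):
        print(rid, q)
    # prints:
    # r1 [10, 20, 30, 40, 35, 12]
    # r2 [8, 9]
```

Keying the records by ID, for example with `dict(parse_qual(...))`, makes it easy to join them with the reads from the matching seq file.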