Add @SQ-TP molecule topology field (PR samtools#405 part 1)
This is to support annotating reference sequences as circular,
e.g., for bacterial organisms or the human mitochondrial chromosome.
[Summarise @nh13's footnote text so it fits on one line, so `@RG-SM`
is not pushed off to the next page as an orphan. Remove now unneeded
pagebreak hint.] Fixes samtools#403.
nh13 authored and jmarshall committed May 2, 2019
1 parent 45c9feb commit 2a12b7e
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions SAMv1.tex
@@ -275,7 +275,7 @@ \subsection{The header section}
& {\tt AN} & Alternative reference sequence names.
A comma-separated list of alternative names that tools may use when referring
to this reference sequence.%
-\footnote{For example, given `{\tt @SQ SN:MT AN:chrMT,M,chrM LN:16569}',
+\footnote{For example, given `{\tt @SQ SN:MT AN:chrMT,M,chrM LN:16569 TP:circular}',
tools can ensure that a user's request for any of `MT', `chrMT', `M',
or~`chrM' succeeds and refers to the same sequence.}
These alternative names are not used elsewhere within the SAM file;
@@ -287,6 +287,9 @@ \subsection{The header section}
& {\tt DS} & Description. UTF-8 encoding may be used.\\\cline{2-3}
& {\tt M5} & MD5 checksum of the sequence. See Section~\ref{sec:ref-md5}\\\cline{2-3}
& {\tt SP} & Species.\\\cline{2-3}
+& {\tt TP} & Molecule topology. \emph{Valid values}: {\tt linear} (default) and {\tt circular}.%
+\footnote{The previous footnote's example identifies MT as a circular chromosome.
+The {\tt TP} field is often omitted, which implies linear.}\\\cline{2-3}
& {\tt UR} & URI of the sequence. This value may start with one of the standard
protocols, e.g http: or ftp:. If it does not start with one of these protocols, it is assumed to be a file-system path.\\\cline{1-3}
\multicolumn{2}{|l}{\tt @RG} & Read group. Unordered multiple {\tt @RG} lines are allowed.\\\cline{2-3}
@@ -310,7 +313,6 @@ \subsection{The header section}
platform/technology used.\\\cline{2-3}
& {\tt PU} & Platform unit (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.\\\cline{2-3}
& {\tt SM} & Sample. Use pool name where a pool is being sequenced.\\\cline{1-3}
-\pagebreak[4]
\multicolumn{2}{|l}{\tt @PG} & Program. \\\cline{2-3}
& {\tt ID}* & Program record identifier. Each {\tt @PG} line must have a unique {\tt ID}.
The value of {\tt ID} is used in the alignment {\tt PG} tag and {\tt PP} tags of other {\tt @PG} lines.
@@ -1314,6 +1316,7 @@ \section{SAM Version History}\label{sec:history}
\subsection*{1.6: 28 November 2017 to current}
\begin{itemize}
+\item Add {\tt @SQ TP} circular/linear topology header tag. (May 2019)
\item\textbf{Restricted the allowable punctuation characters in reference sequence names} (in {\tt @SQ SN}, {\sf RNAME}, etc).
The sets of characters allowed in {\tt @SQ SN} and {\tt @SQ AN} are now identical, which enlarges the previous {\tt AN} set. (Jan 2019)
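
To illustrate how a tool might consume the new field, here is a minimal Python sketch of reading an `@SQ` line and applying the `linear` default when `TP` is absent. The function names (`parse_sq_line`, `topology`, `all_names`) are hypothetical and not part of any existing library. The sketch assumes a tab-delimited header line, as in real SAM files; the footnote example above shows spaces only for typesetting.

```python
VALID_TOPOLOGIES = ("linear", "circular")


def parse_sq_line(line):
    """Split a tab-delimited @SQ header line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@SQ":
        raise ValueError("not an @SQ header line")
    # Each remaining field is TAG:VALUE; split on the first colon only,
    # because values (e.g. a UR URI) may themselves contain colons.
    return dict(field.split(":", 1) for field in fields[1:])


def topology(sq):
    """Return the molecule topology, treating a missing TP as linear."""
    tp = sq.get("TP", "linear")
    if tp not in VALID_TOPOLOGIES:
        raise ValueError("invalid TP value: " + tp)
    return tp


def all_names(sq):
    """Return the primary name (SN) followed by any alternative names (AN)."""
    names = [sq["SN"]]
    if "AN" in sq:
        names.extend(sq["AN"].split(","))
    return names


# The footnote's mitochondrial example, written with real tab delimiters.
mt = parse_sq_line("@SQ\tSN:MT\tAN:chrMT,M,chrM\tLN:16569\tTP:circular")
# A record without TP, which the new table row says implies linear.
chr1 = parse_sq_line("@SQ\tSN:1\tLN:100")
```

With these definitions, `topology(mt)` gives `"circular"`, `topology(chr1)` falls back to `"linear"`, and `all_names(mt)` lists `MT` together with its three aliases, so a request for any of them can be resolved to the same sequence.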