Help for 'connect' program.

Date help created:  31 Oct 1996
Date last updated:  22 Jun 2001
'connect' takes a shift file and a crosspeak file and matches the crosspeaks to one or more pairs of shifts.

To run the program type

	connect <connect script file>

The program is intended to be used in conjunction with XPLOR, Per Kraulis' Ansig, and rdb scripts written by Andy Raine.

There must be no more than one key word per line in the script file.

Below <...> represents an argument for a key word and [...] represents a key word or argument that is optional.

The syntax for the key words is

	input_par <par file of spectrum>
	input_shift <input shift file>
	input_crosspeak <input crosspeak file>
	[ output_crosspeak <output crosspeak file> ]
	[ output_match <output match file> ]
	[ output_xplor <output XPLOR file> ]
	[ output_nilges <output Nilges-style XPLOR file> ]
	[ output_null <output null matches file> ]
	columns <first column> [ <second column> ]
	[ intensity_dist <intensity> <distance> ]
	[ intensity_dist2 <intensity> <distance> <distance_minus> ]
	[ intensity_dist3 <intensity> <distance> <distance_minus> <distance_plus> ]
	[ exclude <column> <spectral width> <tolerance> ]
	[ residues <columns> <residue1> <residue2> ]
	[ spectral_width <column> <spectral width> ]
	[ split_output ]

At least one of output_match, output_xplor or output_nilges must occur. The output_crosspeak file contains a list of crosspeaks that have not been matched.
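The rules above can be checked mechanically before running the program. The following is a minimal Python sketch. It assumes that the script is read as whitespace-separated fields and that blank lines are ignored. It also enforces the rule from 'connect help columns' that columns appears exactly twice. The function name read_script is hypothetical and not part of Azara.

```python
from collections import defaultdict

# Key words taken from this help text; read_script itself is hypothetical.
REQUIRED_KEYWORDS = ("input_par", "input_shift", "input_crosspeak")
OUTPUT_KEYWORDS = ("output_match", "output_xplor", "output_nilges")


def read_script(text):
    """Map each key word to the list of argument lists it was given."""
    keywords = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        # One key word per line: the first field is the key word,
        # any remaining fields are its arguments.
        keywords[fields[0]].append(fields[1:])
    missing = [k for k in REQUIRED_KEYWORDS if k not in keywords]
    if missing:
        raise ValueError("missing key word(s): " + ", ".join(missing))
    if not any(k in keywords for k in OUTPUT_KEYWORDS):
        raise ValueError("need at least one of " + ", ".join(OUTPUT_KEYWORDS))
    if len(keywords["columns"]) != 2:
        raise ValueError("columns must appear exactly twice")
    return dict(keywords)
```

For example, a script naming noesy.par, shifts.rdb, peaks.rdb and matches.rdb (all hypothetical file names) with the lines "columns 1 3" and "columns 2" passes these checks. The same script without an output_match line would be rejected.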

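All shifts are aliased according to the spectral width of the corresponding dimension, which is taken from the par file unless it is overridden with the spectral_width key word.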
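Aliasing means that two shifts differing by a whole number of spectral widths appear at the same position in the spectrum. The following Python sketch compares two shifts under that assumption. It does not model where the spectral window lies, which connect takes from the par file. As long as the tolerance is below half the spectral width, comparing modulo the spectral width gives the same answer as folding the shift into the window first. The function name aliased_difference is hypothetical.

```python
def aliased_difference(observed, reference, spectral_width):
    """Smallest |observed - reference - k * spectral_width| over all integers k."""
    # Python's % returns a value in [0, spectral_width) for a positive width.
    diff = (observed - reference) % spectral_width
    return min(diff, spectral_width - diff)
```

With this function, a shift matches an aliased crosspeak when aliased_difference(peak_shift, atom_shift, spectral_width) is no larger than the tolerance.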

A description of the key words may be obtained by typing

	connect help <key word>

A description of the format of the input shift file may be obtained by typing

	connect help shift_format

A description of the format of the input and output crosspeak file may be obtained by typing

	connect help crosspeak_format

A description of the format of the output match file may be obtained by typing

	connect help match_format

A description of the format of the output XPLOR file may be obtained by typing

	connect help xplor_format

A description of the format of the output Nilges-style XPLOR file may be obtained by typing

	connect help nilges_format

shift_format

The input shift file for the program has an ascii tab-separated format, with two header lines followed by one line (record; row) per shift entry. The first header line contains the column titles. The second header line contains an 'N' or an 'S' in each column, consistent with rdb format.

Each record has data for the light atom (hydrogen) and the corresponding bonded heavy atom (anything other than hydrogen, e.g. carbon, nitrogen, oxygen or sulfur).

The columns are, in order:

	1  residue name of the amino acid
	2  residue number of the amino acid
	3  light atom name
	4  light atom shift (ppm)
	5  heavy atom name
	6  heavy atom shift (ppm)
	7  light atom tolerance (ppm)
	8  heavy atom tolerance (ppm)

A shift of <= -99 is considered to be unknown.
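The following Python sketch reads a shift file in the layout described above. It skips the two header lines, splits each record on tabs, and stores unknown shifts (<= -99) as None. It assumes that blank or short lines can be skipped. The names ShiftRecord and read_shift_file are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional

UNKNOWN_LIMIT = -99.0  # shifts <= -99 are unknown


class ShiftRecord(NamedTuple):
    residue_name: str
    residue_number: int
    light_atom: str
    light_shift: Optional[float]   # None when unknown
    heavy_atom: str
    heavy_shift: Optional[float]   # None when unknown
    light_tolerance: float
    heavy_tolerance: float


def _shift(text: str) -> Optional[float]:
    value = float(text)
    return None if value <= UNKNOWN_LIMIT else value


def read_shift_file(lines: Iterable[str]) -> List[ShiftRecord]:
    records = []
    for i, line in enumerate(lines):
        if i < 2:
            continue  # column titles, then the rdb 'N'/'S' line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # assumption: short or blank lines are skipped
        name, number, light, lshift, heavy, hshift, ltol, htol = fields[:8]
        records.append(ShiftRecord(name, int(number), light, _shift(lshift),
                                   heavy, _shift(hshift), float(ltol), float(htol)))
    return records
```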

The tolerances specify how close a crosspeak shift value must be to the specified atom shift in order for there to be a match.
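The following Python sketch shows the tolerance test for one shift record. The light atom must be within its tolerance. When a second (heavy atom) column is in use, the heavy atom must be within its tolerance as well. Three points are assumptions: an unknown shift never matches, a difference exactly equal to the tolerance counts as a match, and aliasing is ignored here. The function names are hypothetical.

```python
from typing import Optional


def within_tolerance(atom_shift: Optional[float], tolerance: float,
                     peak_shift: float) -> bool:
    """True if the crosspeak shift lies within tolerance (ppm) of a known atom shift."""
    if atom_shift is None:  # assumption: an unknown shift never matches
        return False
    return abs(peak_shift - atom_shift) <= tolerance  # assumption: boundary counts


def record_matches(light_shift, light_tol, heavy_shift, heavy_tol,
                   peak_light, peak_heavy=None) -> bool:
    """Light atom must match; the heavy atom must also match if a heavy column is used."""
    if not within_tolerance(light_shift, light_tol, peak_light):
        return False
    if peak_heavy is None:  # only one column was given to 'columns'
        return True
    return within_tolerance(heavy_shift, heavy_tol, peak_heavy)
```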

A given atom may have more than one entry in the file; if so, those entries must be on consecutive rows. If more than one of these entries matches a given crosspeak, the atom is output only once, but the reported match counts include every matching entry.
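The following Python sketch implements the rule for repeated entries. Python's itertools.groupby only merges consecutive items with equal keys, which mirrors the requirement that repeated entries be on consecutive rows. The entry representation and the function name matched_atoms are hypothetical.

```python
from itertools import groupby


def matched_atoms(entries, is_match):
    """entries: (atom_key, shift_entry) pairs in file order, where repeated
    entries for one atom are on consecutive rows.
    Returns (matched atom keys, each listed once; total matching entries)."""
    atoms = []
    total = 0
    # groupby only merges *consecutive* equal keys, as the file format requires
    for key, group in groupby(entries, key=lambda pair: pair[0]):
        hits = sum(1 for _, entry in group if is_match(entry))
        if hits:
            atoms.append(key)   # atom output once ...
            total += hits       # ... but every matching entry is counted
    return atoms, total
```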

crosspeak_format

The input and output crosspeak files for the program have an ascii tab-separated format, with two header lines followed by one line (record; row) per crosspeak. The first header line contains the column titles. The second header line contains an 'N' or an 'S' in each column, consistent with rdb format.

The records first have a set of data for each dimension, and then a dimension-independent set.

For each dimension (of the spectrum) there are five columns: the residue name, the residue number, the atom name, the atom type, and the shift (in ppm). The first four of these columns may be null. If they are not null (only the residue and atom names are checked), the dimension is considered to have a valid assignment. The dimensions are ordered according to the Ansig convention, which is the opposite of the Azara convention.

The dimension-independent set has four columns. The first column contains the unnormalized crosspeak intensity, the second column contains the spectrum name, the third column contains the crosspeak number, and the fourth column contains the normalized crosspeak intensity.
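The following Python sketch splits one crosspeak record into its five-column groups, one per dimension, followed by the four dimension-independent columns. Several points are assumptions: a null field is an empty string, the number of dimensions is supplied by the caller (connect gets it from the par file), and "opposite" ordering means that the Ansig dimension order is the reverse of the Azara order. Extra trailing columns, such as the two match counts in an output crosspeak file, are ignored. All names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class DimensionData(NamedTuple):
    residue_name: Optional[str]
    residue_number: Optional[int]
    atom_name: Optional[str]
    atom_type: Optional[str]
    shift: float


class Crosspeak(NamedTuple):
    dims: List[DimensionData]   # in the file's (Ansig) dimension order
    intensity: float            # unnormalized
    spectrum: str
    number: int
    norm_intensity: float


def _null(text: str) -> Optional[str]:
    return text if text != "" else None  # assumption: null = empty field


def parse_crosspeak(line: str, ndim: int) -> Crosspeak:
    fields = line.rstrip("\n").split("\t")
    dims = []
    for d in range(ndim):
        name, num, atom, atype, shift = fields[5 * d: 5 * d + 5]
        dims.append(DimensionData(_null(name), int(num) if num else None,
                                  _null(atom), _null(atype), float(shift)))
    intensity, spectrum, number, norm = fields[5 * ndim: 5 * ndim + 4]
    return Crosspeak(dims, float(intensity), spectrum, int(number), float(norm))


def is_assigned(dim: DimensionData) -> bool:
    # Only the residue and atom names are checked.
    return dim.residue_name is not None and dim.atom_name is not None


def azara_dimension(ansig_dimension: int, ndim: int) -> int:
    """0-based Azara dimension for a 0-based Ansig dimension (assumed reversed)."""
    return ndim - 1 - ansig_dimension
```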

The output crosspeak file has two additional columns, giving the number of matches for the two sets of matched shifts.

match_format

The output match file for the program has an ascii tab-separated format, with two header lines followed by one line (record; row) per match. The first header line contains the column titles. The second header line contains an 'N' or an 'S' in each column, consistent with rdb format.

Each record has data for the two matched light atoms.

The columns are, in order:

	1  residue number of the first atom
	2  residue name of the first atom
	3  atom name of the first atom
	4  residue number of the second atom
	5  residue name of the second atom
	6  atom name of the second atom
	7  normalised intensity of the matched crosspeak
	8  crosspeak number of the matched crosspeak
	9  estimate of the implied distance between the light atoms
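A crosspeak can match more than one pair of shifts, and each matched pair has its own record. Grouping records by crosspeak number therefore shows which crosspeaks have ambiguous assignments. The following Python sketch reads the nine columns listed above and counts matches per crosspeak. The names Match, read_match_file and ambiguous_crosspeaks are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Match(NamedTuple):
    residue_number1: int
    residue_name1: str
    atom_name1: str
    residue_number2: int
    residue_name2: str
    atom_name2: str
    norm_intensity: float
    crosspeak_number: int
    distance: float


def read_match_file(lines: Iterable[str]) -> List[Match]:
    matches = []
    for i, line in enumerate(lines):
        if i < 2:
            continue  # column titles, then the rdb 'N'/'S' line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # assumption: short or blank lines are skipped
        n1, r1, a1, n2, r2, a2, inten, peak, dist = fields[:9]
        matches.append(Match(int(n1), r1, a1, int(n2), r2, a2,
                             float(inten), int(peak), float(dist)))
    return matches


def ambiguous_crosspeaks(matches: List[Match]) -> Dict[int, int]:
    """Crosspeak numbers matched to more than one pair of atoms, with their counts."""
    counts = Counter(m.crosspeak_number for m in matches)
    return {peak: n for peak, n in counts.items() if n > 1}
```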

xplor_format

The output XPLOR file for the program is in the ascii distance restraint format read by XPLOR. See an XPLOR manual for more explanation.

nilges_format

The output Nilges-style XPLOR file for the program is a slight modification of the xplor_format.

input_par

input_par <par file of spectrum>

	This specifies the par file name of the spectrum from
	which the crosspeaks were derived.  The data file of
	the spectrum is not used.  This should be the first
	key word in the script file.

input_shift

input_shift <input shift file>

	This specifies the input shift file.  A description of
	the format may be obtained by typing

		connect help shift_format

input_crosspeak

input_crosspeak <input crosspeak file>

	This specifies the input crosspeak file.  A description
	of the format may be obtained by typing

		connect help crosspeak_format

output_crosspeak

[ output_crosspeak <output crosspeak file> ]

	This specifies the output crosspeak file.  This file
	contains those crosspeaks that have not been matched.
	A description of the format may be obtained by typing

		connect help crosspeak_format

output_match

[ output_match <output match file> ]

	This specifies the output match file.  In content this
	file is equivalent to the output_xplor file and
	output_nilges file, and at least one of these three
	key words must appear.  A description of the format may
	be obtained by typing

		connect help match_format

output_xplor

[ output_xplor <output XPLOR file> ]

	This specifies the output XPLOR file.  In content this
	file is equivalent to the output_match file and
	output_nilges file, and at least one of these three
	key words must appear.  A description of the format may
	be obtained by typing

		connect help xplor_format

output_nilges

[ output_nilges <output Nilges-style XPLOR file> ]

	This specifies the output Nilges-style XPLOR file.  In
	content this file is equivalent to the output_match
	file and output_xplor file, and at least one of
	these three key words must appear.  A description of the
	format may be obtained by typing

		connect help nilges_format

output_null

[ output_null <output null matches file> ]

	This specifies the output file for crosspeaks without
	any matches.  The format is tab-separated, with one
	header line followed by one line per unmatched
	crosspeak, giving the crosspeak number and spectrum.

columns

columns <first column> [ <second column> ]

	This specifies one or two columns, and the data in the
	corresponding column(s) in the input_crosspeak file are
	matched to the shifts in the input_shift file.  The
	first column must be a light atom (hydrogen) and the
	second column, if present, must be the heavy atom to
	which the light atom is bonded.
	If the second column is negative, the shift in the
	column given by its absolute value is not matched, but
	the atom type is.  The first column must be positive.
	This key word must appear exactly twice, once for each
	of the two atoms in a matched pair.
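The following Python sketch interprets the values on one columns line according to the rules above. It assumes that a second column of zero is invalid. The function name interpret_columns is hypothetical.

```python
def interpret_columns(args):
    """args: the integers given on one 'columns' line.
    Returns (light_column, heavy_column or None, heavy_shift_is_matched)."""
    if not 1 <= len(args) <= 2:
        raise ValueError("columns takes one or two values")
    light = args[0]
    if light <= 0:
        raise ValueError("the first column must be positive")
    if len(args) == 1:
        return light, None, False
    second = args[1]
    if second == 0:
        raise ValueError("column 0 is not valid")  # assumption
    # Negative: only the atom type in column |second| is checked, not its shift.
    return light, abs(second), second > 0
```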

intensity_dist

[ intensity_dist <intensity> <distance> ]

	This is used to specify how to convert the normalised
	intensity in the input_crosspeak file into a distance.
	This key word can appear more than once, and the
	occurrences must be listed in order of decreasing
	<intensity> (increasing <distance>).  For a given
	crosspeak normalised intensity, the first <intensity>
	in the list that is smaller than it determines the
	<distance> to be used.
	If this key word and the other intensity_dist* key words do
	not appear then it is assumed that distance = intensity
	(this is useful for working with simulated data).
	In xplor terminology, this assumes distance_minus = distance
	and distance_plus = 0.  To set these explicitly use either
	intensity_dist2 or intensity_dist3.
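The following Python sketch shows the lookup rule. It also expands intensity_dist2 and intensity_dist3 rows using the defaults stated for each key word. Several points are assumptions: "smaller" means strictly smaller, a crosspeak with no smaller <intensity> in the table gets no distance (None), and the default distance = intensity uses the same bounds as intensity_dist. The function names are hypothetical.

```python
def expand_row(args):
    """Expand intensity_dist / intensity_dist2 / intensity_dist3 arguments to
    (intensity, distance, distance_minus, distance_plus)."""
    if len(args) == 2:   # intensity_dist: distance_minus = distance, distance_plus = 0
        intensity, distance = args
        return (intensity, distance, distance, 0.0)
    if len(args) == 3:   # intensity_dist2: distance_plus = 0
        intensity, distance, minus = args
        return (intensity, distance, minus, 0.0)
    if len(args) == 4:   # intensity_dist3: everything explicit
        return tuple(args)
    raise ValueError("expected 2, 3 or 4 values")


def intensity_to_distance(norm_intensity, table):
    """table: expanded rows in script order. Returns (distance, minus, plus) or None."""
    if not table:
        # No intensity_dist* key words: distance = intensity (assumed bounds as above).
        return norm_intensity, norm_intensity, 0.0
    if any(a[0] <= b[0] for a, b in zip(table, table[1:])):
        raise ValueError("rows must be in order of decreasing intensity")
    for intensity, distance, minus, plus in table:
        if intensity < norm_intensity:   # the first smaller <intensity> wins
            return distance, minus, plus
    return None  # assumption: behaviour of connect here is not documented
```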

intensity_dist2

[ intensity_dist2 <intensity> <distance> <distance_minus> ]

	This is used to specify how to convert the normalised
	intensity in the input_crosspeak file into a distance.
	This key word can appear more than once, and the
	occurrences must be listed in order of decreasing
	<intensity> (increasing <distance>).  For a given
	crosspeak normalised intensity, the first <intensity>
	in the list that is smaller than it determines the
	<distance> to be used.
	If this key word and the other intensity_dist* key words do
	not appear then it is assumed that distance = intensity
	(this is useful for working with simulated data).
	In xplor terminology, this assumes distance_plus = 0.
	To set this explicitly use intensity_dist3.

intensity_dist3

[ intensity_dist3 <intensity> <distance> <distance_minus> <distance_plus> ]

	This is used to specify how to convert the normalised
	intensity in the input_crosspeak file into a distance.
	This key word can appear more than once, and the
	occurrences must be listed in order of decreasing
	<intensity> (increasing <distance>).  For a given
	crosspeak normalised intensity, the first <intensity>
	in the list that is smaller than it determines the
	<distance> to be used.
	If this key word and the other intensity_dist* key words do
	not appear then it is assumed that distance = intensity
	(this is useful for working with simulated data).

exclude

[ exclude <column> <spectral width> <tolerance> ]

	This specifies that crosspeaks within <tolerance> of the
	<spectral width> for the given <column> are ignored.
	The <spectral width> is specified in ppm (not Hz).

residues

[ residues <columns> <residue1> <residue2> ]

	This specifies that only those shift matches for residues
	between <residue1> and <residue2> for the given <columns>
	(1 or 2) are output.
	The default is that all matches are output.
	This key word can appear more than once for a given
	choice of <columns>; if so, the shift matches for
	residues which lie in any of the specified residue
	ranges are output.
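The following Python sketch shows the residue filter for one value of <columns>. With no ranges, every residue passes, which is the default. Otherwise, a residue passes if it lies in any of the ranges. It assumes that ranges are inclusive and that <residue1> and <residue2> may be given in either order. The function name residue_selected is hypothetical.

```python
def residue_selected(residue_number, ranges):
    """ranges: (residue1, residue2) pairs from the residues key words for one
    <columns> value; no ranges means every residue is selected (the default)."""
    if not ranges:
        return True
    # assumption: ranges are inclusive and may be given in either order
    return any(min(r1, r2) <= residue_number <= max(r1, r2) for r1, r2 in ranges)
```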

spectral_width

[ spectral_width <column> <spectral width> ]

	This sets the <spectral width> to use for the given
	<column>.  This key word should be used if the
	spectral width given in the par file is not correct for
	the aliasing.
	The <spectral width> is specified in ppm (not Hz).

split_output

[ split_output ]

	This specifies that for output_xplor and output_nilges
	there should be two output files, one (suffix '0') for
	unassigned output and one (suffix '1') for assigned
	output.

Azara help: connect / W. Boucher / azara@bioc.cam.ac.uk