

The generalized CDT file is a straightforward generalization of the CDT and PCL file formats. In addition to expression data, this file can contain additional per-gene and per-array annotation, in columns before the GWEIGHT column or in rows before the EWEIGHT row. For backwards compatibility, if the GWEIGHT column is missing, Java TreeView assumes the data starts on the third column, or on the fourth column if the first column has the header GID. Similarly, if the EWEIGHT row is missing, Java TreeView assumes the data starts on the second row. As a general practice, it is a good idea to include the GWEIGHT column and EWEIGHT row.

In addition, Java TreeView does special things with the first two or three columns. If the first column is GID, the second and third columns are assumed to be the unique ID and NAME columns. If the first column is anything other than GID, the first and second columns are assumed to be the unique ID and NAME columns. The unique ID is used for gene list export, and for some matching purposes when necessary. The NAME column is displayed as per-gene annotation in the dendrogram and other views.

Some annotation column names have special meaning to Java TreeView, and are used for coordinates or to set the color of gene names. These special columns are described after the basic file format, and should be avoided as annotation names unless you want that specific behavior.

In order for Karyoscope to correctly display gene expression data by chromosome location, it needs to know exactly where to position each unique ID. To this end, it looks for annotation columns with the names "CHROMOSOME", "ARM" and "POSITION", which designate the chromosome, arm and position of a particular gene. "CHROMOSOME" should be a natural number indicating which chromosome the unique ID is on, "ARM" should be either "R" or "L" indicating the arm, and "POSITION" should be a real number indicating how far from the centromere the unique ID is. There is really no restriction on the units for position; bp or kb are natural choices.

A coordinates file is simply a generalized CDT file which has such columns. The coordinates files supplied with Java TreeView do not contain any expression data; they consist entirely of the unique ID column, the chromosome, arm and position columns, and the required GWEIGHT column. However, any other generalized CDT file with the correct columns can serve as a coordinates file.

Traditionally, tree files have no header, and consist of four columns. Each row represents a node in either a gene tree, for the GTR file, or an array tree, for the ATR file. For each row, the first column is the identifier of the node, the second column is the left child of the node, the third column is the right child, and the fourth column is the correlation between the left and right child. This fourth column is used by Java TreeView to determine the height of the node when rendering a tree.

By analogy to the CDT file, the tree files have been generalized in Java TreeView. Generalized tree files have a header line identifying the different columns. All generalized GTR/ATR files must have NODEID as the name of the first column. Tree files with any other string in the first row of the first column will be treated as legacy tree files: all of the rows will be treated as defining nodes, and the columns will be assigned the headers "NODEID", "LEFT", "RIGHT" and "CORRELATION". The meaning of these headers, and others, is described in the section called "Tree File Headers".

NODEID: The value in this column serves as the identifier for the node.

LEFT: The NODEID of the left child of this node. If the left child is not an internal node but a gene from the CDT file, the value should be the gene identifier, i.e. the value in the first column of the CDT file.
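
As a rough illustration of the generalized CDT column rules described in this section, the following Python sketch works out where the unique ID, NAME and data cells sit in a table. It is not Java TreeView's own code. The function name describe_cdt_layout is hypothetical, the input is assumed to be tab-delimited, and the EWEIGHT row is assumed to be the row whose first cell reads "EWEIGHT". All indices are zero-based.

```python
import csv
import io


def describe_cdt_layout(text):
    """Locate the ID/NAME columns and the first data row/column (zero-based)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header = rows[0]
    has_gid = header[0] == "GID"

    # With a leading GID column, unique ID and NAME shift one column right.
    id_col, name_col = (1, 2) if has_gid else (0, 1)

    if "GWEIGHT" in header:
        # Annotation columns sit before GWEIGHT; data starts just after it.
        data_col = header.index("GWEIGHT") + 1
    else:
        # Backwards-compatible default: third column, or fourth with GID.
        data_col = 3 if has_gid else 2

    # Assumption: the EWEIGHT row is labelled in its first cell.
    first_cells = [row[0] if row else "" for row in rows]
    if "EWEIGHT" in first_cells:
        data_row = first_cells.index("EWEIGHT") + 1
    else:
        # Backwards-compatible default: data starts on the second row.
        data_row = 1

    return {
        "id_col": id_col,
        "name_col": name_col,
        "data_col": data_col,
        "data_row": data_row,
    }
```

For a file whose header row is GID, YORF, NAME, GWEIGHT followed by array names, and whose second row is the EWEIGHT row, this sketch places the unique ID in the second column, NAME in the third, and the first data cell in the fifth column of the third row.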

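The distinction between legacy and generalized tree files described in this section can be sketched in the same way. This is only a minimal illustration, assuming tab-delimited GTR/ATR input. The function name read_tree is hypothetical, and the node and gene identifiers in the usage note are made-up sample values.

```python
import csv
import io

# Headers assigned to a legacy (headerless) tree file.
LEGACY_HEADERS = ["NODEID", "LEFT", "RIGHT", "CORRELATION"]


def read_tree(text):
    """Return one dict per node from GTR/ATR text, legacy or generalized."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if rows and rows[0][0] == "NODEID":
        # Generalized file: the first row names the columns.
        headers, body = rows[0], rows[1:]
    else:
        # Legacy file: every row defines a node.
        headers, body = LEGACY_HEADERS, rows

    nodes = []
    for row in body:
        node = dict(zip(headers, row))
        if "CORRELATION" in node:
            # The correlation determines the node height when drawing.
            node["CORRELATION"] = float(node["CORRELATION"])
        nodes.append(node)
    return nodes
```

Calling read_tree on the two-line legacy text "NODE1X\tGENE1X\tGENE2X\t0.9\nNODE2X\tNODE1X\tGENE3X\t0.7\n" yields two nodes, the second of which has the internal node NODE1X as its left child and the gene GENE3X as its right child.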