This data adapter reads XML files that contain
arbitrary numeric or symbolic data entries for one protein set.
In addition to these annotations, the protein set itself can be defined,
regardless of how many annotations are available in the XML file. The protein
set definition can be a plain list of protein identifiers or can
also contain the sequence information. This optional protein set definition must
be placed in a property of type setdef.
With this input feature you can import any kind of information without considering the semantics of the data.
These input objects can then be analyzed further with the GenericXML compare engine.
A DTD and an XSD definition file for the GenericXML file format
can be found in the /res/generic_xml_defs folder of this
installation or, alternatively, within the jar file.
In essence, a GenericXML input file looks like this:
<dataset label="Escherichia_coli_k12">
  <property type="setdef" id="setdef">
    <input id="gi_123" value="ACCCVMAD" /> <!-- or just <input id="gi_123"/> -->
    ...
  </property>
  <property type="numeric" id="orf.length">
    <input id="gi_123" value="66" />
    <input id="gi_234" value="2463" />
    ...
  </property>
  <property type="symbolic" id="funcat.fun_num">
    <input id="gi_123" value="01.01.01" />
    <input id="gi_234" value="01.01.04" />
    ...
  </property>
</dataset>
Symbolic properties hold arbitrary strings. Numeric properties
hold arbitrary numeric values.
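A file in the layout above could be generated with a few lines of Python. The following is only a minimal sketch: the function name build_generic_xml and its parameters are hypothetical, the element and attribute names are copied from the example above, and the DTD/XSD shipped with the installation remains the authoritative definition of the format.

```python
import xml.etree.ElementTree as ET


def build_generic_xml(label, proteins, numeric=None, symbolic=None):
    """Build a <dataset> element in the layout of the example above.

    Hypothetical helper, not part of the application.
    proteins: dict mapping protein id -> sequence string, or None for id only
    numeric:  dict mapping property id -> {protein id: number}
    symbolic: dict mapping property id -> {protein id: string}
    """
    dataset = ET.Element("dataset", {"label": label})

    # Optional protein set definition: a property of type "setdef".
    setdef = ET.SubElement(dataset, "property", {"type": "setdef", "id": "setdef"})
    for pid, seq in proteins.items():
        attrs = {"id": pid}
        if seq is not None:
            attrs["value"] = seq  # sequence information is optional
        ET.SubElement(setdef, "input", attrs)

    # One <property> per annotation, typed as "numeric" or "symbolic".
    for prop_type, props in (("numeric", numeric or {}), ("symbolic", symbolic or {})):
        for prop_id, values in props.items():
            prop = ET.SubElement(dataset, "property", {"type": prop_type, "id": prop_id})
            for pid, value in values.items():
                ET.SubElement(prop, "input", {"id": pid, "value": str(value)})
    return dataset


dataset = build_generic_xml(
    "Escherichia_coli_k12",
    proteins={"gi_123": "ACCCVMAD", "gi_234": None},
    numeric={"orf.length": {"gi_123": 66, "gi_234": 2463}},
    symbolic={"funcat.fun_num": {"gi_123": "01.01.01", "gi_234": "01.01.04"}},
)
xml_text = ET.tostring(dataset, encoding="unicode")
print(xml_text)
```

Validating the result against the DTD or XSD from /res/generic_xml_defs before importing it is advisable, since this sketch does not check the output against those definitions.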
If a protein has multiple annotations, you can create multiple
input nodes, but it is recommended to delimit the annotations with a
semicolon instead. For example, a protein with multiple functional annotations may
be written as:
<input id="xxx" value="membrane;isomerase;chaperon"/>
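The semicolon convention can be illustrated with a short Python sketch. The helper names join_annotations and split_annotations are hypothetical, and how the compare engine treats surrounding whitespace or empty entries is an assumption here, not something this description specifies.

```python
import xml.etree.ElementTree as ET


def join_annotations(annotations):
    """Join several annotations of one protein into a single value string."""
    for annotation in annotations:
        if ";" in annotation:
            # A semicolon inside an annotation would be read as a delimiter.
            raise ValueError("annotation contains ';': %r" % annotation)
    return ";".join(annotations)


def split_annotations(value):
    """Split a semicolon-delimited value; empty entries are dropped (assumption)."""
    return [part for part in value.split(";") if part]


node = ET.Element(
    "input",
    {"id": "xxx", "value": join_annotations(["membrane", "isomerase", "chaperon"])},
)
print(ET.tostring(node, encoding="unicode"))
print(split_annotations(node.get("value")))
```

Rejecting annotations that already contain a semicolon keeps the written value unambiguous when it is split again.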