Project Files
Each project has its own set of files. Project Files holds all your data, analysis, notes, and other materials. You can organize your files into categorical or hierarchical groups, such as datasets or studies.
Supported files
Choosing file names that are informative for both humans and machines is a simple step towards reproducible research and helps others find your materials.
Genotype file
The DDB Platform supports Variant Call Format (VCF) files saved in .vcf.gz format. A VCF file has a header and a body.
Header
Mandatory lines
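The #CHROM line contains 8 fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). When genotype data is present, these are followed by a FORMAT column and the sample names.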
Metadata lines
Metadata lines begin with a double hash (##), usually followed by a key such as INFO, FILTER, or FORMAT. These lines are optional.
Body
The body contains the data lines, each containing information about a position in the genome.
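The header layout described above can be inspected before upload. The following is a minimal sketch using only the Python standard library; the function name `read_vcf_header` is hypothetical and not part of the DDB Platform, and the sketch assumes a tab-delimited header line as in standard VCF files.

```python
import gzip

def read_vcf_header(path):
    """Split a .vcf.gz header into metadata lines, fixed columns, and sample names."""
    metadata = []   # lines starting with "##"
    columns = []    # fields of the "#CHROM" line
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("##"):
                metadata.append(line)
            elif line.startswith("#CHROM"):
                columns = line.lstrip("#").split("\t")
                break   # the body starts on the next line
    fixed = columns[:8]
    rest = columns[8:]
    # When genotype data is present, a FORMAT column precedes the sample names.
    samples = rest[1:] if rest and rest[0] == "FORMAT" else rest
    return metadata, fixed, samples
```

Calling `read_vcf_header("my_genotypes.vcf.gz")` (a placeholder file name) returns the metadata lines, the 8 fixed column names, and the list of sample names, which can be compared against the sample names in your other files.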
Official VCF file documentation
The complete VCF file specifications can be found here.
Phenotype file
Phenotype files contain quantitative or categorical information about a sample's phenotypic traits, such as plant height, flowering time, pericarp color, etc. This file is used in conjunction with genotype files for GS and GWAS analyses. Phenotype files should be saved as CSV files, encoded using the UTF-8 standard (CSV UTF-8).
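To confirm that a phenotype file really is UTF-8 encoded CSV before uploading, a quick local check can help. This is a minimal sketch, not the platform's own validation; `load_phenotypes` is a hypothetical helper, and the column layout of your file is whatever your phenotype data uses.

```python
import csv

def load_phenotypes(path):
    """Read a phenotype CSV as UTF-8 and return (column names, rows).

    Raises UnicodeDecodeError if the file is not valid UTF-8.
    """
    # "utf-8-sig" accepts UTF-8 files with or without a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        fieldnames = reader.fieldnames
    return fieldnames, rows
```

If the call raises `UnicodeDecodeError`, re-save the file from your spreadsheet application using its CSV UTF-8 export option.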
Crossing table file
The crossing table lists the maternal and paternal parents to be crossed in Crossing Simulation. This file requires a header with 2 columns (ind1, ind2). Each row after the header is one cross, with both parents identified by sample name.
- ind1: maternal parent sample name (e.g. NA0003)
- ind2: paternal parent sample name (e.g. NA0008)
An example cross between sample NA0003 and NA0008
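A crossing table with the layout above can be generated from a list of parent pairs. The sketch below assumes the file is comma-delimited, which this page does not state explicitly; `crossing_table` is a hypothetical helper name.

```python
import csv
import io

def crossing_table(crosses):
    """Build crossing-table text from (maternal, paternal) sample-name pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ind1", "ind2"])  # required header
    for maternal, paternal in crosses:
        writer.writerow([maternal, paternal])
    return buffer.getvalue()

print(crossing_table([("NA0003", "NA0008")]))
```

Write the returned text to a file to obtain a table with one header row and one row per cross.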
Linkage map file
The linkage map file contains the relative location of genetic markers across the genome. This file can be used to identify the location of genes that are responsible for plant traits and diseases. The file requires a header with 4 columns:
- chr: The chromosome number
- physPos: The physical position of a genetic marker.
- SNPid: The genetic marker's SNP ID
- linkMapPos: The linkage map position in centiMorgans (cM). By default, it is calculated by dividing physPos by 1,000,000
Linkage map position (linkMapPos) is usually physPos divided by 1,000,000, expressed in cM (for example, 878 / 1,000,000 = 0.000878)
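The default conversion can be reproduced in a few lines. This is a minimal sketch of the arithmetic only; the function names and the `bp_per_cm` parameter are hypothetical, and the example marker values are made up for illustration.

```python
def link_map_pos(phys_pos, bp_per_cm=1_000_000):
    """Default conversion: physical position divided by 1,000,000."""
    return phys_pos / bp_per_cm

def linkage_row(chrom, phys_pos, snp_id):
    """Assemble one linkage map row using the 4 required column names."""
    return {
        "chr": chrom,
        "physPos": phys_pos,
        "SNPid": snp_id,
        "linkMapPos": link_map_pos(phys_pos),
    }

print(linkage_row(1, 878, "snp_0001"))
```

If you have a measured genetic map, supply those positions directly instead of deriving them from physPos.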
Pedigree file
A pedigree file contains each individual's parent information. It has a header (first row) and a body; each line of the body contains information about one individual sample.
Header
Header specifications
Field | Description | Data type | Encoding | Required | Example |
---|---|---|---|---|---|
acquisitionDate | Date the material was acquired by the organization | Date (YYYY/MM/DD) | UTF-8 | No | 2022/06/21 |
biologicalStatusOfAccessionCode | 3-digit numerical code that represents the genetic nature of the sample | Integer | UTF-8 | No | 410 |
collection | A specific panel/collection/population name the sample belongs to | String | UTF-8 | No | F1_Hybrid |
countryOfOriginCode | 3-letter ISO 3166-1 code of the country in which the sample was originally collected | String | UTF-8 | No | JPN |
germplasmName | Name of the sample | String | UTF-8 | Yes | Basmati 217 |
maternalName | Germplasm name of the sample’s maternal parent | String | UTF-8 | No | Sample14 |
paternalName | Germplasm name of the sample’s paternal parent | String | UTF-8 | No | Sample897 |
synonyms | Alternative names or IDs used to reference the sample. Each synonym is separated by a comma (,) | String | UTF-8 | No | NSFTV14 |
remarks | Additional notes about the sample | String | UTF-8 | No | Additional info |
Body
Each data line contains information about an individual sample. There are 9 fixed fields per data line. All lines are comma-delimited. Missing values can be left blank. The fixed fields are those listed under Header specifications.
To avoid errors when uploading:
- Avoid commas (,) and tabs (\t) in the actual values
- Duplicate germplasmNames are not allowed
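The upload rules above can be checked locally before uploading. The following is a minimal sketch, not the platform's own validator; `check_pedigree` is a hypothetical helper that looks for stray commas (rows with more values than header fields), tab characters, missing germplasmName values, and duplicate germplasmNames.

```python
import csv
import io
from collections import Counter

def check_pedigree(text):
    """Return a list of problems found in comma-delimited pedigree text."""
    problems = []
    names = []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        number = reader.line_num  # line number in the source text
        if None in row:
            # DictReader stores surplus values under the key None
            problems.append(f"line {number}: more values than header fields (stray comma?)")
        if any("\t" in value for value in row.values() if isinstance(value, str)):
            problems.append(f"line {number}: tab character in a value")
        name = (row.get("germplasmName") or "").strip()
        if name:
            names.append(name)
        else:
            problems.append(f"line {number}: germplasmName is required")
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"germplasmName '{name}' appears {count} times")
    return problems
```

An empty list means none of these checks failed. Pass the file contents as a string, for example `check_pedigree(open("pedigree.csv", encoding="utf-8").read())`, where the file name is a placeholder.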
Upload a file or create a folder.
What you need:
- You need to be part of a project
Steps:
Inside your project, on the left sidebar, click Project Files.
You will be redirected to the Project Files page. Click the button to upload a file or create a new folder
Create a new folder or upload a file
Actual genotype and phenotype files uploaded as examples. Uploaded files should have a SUCCESS status.