Arpy's Scripts

Perl Scripts Used For Extracting and Manipulating Scaffolds

1. parse_scaffolds and parse_scaffolds_input
sample input: parse_scaffolds_sample_in.txt, parse_scaffolds_input_sample_in.txt (only necessary for parse_scaffolds_input)
sample output: parse_scaffolds_input_sample_out.txt

parse_scaffolds and parse_scaffolds_input are two scripts for extracting scaffolds of interest from the Mimulus 5x or 7x genome builds. The only difference between them is how the scaffolds are specified. With parse_scaffolds, the user lists the scaffolds to extract in the script itself, by editing the @scaffolds_of_interest array and following the format shown there. With parse_scaffolds_input, the scaffolds are listed in a separate text file, which is more convenient when extracting large numbers of scaffolds. This input file contains one scaffold name per line, as shown in the sample input file "parse_scaffolds_input_sample_in.txt". To give the output file a name that reflects the scaffolds extracted, change the value of the $outputfile variable at the end of either script.
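The extraction step described above can be illustrated with a short sketch. This is written in Python for illustration only and is not the Perl code of parse_scaffolds_input. The function names read_fasta and extract_scaffolds are hypothetical, and the sketch assumes that each FASTA header line starts with ">" followed by the scaffold name as its first whitespace-separated token.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the scaffold name is the first token of the header.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_scaffolds(fasta_lines, wanted_names):
    """Return the (name, sequence) records whose name is in wanted_names.

    wanted_names mimics the scaffold list file: one scaffold name per line.
    Records are returned in the order they occur in the FASTA input.
    """
    wanted = {n.strip() for n in wanted_names if n.strip()}
    return [(name, seq) for name, seq in read_fasta(fasta_lines) if name in wanted]


# Small in-memory example standing in for a genome file and a scaffold list.
genome = [">scaffold_1", "ACGT", "ACGT", ">scaffold_2", "GGCC", ">scaffold_3", "TTAA"]
names = ["scaffold_1\n", "scaffold_3\n"]
for name, seq in extract_scaffolds(genome, names):
    print(f">{name}\n{seq}")
```

With real files, the two arguments could be open file handles, since both functions only iterate over lines.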

2. scaffold_chopper
sample input: scaffold_chopper_sample_input.txt
sample output: scaffold_chopper_sample_output.txt

scaffold_chopper is a script for breaking up a scaffold of interest, or any sequence file in FASTA format, into sections of a user-defined length. The length of the chopped sections, in base pairs, is set with the $subset_length variable at the beginning of the script. The user also specifies the name of the file containing the scaffold to be chopped (which must be in FASTA format) and the name of the output file. Once a scaffold has been chopped, each section is written to the output file under the original scaffold name followed by a number. The numbers are assigned in order, so the first section is labelled with "_1", the second with "_2", and so on. Importantly, any section containing unspecified nucleotides (signified by "N") is excluded from the output.
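The chopping and relabelling logic can be sketched as follows. This is a Python illustration, not the Perl script itself, and the function name chop_scaffold is hypothetical. Two behaviours are not stated in the description and are assumptions here: section numbers are assigned before N-containing sections are dropped (so gaps in the numbering can appear), and a final section shorter than the requested length is kept.

```python
def chop_scaffold(name, sequence, subset_length):
    """Split sequence into consecutive sections of subset_length bases.

    Returns a list of (label, section) pairs, where label is the scaffold
    name followed by "_1", "_2", ... Sections containing "N" are skipped.
    """
    if subset_length <= 0:
        raise ValueError("subset_length must be a positive number of bases")
    sections = []
    starts = range(0, len(sequence), subset_length)
    for index, start in enumerate(starts, start=1):
        section = sequence[start:start + subset_length]
        # Exclude sections with unspecified nucleotides.
        # Assumption: the number of a skipped section is not reused.
        if "N" in section.upper():
            continue
        sections.append((f"{name}_{index}", section))
    return sections


for label, section in chop_scaffold("scaffold_1", "ACGTACNTGGCCTT", 4):
    print(f">{label}\n{section}")
```

In this example the second section ("ACNT") is dropped because it contains an N, so the output holds sections 1, 3, and 4.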

3. scaffold_translator
sample input: scaffold_translator_sample_input.txt
sample output: scaffold_translator_sample_output.txt

scaffold_translator is a script for translating scaffolds, or any sequence in FASTA format, into all six reading frames. The input is one or more scaffolds in FASTA format (for example, the output of the parse_scaffolds script), and the output consists of the translations, also in FASTA format. Each translation is labelled with the name of the original FASTA file, with the reading frame appended at the end.
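A six-frame translation takes three frames from the forward strand and three from its reverse complement. The sketch below shows the idea in Python; it is not the Perl script itself. The function names and the label suffixes ("_frame+1" through "_frame-3") are hypothetical, since the description does not give the exact label format. The sketch also assumes the standard genetic code, writes stop codons as "*", and writes any codon containing an ambiguous base as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate seq from its first base; trailing partial codons are ignored."""
    seq = seq.upper()
    # Assumption: codons not in the table (e.g. containing N) become "X".
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(name, seq):
    """Return (label, protein) pairs for the three forward and three reverse frames."""
    results = []
    for strand, sign in ((seq, "+"), (reverse_complement(seq), "-")):
        for offset in range(3):
            label = f"{name}_frame{sign}{offset + 1}"  # hypothetical label format
            results.append((label, translate(strand[offset:])))
    return results


for label, protein in six_frame_translations("scaffold_1", "ATGGCCTAA"):
    print(f">{label}\n{protein}")
```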