Title:
NAFL - NUCLEOTIDE ASSEMBLY FUN(CTIONAL) LANGUAGE
Author:
Daniel B. Kolis
Correspondence:
dank...@gmail.com
Abstract:
The computer-script language named Nucleotide Assembly Functional Language (NAFL)
has two objectives: first, to improve the scope of FASTA searches by generalizing
them and correctly including atomic elements outside the Central Dogma; second, to
host the modeling of many life science programmes in the new Synthetic Biology with
little or no custom programming.
The amount of sequence, sequence-like and ancillary information involved in working
through a specific life science problem is immense. NAFL supports metadata that
automatically follows computer queries and modeling as work proceeds. This is
particularly relevant in the frequent case where work sessions span multiple sittings
and intermediate results are shared over time and space among multiple workers. FASTA
records and nucleotide and gene-like sequences are the most prominent organizational
elements in the Integrated Design Environment built around NAFL. Improving the
interaction of searches executed for DNA, RNA and proteins against databases has direct
utility, but this new nomenclature, in machine-readable form, also avoids changing
programs to understand returned sequence matches. The computer dialog focuses on
the user experience and avoids repeating similar steps with slightly different criteria
by representing lists of project components, as opposed to working through them one at
a time. Subsidiary tasks such as protein folding and mRNA planning are supported by a
human-machine dialog with similar syntax.
NAFL is constructed from first principles to be uniform across computing platforms,
enabling stored scripts to execute interchangeably on personal, cloud and cluster
configurations under any common operating system. A goal is to move the user experience
into direct real-time encounters with problems and solutions. In contrast, the usual
creation of endless flat database files and maintenance of notes is seen as an old-world
practice. Instead, NAFL substitutes interactivity and automates much of the note-taking.
Since sizable delays do occur, NAFL specifically implements asynchronous human-machine
interactions: instead of waiting, the user receives many work-in-process results out of
order as partial solutions evolve.
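The out-of-order return of results described above can be illustrated with a minimal Python sketch. It is not NAFL and says nothing about how NAFL is implemented; the job names, the delays and the helpers run_job and collect_as_completed are hypothetical stand-ins for slow searches and modeling runs.

```python
import asyncio


async def run_job(name, delay_s):
    """Hypothetical stand-in for a slow search or modeling job."""
    await asyncio.sleep(delay_s)
    return name


async def collect_as_completed(jobs):
    """Return job names in the order they finish, not the order submitted."""
    tasks = [asyncio.create_task(run_job(name, delay))
             for name, delay in jobs.items()]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        # Each await yields whichever pending job completes next.
        finished.append(await next_done)
    return finished


def demo():
    # Submitted slowest first; the delays are invented for illustration.
    jobs = {"fold_protein": 0.4, "fasta_search": 0.05, "mrna_plan": 0.2}
    return asyncio.run(collect_as_completed(jobs))
```

Calling demo() hands back the quick search before the slow folding job, so a user could act on partial results while the rest are still running.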
Test example of executable NAFL, a tweak for COVID infection vaccination trial(s):
% Sneaks past the immune system (on the way in), a lot better
Ψ = BaseExpansion[ U ]( { InChI=1S/C9H12N2O6/c12-2-4-5(13)6(14)7(17-4)3-1-10-9
(16)11-8(3)15/h1,4-7,12-14H,2H2,(H2,10,11,15,16) } )
% Let's try it plain and fancy just for fun
UtrL5PfBland = To_Rna( { GAGAATACTAGTATTCTTCTGGTCCCCACAGACTCAGAGAGAACCCGCCACC } )
UtrL5PfFancyΨ = Substitute_Base( U, Ψ , UtrL5PfBland )
UtrL5Allsω = Expand_Any( UtrL5PfBland, UtrL5PfFancyΨ ) ; Both get tried now
JustStart = { AUG } ; Don't forget the start codon, man. Methionine, you know
FunnyStart = Substitute_Base( U, Ψ , JustStart ) ; Maybe use someday
% Similar try, so we get two of these as well
SignalPeptidePf = To_Rna( { TTCGTGTTCCTGGTGCTGCTGCCTCTGGTGTCCAGCCAGTGTGTG } )
SignalPeptideΨ = Substitute_Base( U, Ψ , SignalPeptidePf )
SignalAlls = Expand_Any( SignalPeptidePf, SignalPeptideΨ )
% The so-called spike that lets this virion get in and do its bad stuff
SpikeRegionPf = To_Rna( { AACCTGACCACCAGAACACAGCTGCTCCAGCCTACACCAACAGCTTTACCA
GAGGCGTGTACTACCCCGACAAGGTGTTCAGATCCAGCGTGCTGCACTCTA
-- sixty-five lines omitted --
GACCTGGGCGATATCAGCGGAATCAATGCCAGCGTCGGAACATCCAGAG
AGATCGACCGGCTGAACGAGGTGGCCAAGAATCTGAACGAGAGCCTGATCG
ACCTGCAAGAACTGGGGAAGTACGAGCAGTACATCAAGTGGCCCTGGTACA
TCTGGCTGGGCTTATCGCCGGACTGATTGCCATCGTGATGGTCACAATCAT
GCTGTGTTGCATGACCAGCTGCTGTAGCTGCCTGAAGGGCTGTTGTAGCTG
TGGCAGCTGCTGCAAGTTCGACGAGGACGATTCTGACCCGTGCTGAAGGGC
GTGACTGCACTACACA } )
DoubleStopPf = To_Rna( { TGATGA } )
UtrL3Pf = To_Rna( { CTCGAGCTGGTACTGCATGCACGCAATGCTAGCTGCCCCTTTCCCGTCCTGGGTACCCCG
AGTCTCCCCCGACCTCGGGTCCCAGGTAGCTCCCACCTCCACCTGCCCCACTCACCACCT
CTGCTAGTTCCAGACACCTCCCAAGCACGCAGCAATGCAGCTCAACGCTTAGCCTAGC
CACACCCCCACGGGACAGCAGTGATTAACCTTTGCAATACGAGTTTAACTAAGC
TATACTAACCCCAGGGTTGGTCAATTTCGTGCCAGCCACACCCTGGAGCTAGC } )
% A little pause for expression in the A tail might be good, so
Stumbler1 = {GAG}
PolyAtail = { Poly_A_Num( 30 ) Stumbler1 Poly_A_Num( 20 ) }
ShowLocal*[ mydebugger ]( Stumbler1 ) % Be sure it works like I think it works
% Assemble everything represented, four kinds to start
RunForFour = { UtrL5Allsω JustStart SignalAlls SpikeRegionPf DoubleStopPf PolyAtail }
% Use our New Tesla V3 with Curevac protein engine
Make[nods cap5 mass='100E-6' teslav3 welllabels='a1 a2 b1 b2' QR temp='200K'
joblabel="Francis C" timestamp]( RunForFour )
run!
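As a reading aid, the list-expansion idea in the script above can be mimicked in Python. This is only a sketch under stated assumptions: the behaviour given to expand_any and assemble is inferred from the script's comments ("Both get tried now", "four kinds to start") and is not a documented NAFL semantics. The sequences are shortened toy stand-ins, the spike region is left out, and all Python names are hypothetical.

```python
from itertools import product

PSI = "Ψ"  # stands in for the expanded base defined with BaseExpansion


def to_rna(dna):
    """Transcribe a coding-strand DNA string to RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def substitute_base(old, new, seq):
    """Replace every occurrence of one base symbol with another."""
    return seq.replace(old, new)


def expand_any(*variants):
    """Assumed meaning of Expand_Any: keep every variant as an alternative."""
    return list(variants)


def poly_a(count):
    """Assumed meaning of Poly_A_Num: a run of `count` adenines."""
    return "A" * count


def assemble(parts):
    """Join parts in order; a list-valued part yields one construct per alternative."""
    choices = [part if isinstance(part, list) else [part] for part in parts]
    return ["".join(combo) for combo in product(*choices)]


# Toy stand-ins, much shorter than the real regions in the script.
utr5_plain = to_rna("GAGAATACT")
utr5_psi = substitute_base("U", PSI, utr5_plain)
signal_plain = to_rna("TTCGTG")
signal_psi = substitute_base("U", PSI, signal_plain)
poly_a_tail = poly_a(30) + "GAG" + poly_a(20)

constructs = assemble([
    expand_any(utr5_plain, utr5_psi),      # two alternatives
    "AUG",                                 # start codon
    expand_any(signal_plain, signal_psi),  # two alternatives
    to_rna("TGATGA"),                      # double stop
    poly_a_tail,
])
```

Two alternatives for the 5' UTR times two for the signal peptide give len(constructs) == 4, which matches the "four kinds" the script sets out to make.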
Revision Control
Dates and times are reasonably precise but are intended to define document releases, not mere details of text processing.
Time Zone is UTC unless noted otherwise. Original material is LaTeX, generally rendered as a PDF.
Filename; Date and time; Distribution Version number
Nafl2358; 27 Jan 2022 23:58; 0.9.1
Nafl2020; 02 Feb 2022 23:20; 0.9.2
Nafl2255; 17 Feb 2022 15:50; 0.9.3
This summary: FL: SummNaflQ12025A
End of Text for casual submission to GC Blog 05 Mar 2025 10:00 EST
HASH of MSG, findable is:
https://groups.google.com/g/diybio/c/2xdo8PPZEs4
Document end