E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

E2EDNA 2.0 - OpenMM Implementation of E2EDNA !

An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides.

Michael Kilgour, Tao Liu, Ilya S. Dementyev, Lena Simine

mjakilgour gmail com

For original version of E2EDNA: J. Chem. Inf. Model. 2021, 61, 9, 4139–4144 (https://doi.org/10.1021/acs.jcim.1c00696) https://github.com/InfluenceFunctional/E2EDNA

Installation

This installation path has been tested on macOS and it relies on conda and pip package managers.

  1. Download the E2EDNA 2.0 package from this repository.
  2. Register and download NUPACK from http://www.nupack.org/downloads. You will need the path to the ~/nupack-###/source/package directory.
  3. In the E2EDNA2 directory, modify the macos_installation.sh script: update the path to NUPACK (see step 2).
  4. From the E2EDNA2 folder, run macos_installation.sh.
    • Caveat: if the conda activate e2edna command gives an error, or if the e2edna environment has not been activated after the script finishes, either replace the activation command with source activate /path-to-env/e2edna
    • OR copy and paste the commands from the script, unmodified, into the command line and run them one by one. This works around the unconfigured shell issue.
  5. Register and download MMB from https://simtk.org/projects/rnatoolbox . Place the Installer### folder into the e2edna folder. NB: Contrary to the recommendation in the MMB installation guide, do not set DYLD_LIBRARY_PATH. This avoids interference with the OpenMM module.
  6. Update 3 paths in main.py:
 params['workdir'] = '/path-to-e2edna/localruns'                         # working directory   
       
 params['mmb dir'] = '/path-to-e2edna/e2edna/Installer.###/lib'          # path to MMB dylib files
      
 params['mmb']     = '/path-to-e2edna/Installer.###/bin/MMB-executable'  # path to MMB executable    
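Before the first run, it can help to confirm that these three paths resolve on disk. The following is a minimal sketch and not part of E2EDNA 2.0: the helper name `missing_paths` is hypothetical, and the only thing assumed about the pipeline is the three `params` keys shown above.

```python
import os


# Hypothetical helper, not part of E2EDNA 2.0: reports which configured paths are absent.
def missing_paths(params, keys=("workdir", "mmb dir", "mmb")):
    """Return the keys whose value is unset or does not exist on disk."""
    return [key for key in keys if not os.path.exists(params.get(key, ""))]


# Placeholder values copied from the installation steps; replace them with real paths.
params = {
    "workdir": "/path-to-e2edna/localruns",
    "mmb dir": "/path-to-e2edna/e2edna/Installer.###/lib",
    "mmb": "/path-to-e2edna/Installer.###/bin/MMB-executable",
}

for key in missing_paths(params):
    print(f"params[{key!r}] does not point to an existing path: {params[key]}")
```

Running a check like this before main.py turns a late failure inside the pipeline into an early, readable message.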

Running a job

Quickstart

  • Set 'params' in main.py, as indicated in "Installation".
  • Run the bash script automate_tests.sh to test all 8 modes automatically.
  • Alternatively, launch a single run by specifying run_num, mode, the aptamer sequence, and the ligand's structure file. For example,
python main.py --run_num=1 --mode='free aptamer' --aptamerSeq='TAATGTTAATTG' --ligand='False' --ligandType='' --ligandSeq=''
python main.py --run_num=2 --mode='full dock' --aptamerSeq='TAATGTTAATTG' --ligand='YQTQ.pdb' --ligandType='peptide' --ligandSeq='YQTQTNSPRRAR'
    
# --ligand='False'        # if no ligand. --ligandType and --ligandSeq will be ignored.
# --ligandType='peptide'  # or 'DNA' or 'RNA' or 'other'. Assuming 'other' ligand can be described by Amber14 force field.
# --ligandSeq=''          # if no sequence. For instance, when ligandType is 'other'

Functionality: Eight different modes of operation

E2EDNA 2.0 takes in a DNA aptamer sequence in FASTA format, and optionally a short peptide or other small molecule, and returns details of the aptamer structure and binding behaviour. This code implements several distinct analysis modes so users may customize the level of computational cost and accuracy.

  • 2d structure → returns NUPACK or seqfold analysis of aptamer secondary structure. Very fast, O(<1s). If using NUPACK, includes probability of observing a certain fold and of suboptimal folds within kT of the minimum.
  • 3d coarse → returns MMB fold of the best secondary structure. Fast O(5-30 mins). Results in a strained 3D structure which obeys base pairing rules and certain stacking interactions.
  • 3d smooth → identical to '3d coarse', with a short MD relaxation in solvent. ~Less than double the cost of '3d coarse' depending on relaxation time.
  • coarse dock → uses the 3D structure from '3d coarse' as the initial condition for a LightDock simulation, and returns best docking configurations and scores. Depending on docking parameters, adds O(5-30mins) to '3d coarse'.
  • smooth dock → identical to 'coarse dock', instead using the relaxed structure from '3d smooth'. Similar cost.
  • free aptamer → fold the aptamer in MMB and run extended MD sampling to identify a representative, equilibrated 2D and 3D structure. Slow O(hours).
  • full dock → returns best docking configurations and scores from a LightDock run using the fully-equilibrated aptamer structure from 'free aptamer'. Similar cost to 'free aptamer' (LightDock is relatively cheap).
  • full binding → same steps as 'full dock', with follow-up extended MD simulation of the best binding configuration. Slowest O(hours).
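One way to read the list above is as a cumulative set of pipeline stages. The sketch below encodes that reading as a lookup table. The stage names and the helper `steps_for` are hypothetical labels for illustration, not identifiers from the E2EDNA 2.0 code.

```python
# Hypothetical stage names; the ordering follows the mode descriptions above.
PIPELINE_STEPS = {
    "2d structure": ["2d analysis"],
    "3d coarse": ["2d analysis", "MMB fold"],
    "3d smooth": ["2d analysis", "MMB fold", "MD relaxation"],
    "coarse dock": ["2d analysis", "MMB fold", "docking"],
    "smooth dock": ["2d analysis", "MMB fold", "MD relaxation", "docking"],
    "free aptamer": ["2d analysis", "MMB fold", "MD sampling"],
    "full dock": ["2d analysis", "MMB fold", "MD sampling", "docking"],
    "full binding": ["2d analysis", "MMB fold", "MD sampling", "docking", "complex MD sampling"],
}


def steps_for(mode):
    """Return the ordered stages for a mode, or raise a readable error for an unknown mode."""
    try:
        return PIPELINE_STEPS[mode]
    except KeyError:
        valid = ", ".join(repr(name) for name in PIPELINE_STEPS)
        raise ValueError(f"Unknown mode {mode!r}; choose from {valid}") from None


print(steps_for("full dock"))
```

Seen this way, each docking mode differs from its non-docking counterpart only by the final stage, which is why the docking modes add relatively little cost.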

Test run: inputs and outcomes

Running the script automate_tests.sh automatically runs simple, very light simulations of all 8 modes. Below we explain what outputs to look for and what success looks like.

  • Mode 1: 2d structure

Input: FASTA sequence, e.g., CGCGCGCGCGCGC

Outputs:

Success evaluation: observe the dot-parenthesis representation for 2d structure, e.g., ..(...)..
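To check a reported structure by eye, it can be useful to turn the dot-parenthesis string into explicit base-pair indices. The following is a small stand-alone sketch (the function name `base_pairs` is hypothetical and not taken from the E2EDNA 2.0 code). It assumes the plain notation with only `.`, `(` and `)`.

```python
def base_pairs(dot_bracket):
    """Return sorted 0-based (i, j) index pairs for a dot-parenthesis string."""
    stack, pairs = [], []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with the most recent unmatched '('
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(base_pairs("..(...).."))  # [(2, 6)]
```

A string that raises an error here is not a valid secondary structure, which is itself a quick failure check for the '2d structure' mode.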

  • Mode 2: 3d coarse

Input: ‘3d unrefined’, fasta sequence, e.g, CGCGCGCGCGCGC

Outputs:

Success evaluation: Visualize foldedAptamer_0.pdb in VMD or PyMOL

  • Mode 3: 3d smooth

Input: fasta sequence, e.g, CGCGCGCGCGCGC

Outputs:

Success evaluation:

  • Mode 4: coarse dock

Input: fasta sequence, e.g, CGCGCGCGCGCGC; PDB of target ligand, e.g., ‘target.pdb’

Outputs:

Success evaluation:

  • Mode 5: smooth dock

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., ‘target.pdb’

Outputs:

Success evaluation:

  • Mode 6: free aptamer

Given a DNA sequence, the pipeline predicts its secondary structure and represents it as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. The last step runs a molecular dynamics simulation to sample the conformational space and identify the representative conformation from the MD trajectory. (if we ask contact predictor for >1 ssStructure)

Input: FASTA sequence, e.g., CGCGCGCGCGCGC

Modifications to the code: set params['mode'] = 'free aptamer' and params['sequence'] = 'CGCGCGCGCGCGC'

Outputs:
    • Secondary structure prediction: such as ((....))....((.(...).)).. in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation, binary trajectory: “clean_foldedAptamer_0_processed_complete_trajectory.dcd”
    • MD simulation, topology: “clean_foldedAptamer_0_processed.pdb”
    • MD simulation, representative conformation: “repStructure_0.pdb”

Success evaluation: The DCD trajectory file is generated, and file “log.txt” shows that the MD sampling of free aptamer is 100% complete. Visualize MD trajectory of free aptamer using the topology and the binary trajectory file. Visualize representative conformation of the DNA aptamer.

  • Mode 7: full dock

Given a DNA sequence, the pipeline predicts its secondary structure and represents it as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Next, a molecular dynamics simulation samples the conformational space and identifies the representative conformation from the MD trajectory. Finally, the target ligand of interest (its structure must be provided as a PDB file) is docked to the representative structure.

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., ‘target.pdb’

Modifications to the code: set params['mode'] = 'full dock', params['sequence'] = 'CGCGCGCGCGCGC' and params['target'] = 'target.pdb' # need to update the code for this.

Outputs:
    • Secondary structure prediction: such as ((....))....((.(...).)).. in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation, binary trajectory: “clean_foldedAptamer_0_processed_complete_trajectory.dcd”
    • MD simulation, topology: “clean_foldedAptamer_0_processed.pdb”
    • MD simulation, representative conformation: “repStructure_0.pdb”
    • Docking, aptamer-ligand complex structure: “top_1.pdb”. The docking score is in “record.txt”.

Success evaluation: The DCD trajectory file is generated, and file “log.txt” shows that the MD sampling of free aptamer is 100% complete. Visualize MD trajectory of free aptamer using the topology and the binary trajectory file. Visualize representative conformation of the DNA aptamer. Visualize aptamer-ligand complex structure.

  • Mode 8: full binding

Given a DNA sequence, the pipeline predicts its secondary structure and represents it as a contact map in dot-parenthesis notation. Guided by the predicted secondary structure, the sequence is then folded into an initial three-dimensional conformation. Next, a molecular dynamics simulation samples the conformational space and identifies the representative conformation from the MD trajectory. The target ligand of interest (its structure must be provided as a PDB file) is then docked to the representative structure. Finally, the aptamer-ligand complex is sampled by MD simulation to investigate its dynamics.

Input: FASTA sequence, e.g., CGCGCGCGCGCGC; PDB of target ligand, e.g., ‘target.pdb’

Modifications to the code: set params['mode'] = 'full binding', params['sequence'] = 'CGCGCGCGCGCGC' and params['target'] = 'target.pdb' # need to update the code for this.

Outputs:
    • Secondary structure prediction: such as ((....))....((.(...).)).. in “record.txt”
    • MMB folded structure: “foldedAptamer_0.pdb”
    • MD simulation of free aptamer, binary trajectory: “clean_foldedAptamer_0_processed_complete_trajectory.dcd”
    • MD simulation of free aptamer, topology: “clean_foldedAptamer_0_processed.pdb”
    • MD simulation of free aptamer, representative conformation: “repStructure_0.pdb”
    • Docking, aptamer-ligand complex structure: “top_1.pdb”. The docking score is in “record.txt”.
    • MD simulation of aptamer-ligand complex, binary trajectory: “clean_complex_0_0_processed_trajectory.dcd”
    • MD simulation of aptamer-ligand complex, topology: “clean_complex_0_0_processed.pdb”

Success evaluation: File “log.txt” shows that the MD sampling of free aptamer is 100% complete and the DCD trajectory file is generated. Visualize MD trajectory of free aptamer using the topology and the binary trajectory file. Visualize representative conformation of the DNA aptamer. Visualize aptamer-ligand complex structure. The DCD trajectory file of the complex is generated, and file “log_complex.txt” shows that the MD sampling of aptamer-ligand is 100% complete. Visualize MD trajectory of aptamer-ligand using its binary trajectory and topology files. It is worth noting that the aptamer might seem far apart from the target ligand, which could be a result of the periodic boundary condition. Should we correct it or leave it to the user?
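When judging whether the aptamer and ligand are really far apart, distances can be measured under the minimum-image convention instead of from raw coordinates. A minimal sketch, assuming an orthorhombic (rectangular) periodic box. The box shape used by the pipeline is not stated here, and the function name is hypothetical.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    `a`, `b` are (x, y, z) coordinates and `box` holds the three edge lengths,
    all in the same unit (e.g., nanometers).
    """
    total = 0.0
    for a_i, b_i, length in zip(a, b, box):
        delta = a_i - b_i
        # Shift by whole box lengths so the component lies within [-L/2, L/2].
        delta -= length * round(delta / length)
        total += delta * delta
    return math.sqrt(total)


# Two atoms near opposite faces of a 10 nm box are 1 nm apart through the boundary.
print(minimum_image_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), (10.0, 10.0, 10.0)))
```

If the minimum-image distance between aptamer and ligand atoms is small while the raw distance is large, the apparent separation is a visualization artifact and not a dissociation event.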

MD simulation might stop at the onset with a “Particle coordinate is nan” error. This could be due to the energy minimization being too aggressive, so that coordinates move out of bounds and the integrator cannot work on the resulting nonsensical values. In this case, re-run the pipeline.

MMB folding can take a while if a tricky sequence requires multiple refolding attempts.

__ work in progress__

Physical Parameters

Default force field is AMBER 14. Other AMBER fields and explicit water models are trivial to implement. Implicit water requires moving to building systems from AMBER prmtop files. CHARMM may also be easily implemented, but hasn't been tested. AMOEBA 2013 parameters do not include nucleic acids, and AMOEBABIO18 parameters are not implemented in OpenMM.

* params['force field'] = 'AMBER'
* params['water model'] = 'tip3p'

Default parameters here - for guidance on adjustments start here.

params['box offset'] = 1.0 # nanometers
params['barostat interval'] = 25
params['friction'] = 1.0 # 1/picosecond
params['nonbonded method'] = PME
params['nonbonded cutoff'] = 1.0 # nanometers
params['ewald error tolerance'] = 5e-4
params['constraints'] = HBonds
params['rigid water'] = True
params['constraint tolerance'] = 1e-6
params['pressure'] = 1 

Increasing hydrogen mass e.g., to 4 AMU enables longer time-steps up to ~3-4 fs. See documentation for details.

params['hydrogen mass'] = 1.0 # in amu
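In OpenMM, the `hydrogenMass` option of `ForceField.createSystem` keeps the total mass unchanged: whatever mass is added to a hydrogen is subtracted from the heavy atom it is bonded to. The arithmetic can be sketched as below. How `params['hydrogen mass']` is forwarded to OpenMM inside the pipeline is an assumption, and the helper name is hypothetical.

```python
def repartitioned_heavy_mass(heavy_mass, n_hydrogens, hydrogen_mass, default_hydrogen_mass=1.008):
    """Return the heavy-atom mass (amu) after each bonded hydrogen is raised to `hydrogen_mass`."""
    transferred = n_hydrogens * (hydrogen_mass - default_hydrogen_mass)
    new_mass = heavy_mass - transferred
    if new_mass <= 0:
        raise ValueError("hydrogen mass too large: heavy atom would have non-positive mass")
    return new_mass


# Amide nitrogen (14.007 amu) with one hydrogen raised to 4 amu.
nitrogen = repartitioned_heavy_mass(14.007, n_hydrogens=1, hydrogen_mass=4.0)
print(round(nitrogen, 3), "amu on N; total", round(nitrogen + 4.0, 3), "amu")
```

Because the total mass of each group is conserved, equilibrium properties are unaffected to a good approximation, while the slowed hydrogen vibrations are what permit the longer time step.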

Temperature, pH and ionic strength are taken into account for 2D folding in NUPACK, ion concentration in MD simulation, and protonation of molecules for MD (safest near 7-7.4).

params['temperature'] = 310 # Kelvin - used to predict secondary structure and for MD thermostatting
params['ionic strength'] = .163 # mmol - used to predict secondary structure and add ions to simulation box
params['pH'] = 7.4 # simulation will automatically protonate the peptide up to this pH

The peptide backbone constraint constant is the constant used to constrain backbone dihedrals. A minimum of 10000, as it is currently set, is recommended for good constraints (deviations < 5° were always seen with this value). For more info, please read README_CONSTRAINTS.md.

params['peptide backbone constraint constant'] = 10000

Implicit Solvent

params['implicit solvent'] = True
if params['implicit solvent']:
    params['implicit solvent model'] = OBC1  # only meaningful if implicit solvent is True
    params['leap template'] = 'leap_template.in'
    # TODO add more options to params: implicitSolventSaltConc, soluteDielectric, solventDielectric, implicitSolventKappa

Starting with a folded DNA aptamer structure (instead of just a FASTA sequence)

params['skip MMB'] = True  # it will skip '2d analysis' and 'do MMB'
if params['skip MMB'] is True:
    params['folded initial structure'] = 'foldedSequence_0.pdb'  # if wishing to skip MMB, must provide a folded structure
Comments
  • JOSS Review

    Hi all,

    Thanks for the invitation to review and congrats on the submission.

    The general idea behind this submission is sound, and follows up on a 2021 publication from the same authors on E2EDNA v1.0, published in JCIM. From my understanding, the code is essentially a re-write to use OpenMM instead of Tinker as the MD engine. While this is valuable (it makes the code simpler to install and run), the authors do not realize, in my opinion, this change to its fullest potential. The authors' repository is not so much a "package" in the traditional sense, but more of a collection of scripts that automate a certain rigid protocol. I would rather see, for instance, NUPACK being an optional dependency: as a user, I could simply provide my own DNA molecules instead of being forced to use NUPACK. In this sense, I think this repository could use more work to stand out on its own compared to last year's publication.

    In addition to these comments, I have a general comment on the repository itself. The authors should take some time to clean up files that are no longer useful for the protocol or that are simply part of the development workflow. Folders named old, or IDE config folders (.idea), should not be part of a published version of the repository, especially when they are even marked to be ignored in the .gitignore file. The same goes for the existence of both a requirements.txt file and an environment.yml file, when only the latter is used. As such, I believe that the authors should spend some time cleaning up the repository and setting up a more "traditional" structure to help potential users navigate through their code base more easily.

    Further, I have a few starter questions about the manuscript, code, and licenses that I think should be clarified. Hopefully these will help the authors improve their work and repository/code.

    Licenses

    • You're licensing the tool under the Apache license but you are including data (parameter sets) that falls under a different license. In particular, I see the parameter files for the Amoeba forcefield taken from Tinker/OpenMM almost verbatim. Did you check with the appropriate developers if this sharing of the forcefield parameter files is allowed under their license, without any attribution?

    Installation

    • The installation process is quite complex. As a user, I'd have to register and download NUPACK and MMB, as well as edit a series of files in order to get a functional installation. This is simply a suggestion for the developers to keep in mind.

    • Related to the point above, have the authors considered using conda directly to install their software, instead of a custom shell script? pdbfixer is available as a conda package, and you could specify pip packages there too, e.g. lightdock. The installation could be reduced to a simple: 1) install nupack 2) install mmb 3) run conda env create -f e2edna-env.yml.

    • On this last point, the authors should strip the granular version of the env yaml file otherwise conda will struggle with versions on anything but the authors' hardware.

    • According to the README, the code is only tested on MacOS, although I'd imagine the most use would be on a compute cluster running Linux. Have the authors tried running their code on Linux?

    Misc

    • In several sections of their documentation, the authors mention "OpenDNA". Was this the previous name of this package?
    • It would be greatly beneficial for a user to have config files with installation paths, simulation settings etc, instead of having to edit source code. Would the authors be open to this change?

    Comments on the Manuscript

    • In the "Statement of Need", the authors mention an "all-python" package several times. Being pedantic, this is not entirely true as their code relies on quite some compiled code in their dependencies (lightdock, openmm).
    opened by JoaoRodrigues 12
  • Feature Request: Argument parsing

    Hello,

    Would you be interested in more fully utilizing command-line argument parsing (e.g. using argparse)? I always feel a bit uncomfortable having to edit source code to use a program. It would be great if you could set the parameters strictly from the CL at runtime, such as workdir, mmb dir, and mmb, instead of editing main.py which is tracked by git.

    Additionally, using argparse would give the opportunity to provide a very helpful user interface. For instance, the user could run: python main.py --help to get a help message explaining what their options are.

    enhancement 
    opened by schackartk 10
  • 7 feature request argument parsing

    Overview

    This pull request implements argparse so that the user is less likely to need to edit source code in main.py. However, more work will need to be done to include parameters related to environmental conditions such as pH.

    Other than implementing argparse, functionality is the same. Some things are still a bit awkward because I didn't want to change too much beyond that.

    Affected files

    The following files have changes:

    • main.py: add shebang line, add argument parsing and validation
    • automate_tests.sh: update arguments to align with argparse
    • README.md: describe current functionality and arguments

    Notes

    main.py

    There were a few things that may need to be changed to work most efficiently and predictably.

    The relationship between --ligand, --ligand_type, and --ligand_seq is a bit complex and can probably be improved. Ideally, I think --ligand would be optional, yielding a default of None. This makes more sense than having to use --ligand False. Then --ligand_type and --ligand_seq could also be optional with a default of None (instead of an empty string). Only when --ligand is present would you validate that the others are there, and call parser.error() if not. I also think the authors should consider whether --ligand_seq is truly required when --ligand_type is 'peptide', 'DNA' or 'RNA'. Currently this is enforced (by parser.error()), but if it is actually optional, that should be updated.
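    The optional-ligand scheme described above could look roughly like the following sketch. It is an illustration under stated assumptions and not the code in this pull request: the function names are hypothetical, and --ligand_seq is deliberately left optional because whether it must be required is the open question raised above.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the three ligand-related arguments."""
    parser = argparse.ArgumentParser(description="Sketch of ligand-argument validation")
    parser.add_argument("--ligand", default=None, help="PDB file of the ligand; omit for no ligand")
    parser.add_argument("--ligand_type", default=None, choices=["peptide", "DNA", "RNA", "other"])
    parser.add_argument("--ligand_seq", default=None, help="ligand sequence, if it has one")
    return parser


def parse_ligand_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.ligand is None:
        # No ligand: the other two arguments are ignored, as in the current behaviour.
        args.ligand_type = None
        args.ligand_seq = None
    elif args.ligand_type is None:
        parser.error("--ligand_type is required when --ligand is given")
    return args


print(parse_ligand_args(["--ligand", "YQTQ.pdb", "--ligand_type", "peptide"]))
```

    With None as the default, "no ligand" is expressed by omitting the flag, so the string 'False' never needs to be parsed.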

    I left the code that uses different params based on whether it is run as local or cluster, but I am not sure if it is necessary. I especially think that the hard-coded paths used when it is cluster should be removed, and turned into arguments. In which case, it is the same as the usual arguments, and may make --device obsolete if there is no difference between local and cluster.

    I implemented wildcards to help the user find their MMB paths (lib and executable) within the --mmb_dir and --mmb. I am hoping the defaults will make it so users don't have to change this argument.

    I removed the operating system argument and instead used platform to detect it. This new implementation has only been tested on my WSL system, so please check this works. One issue is if the result of platform.system().lower() doesn't match an expected value on mac. Initially mine returned Linux, which is why I ran lower() to make it 'linux' which is compatible with the previous implementation.
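    A defensive version of this detection might normalize the value and fail loudly on anything unexpected. In the sketch below, the names 'macos' and 'linux' that the pipeline expects are assumptions, as is the helper name `detect_os`. `platform.system()` itself returns 'Darwin' on macOS and 'Linux' on Linux and WSL.

```python
import platform

# Assumed mapping from platform.system().lower() to the names the pipeline expects.
_SUPPORTED = {"darwin": "macos", "linux": "linux"}


def detect_os(system_name=None):
    """Return a normalized OS name; `system_name` can be passed explicitly for testing."""
    raw = platform.system() if system_name is None else system_name
    key = raw.lower()
    if key not in _SUPPORTED:
        raise ValueError(f"Unsupported operating system: {raw!r}")
    return _SUPPORTED[key]


print(detect_os("Darwin"), detect_os("Linux"))
```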

    Lots of argument validation now happens in get_args(), so hopefully more helpful error messages are produced.

    I added a feature so that both --aptamer and --ligand_seq can be names of files. In that case, the file contents are read in and used as the sequences. Literal strings can still be used instead of file names.
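    The file-or-literal behaviour can be sketched as follows. This is not the code from the pull request: the helper name `read_sequence` is hypothetical, and skipping FASTA header lines that start with '>' is an added assumption.

```python
import os
import tempfile


def read_sequence(value):
    """Return the sequence stored in file `value` if it exists, else `value` itself."""
    if os.path.isfile(value):
        with open(value) as handle:
            lines = [line.strip() for line in handle]
        # Assumption: blank lines and FASTA headers (lines starting with '>') are skipped.
        return "".join(line for line in lines if line and not line.startswith(">"))
    return value.strip()


# Demonstrate both input styles with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_aptamer.txt")
    with open(path, "w") as handle:
        handle.write(">example\nTAATGT\nTAATTG\n")
    from_file = read_sequence(path)

from_literal = read_sequence("TAATGTTAATTG")
print(from_file == from_literal)
```

    One caveat of this design is that a literal sequence which happens to match an existing file name is read as a file, so file names that look like sequences are best avoided.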

    Readme.md

    I hope my additions are helpful in describing the current functionality.

    One thing I was uncertain is the description of ligand type saying "(default: Amber14)" I didn't see this anywhere that params were set. It is not the default to any arguments I set up. If this needs to be a default, please take note of this.

    Conclusions

    Currently, all modes in automate_tests.sh run for me, so it seems that these changes are compatible. It would be great to have unit and integration tests with pytest to confirm.

    Please check that it works on MacOS still, as I have only tested on WSL.

    No additional dependencies have been added, only core libraries were used.

    Please feel free to make any changes you see fit or discuss!

    enhancement 
    opened by schackartk 7
  • Question: GPL-3.0 license required for this repo because of lightdock?

    Hello @brianjimenez - Hope this message finds you well.

    I am trying to figure out what license is the best choice for our E2EDNA 2.0 software and am aware that LightDock is licensed under GNU GPLv3. According to the license guide website (link) provided by GitHub, the GNU GPLv3 seems to require "larger works using a licensed work" to be under the same license. Currently our E2EDNA 2.0 is under Apache-2 license which does not include the condition of "same license". In my opinion, Apache-2 license could give some flexibility because a future version of the E2EDNA software may provide multiple options of different auto-docker package.

    A little summary of how LightDock is used in E2EDNA 2.0 now: lightdock-0.9.2 is installed by pip and the python scripts such as lightdock3.py are directly called without modification. Does our way of using LightDock fall into the category where we can only choose GNU GPLv3 for our E2EDNA 2.0? I am not sure of this question therefore would like to hear the LightDock developer's opinions.

    Thank you very much!

    question 
    opened by taoliu032 4
  • Lightdock Rust nucleic support

    Dear E2EDNA2 developers,

    Since you are using LightDock in some parts of your pipeline, the 0.2.0 release of the Rust implementation of the framework may be of interest to you. This new release adds support for protein-nucleic complex prediction and typically runs 5x-6x faster than the Python+C implementation of LightDock, while using two orders of magnitude less memory. There is more information on how to compile and use the Rust version here.

    Hope it helps!

    enhancement 
    opened by brianjimenez 3
  • Enhancement: Avoid runtime exception when "run" folder exists

    If the output directory for the current run already exists, right now an exception is produced:

    Start automating tests one by one...
    ====================================
    TESTING MODE #1: '2d structure'
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
        os.mkdir(self.workDir)
    FileExistsError: [Errno 17] File exists: '/home/ken/personal/E2EDNA2/localruns/run1'
    
    END OF TEST #1. Results are saved to folder "run1", where:
            2d structure: in record.txt
    

    An exception could be avoided by validating that the output directory does not exist, and providing a useful message such as "The output directory for this run already exists at './localrun/run1'", and an optional -f/--force flag could be provided to overwrite the output directory.
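    A minimal sketch of that check, assuming a boolean `force` value that would come from the proposed -f/--force flag. The helper name `prepare_run_dir` is hypothetical and does not exist in the repository.

```python
import os
import shutil
import tempfile


def prepare_run_dir(run_dir, force=False):
    """Create `run_dir`, refusing to touch an existing one unless `force` is set."""
    if os.path.isdir(run_dir):
        if not force:
            raise SystemExit(
                f"The output directory for this run already exists at '{run_dir}'. "
                "Pass -f/--force to overwrite it."
            )
        shutil.rmtree(run_dir)  # discard results of the previous run
    os.makedirs(run_dir)  # also creates a missing parent such as the workdir
    return run_dir


# Demonstration inside a throwaway directory.
workdir = tempfile.mkdtemp()
run_dir = prepare_run_dir(os.path.join(workdir, "localruns", "run1"))
run_dir = prepare_run_dir(run_dir, force=True)  # the second call succeeds only because of force
print(os.path.isdir(run_dir))
```

    Raising SystemExit with a message prints that message and exits with a non-zero status, which avoids the traceback while still signalling failure to a calling script.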

    opened by schackartk 2
  • Bug: Runtime exception when params['workdir'] does not exist

    When the directory in the variable params['workdir'] does not exist, the program fails at runtime:

    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
        os.mkdir(self.workDir)
    FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1'
    

    This could be fixed by checking for the directory, and creating it if it does not exist:

    if not os.path.isdir(params['workdir']):
        os.mkdir(params['workdir'])
    
    opened by schackartk 2
  • Error: 'str' object is not callable; in opendna.py, line 535

    Hello,

    I am excited to try out this tool!

    I have installed all dependencies successfully (I believe), and I am running the script automate_tests.sh. Most tests are passing, but tests 4, 5, 7, and 8 are failing during the docking step with the same exception.

    TESTING MODE #4: 'coarse dock'
    Starting Fresh Run 4
    Simulation mode: coarse dock
    Simulating TAATGTTAATTG with YQTQ.pdb
    Getting Secondary Structure(s)
    Running over 1 possible 2D structures.
    2D structure #0 is                              : .(((....))).
    
    Folding Aptamer from Sequence. Fold speed = quick.
    Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
    2D structure after MMB folding (from MDAnalysis): .(((....))).
    Initial fold fidelity = 1.000
    Initial fold fidelity = 1.000 (from MDAnalysis)
    Folded the aptamer and generated the folded structure: foldedAptamer_0.pdb
    
    No relaxation (smoothing) of the folded aptamer.
    
    Docking
    Traceback (most recent call last):
      File "main.py", line 230, in <module>
        opendnaOutput = opendna.run()  # retrieve binding information (eventually this should become a normalized c-number)    
      File "/home/ken/personal/E2EDNA2/opendna.py", line 297, in run
        outputDict['dock scores {}'.format(self.i)] = self.dock(self.pdbDict['representative aptamer {}'.format(self.i)], self.targetPDB)  # eg, "peptide.pdb" which can be created given peptide sequence by buildPeptide in function dock
      File "/home/ken/personal/E2EDNA2/opendna.py", line 535, in dock
        ld.run()
    TypeError: 'str' object is not callable
    

    I am unsure what the underlying problem is, but maybe it has to do with a mistake between:

    • The instance variable run on line 487 of instances.py: self.run = params['ld run']
    • The method run() on line 504 of instances.py: def run(self):

    Because the instance variable from line 487 is the string value set on line 220 in main.py: params['ld run'] = 'lightdock3.py'. Maybe this variable is somehow shadowing the method run(), and so it is failing to "call" str() (i.e. 'lightdock3.py'())?
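    The suspected mechanism is consistent with how Python resolves attributes: an instance attribute takes precedence over a method of the same name defined on the class. The stand-alone sketch below reproduces the effect with hypothetical class names. It does not use the actual code in instances.py.

```python
class ShadowedDocker:
    """Hypothetical stand-in that reproduces the suspected bug."""

    def __init__(self, params):
        self.run = params["ld run"]  # instance attribute with the same name as the method below

    def run(self):
        return "docking"


class RenamedDocker:
    """Same class with the attribute renamed, so the method stays reachable."""

    def __init__(self, params):
        self.run_script = params["ld run"]

    def run(self):
        return f"would call {self.run_script}"


params = {"ld run": "lightdock3.py"}

try:
    ShadowedDocker(params).run()  # looks up the string first, then tries to call it
    shadow_error = None
except TypeError as exc:
    shadow_error = str(exc)

print(shadow_error)
print(RenamedDocker(params).run())
```

    Renaming either the attribute or the method removes the collision. Which of the two is renamed in the real code is a choice for the maintainers.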

    I would appreciate any help with resolving this.

    Thank you!

    bug 
    opened by schackartk 2
  • Bug: Mysterious error when using invalid mode

    If the mode is misspelled or an invalid choice, an exception occurs:

    $ python main.py --run_num=1 --mode='fulldock' --aptamerSeq='GCGCGCGCGATATATAT' --ligand='my_ligand.pdb' --ligandType='other' --ligandSeq=''
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 52, in __init__
        if self.actionDict['make workdir']:
    KeyError: 'make workdir'
    

    The exception doesn't seem to mention the invalid --mode, so the user may be confused as to what happened.

    I have confirmed that this runs fine once the mode name is corrected.

    This issue is resolved in #8 by using argparse and specifying the valid choices. Here is what is displayed from the code in that pull request:

    $ ./main.py -r 1 -m 'fulldock' -a aptamers/my_aptamer.txt -l my_ligand.pdb -t other -f
    usage: main.py [-h] [-f] -r INT -m MODE -a SEQ -l PDB [-t TYPE] [-s SEQ]
                   [-d RUN] [-p DEV] [-w DIR] [-md DIR] [-mb MMB]
    main.py: error: argument -m/--mode: invalid choice: 'fulldock' (choose from '2d structure', '3d coarse', '3d smooth', 'coarse dock', 'smooth dock', 'free aptamer', 'full dock', 'full binding')
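
    The behaviour shown above comes from argparse's `choices` option. A reduced sketch covering only the mode argument follows. The list of modes is taken from the error message above, while the function name and parser layout are illustrative and not copied from the pull request.

```python
import argparse

MODES = [
    "2d structure", "3d coarse", "3d smooth", "coarse dock",
    "smooth dock", "free aptamer", "full dock", "full binding",
]


def parse_mode(argv):
    """Parse only -m/--mode; argparse rejects values outside MODES with a usage error."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-m", "--mode", required=True, choices=MODES, metavar="MODE")
    return parser.parse_args(argv).mode


print(parse_mode(["-m", "full dock"]))
```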
    
    opened by schackartk 1
  • Enhancement: More control over output location

    It seems a bit restrictive to enforce that the output directory be structured as {workdir}/run{runnum}/. Most tools allow you to specify the output directory yourself.

    This could be useful to the user (myself included) for organizing runs, and automating using a workflow manager. For instance, if I am running several combinations of aptamer, ligands, and modes, I may want my output directories to be {aptamer}/{ligand}/{mode}/. This structure is meaningful to me unlike the folder name "run1".

    While this is not resolved in #8 , it would reduce the number of arguments. Instead of having both --workdir and --run_num, you could just have a single --outdir argument.

    enhancement 
    opened by schackartk 1
  • Bug: Ligand file in a folder causes exception

    If the ligand pdb file is in a folder instead of the root of the repo, an exception occurs:

    $ ls ligands/
    my_ligand.pdb
    
    $ python main.py --run_num=1 --mode='full dock' --aptamerSeq='GCGCGCGCGATATATAT' --ligand='ligands/my_ligand.pdb' --ligandType='other' --ligandSeq=''
    Starting Fresh Run 1
    Traceback (most recent call last):
      File "main.py", line 229, in <module>
        opendna = opendna(params)  # instantiate the class
      File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
        self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
      File "/home/ken/personal/E2EDNA2/opendna.py", line 179, in setup
        copyfile(self.targetPDB, self.workDir + '/' + self.targetPDB)
      File "/home/ken/personal/E2EDNA2/env/lib/python3.7/shutil.py", line 121, in copyfile
        with open(dst, 'wb') as fdst:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1/ligands/my_ligand.pdb'
    

    I don't see any reason that the ligand file should not be in a folder, so this should not fail.
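    The traceback shows the destination being built by appending the full relative path to the working directory. One possible fix, sketched here with a hypothetical helper name, is to copy the file under its base name only. Any later code that opens the ligand inside the run folder would then also need to use the base name, which is an assumption about the rest of the pipeline.

```python
import os
import shutil
import tempfile


def copy_ligand(target_pdb, workdir):
    """Copy the ligand file into `workdir` under its base name and return the new path."""
    destination = os.path.join(workdir, os.path.basename(target_pdb))
    shutil.copyfile(target_pdb, destination)
    return destination


# Demonstration: a ligand stored in a sub-folder is copied into a flat run directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "ligands"))
os.makedirs(os.path.join(root, "localruns", "run1"))
source = os.path.join(root, "ligands", "my_ligand.pdb")
with open(source, "w") as handle:
    handle.write("END\n")

copied = copy_ligand(source, os.path.join(root, "localruns", "run1"))
print(os.path.relpath(copied, root))
```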

    bug 
    opened by schackartk 2
Releases(v2.0.0)
  • v2.0.0(May 16, 2022)

    This release is associated with the JOSS publication: https://doi.org/10.21105/joss.04182
    The release has also been archived on Zenodo: https://doi.org/10.5281/zenodo.6546661

    Clarification: once downloaded, the archive folder is named "E2EDNA2-2.0.0". This refers to version v2.0.0 of E2EDNA; the name "E2EDNA2" is inherited from the repository name.

    To view the repository: https://github.com/siminegroup/E2EDNA2/tree/v2.0.0
    Full Changelog: https://github.com/siminegroup/E2EDNA2/commits/v2.0.0

    Source code(tar.gz)
    Source code(zip)
Owner
computational chemistry group at McGill University