A DSL for data-driven computational pipelines

Overview

"Dataflow variables are spectacularly expressive in concurrent programming"
Henri E. Bal, Jennifer G. Steiner, Andrew S. Tanenbaum

Quick overview

Nextflow is a bioinformatics workflow manager that enables the development of portable and reproducible workflows. It supports deploying workflows on a variety of execution platforms including local, HPC schedulers, AWS Batch, Google Cloud Life Sciences, and Kubernetes. Additionally, it helps you manage your workflow dependencies through built-in support for Conda, Docker, Singularity, and Modules.

Rationale

With the rise of big data, techniques to analyse and run experiments on large datasets are increasingly necessary.

Parallelization and distributed computing are the best ways to tackle this problem, but the tools commonly available to the bioinformatics community often lack good support for these techniques or provide a model that fits the specific requirements of the bioinformatics domain badly. Most of the time, they also require knowledge of complex tools or low-level APIs.

The Nextflow framework is based on the dataflow programming model, which greatly simplifies writing parallel and distributed pipelines without adding unnecessary complexity, letting you concentrate on the flow of data, i.e. the functional logic of the application/algorithm.

It doesn't aim to be yet another pipeline scripting language; rather, it is built around the idea that the Linux platform is the lingua franca of data science, since it provides many simple command line and scripting tools which are powerful by themselves and, when chained together, facilitate complex data manipulations.

In practice, this means that a Nextflow script is defined by composing many different processes. Each process can execute a given bioinformatics tool or scripting language, and Nextflow adds the ability to coordinate and synchronize process execution by simply specifying their inputs and outputs.

Quick start

Download the package

Nextflow does not require any installation procedure. Just download the distribution package by copying and pasting this command into your terminal:

curl -fsSL https://get.nextflow.io | bash

It creates the nextflow executable file in the current directory. You may want to move it to a folder accessible from your $PATH.
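As a minimal sketch of that last step, the helper below marks the downloaded launcher as executable and moves it into a directory of your choice. The function name install_to and the ~/bin location are illustrative assumptions, not part of Nextflow; any directory already listed in $PATH works equally well.

```shell
# install_to DIR FILE: make FILE executable and move it into DIR.
# Hypothetical helper for illustration; DIR is created if it does not exist.
install_to() {
  mkdir -p "$1" && chmod +x "$2" && mv "$2" "$1/"
}

# Example, assuming ~/bin is on your PATH:
#   install_to "$HOME/bin" ./nextflow
```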

Download from Conda

Nextflow can also be installed from Bioconda:

conda install -c bioconda nextflow 

Documentation

Nextflow documentation is available at http://docs.nextflow.io

HPC Schedulers

Nextflow supports common HPC schedulers, abstracting the submission of jobs from the user.

Currently the following clusters are supported:

For example, to submit the execution to an SGE cluster, create a file named nextflow.config in the directory where the pipeline is going to be launched, with the following content:

process {
  executor='sge'
  queue='<queue name>'
}

With this configuration, processes are executed by Nextflow as SGE jobs using the qsub command. Your pipeline will behave like any other SGE job script, with the benefit that Nextflow automatically and transparently manages process synchronisation, file staging/un-staging, etc.

Cloud support

Nextflow also supports running workflows across various clouds and cloud technologies. Managed solutions from major cloud providers are also supported through AWS Batch, Azure Batch and Google Cloud compute services. Additionally, Nextflow can run workflows on either on-prem or managed cloud Kubernetes clusters.

Currently supported cloud platforms:

Tool management

Containers

Nextflow has first-class support for containerization. It supports both the Docker and Singularity container engines. Additionally, Nextflow can easily switch between container engines, enabling workflow portability.

process samtools {
  container 'biocontainers/samtools:1.3.1'

  """
  samtools --version 
  """

}

Conda environments

Conda environments provide another option for managing software packages in your workflow.

Environment Modules

Environment modules commonly found in HPC environments can also be used to manage the tools used in a Nextflow workflow.

Community

You can post questions or report problems by using the Nextflow discussion forum or the Nextflow channel on Gitter.

Nextflow also hosts a yearly workshop showcasing researchers' workflows and advancements in the language. Talks from past workshops are available on the Nextflow YouTube channel.

The nf-core project is a community effort aggregating high quality Nextflow workflows which can be used by the community.

Build from source

Required dependencies

  • Compiler Java 8 or later
  • Runtime Java 8 or later

Build from source

Nextflow is written in Groovy (a scripting language for the JVM). A pre-compiled, ready-to-run package is available at the GitHub releases page, so it is not necessary to compile it in order to use it.

If you are interested in modifying the source code or contributing to the project, it is worth knowing that the build process is based on the Gradle build automation system.

You can compile Nextflow by typing the following command in the project home directory on your computer:

make compile

The very first time you run it, it will automatically download all the libraries required by the build process. It may take some minutes to complete.

When complete, execute the program by using the launch.sh script in the project directory.

The self-contained runnable Nextflow packages can be created by using the following command:

make pack

Once compiled, use the script ./launch.sh as a replacement for the usual nextflow command.

The compiled packages can be locally installed using the following command:

make install

A self-contained distribution can be created with the command: make pack. To include support of GA4GH and its dependencies in the binary, use make packGA4GH instead.

IntelliJ IDEA

Nextflow development with IntelliJ IDEA requires a recent version of the IDE (2019.1.2 or later).

If you have it installed on your computer, follow the steps below in order to use it with Nextflow:

  1. Clone the Nextflow repository to a directory on your computer.
  2. Open IntelliJ IDEA and choose "Import project" in the "File" menu bar.
  3. Select the Nextflow project root directory on your computer and click "OK".
  4. Then, choose the "Gradle" item in the "external module" list and click the "Next" button.
  5. Confirm the default import options and click "Finish" to finalize the project configuration.
  6. When the import process completes, select the "Project structure" command in the "File" menu bar.
  7. In the dialog that appears, click the "Project" item in the list on the left, and make sure that the "Project SDK" choice on the right contains Java 8.
  8. Set the code formatting options with the settings provided here.

Contributing

Contributions to the project are more than welcome. See the CONTRIBUTING file for details.

License

The Nextflow framework is released under the Apache 2.0 license.

Citations

If you use Nextflow in your research, please cite:

P. Di Tommaso, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017) doi:10.1038/nbt.3820

Credits

Nextflow is built on two great pieces of open source software, namely Groovy and GPars.

YourKit is kindly supporting this open source project with its full-featured Java Profiler. Read more at http://www.yourkit.com

Comments
  • Syntax enhancement aka DSL-2

    This is a request for comments on the implementation of the modules feature for Nextflow.

    This feature allows the definition of NF processes in the main script or in a separate library file. A process can then be invoked, one or multiple times, like any other routine, passing the requested input channels as arguments.

    Process definition

    The syntax for the definition of a process is nearly identical to the usual one: it only requires the use of processDef instead of process and the omission of the from/into declarations. For example:

    processDef index {
        tag "$transcriptome_file.simpleName"
    
        input:
        file transcriptome 
    
        output:
        file 'index' 
    
        script:
        """
        salmon index --threads $task.cpus -t $transcriptome -i index
        """
    }
    

    The semantics and supported features remain identical to the current process. See a complete example here.

    Process invocation

    Once a process is defined it can be invoked like any other function in the pipeline script. For example:

    transcriptome = file(params.transcriptome)
    index(transcriptome)
    

    Since index defines an output channel, its return value can be assigned to a channel variable that can be used as usual, e.g.:

    transcriptome = file(params.transcriptome)
    index_ch = index(transcriptome)
    index_ch.println()
    

    If the process produces two (or more) output channels, the multiple assignment syntax can be used to get a reference to each output channel.

    Process composition

    The result of a process invocation can be passed to another process like any other function, e.g.:

    processDef foo {
      input: 
        val alpha
      output: 
        val delta
        val gamma
      script:
        delta = alpha
        gamma = 'world'
        "some_command_here"
    }
    
    processDef bar {
      input:
        val xx
        val yy 
      output:
        stdout()
      script:
        "another_command_here"        
    }
    
    bar(foo('Hello'))
    

    Process chaining

    Processes can also be invoked as custom operators. For example, a process foo taking one input channel can be invoked as:

    ch_input1.foo()
    

    and, when taking two channels, as:

    ch_input1.foo(ch_input2)
    

    This allows built-in operators and processes to be chained together, e.g.:

    Channel
        .fromFilePairs( params.reads, checkIfExists: true )
        .into { read_pairs_ch; read_pairs2_ch }
    
    index(transcriptome_file)
        .quant(read_pairs_ch)
        .mix(fastqc(read_pairs2_ch))
        .collect()
        .multiqc(multiqc_file)
    

    See the complete script here.

    Library file

    A library is just an NF script containing one or more processDef declarations. The library can then be imported using the importLibrary statement, e.g.:

    importLibrary 'path/to/script.nf'
    

    Relative paths are resolved against the project baseDir variable.

    Test it

    You can try the current implementation using version 19.0.0.modules-draft2-SNAPSHOT, e.g.:

    NXF_VER=19.0.0.modules-draft2-SNAPSHOT nextflow run rnaseq-nf -r modules
    

    Open points

    1. When a process is defined in a library file, should it be possible to access the params values? Currently it's possible, but I think this is not a good idea because it makes the library depend on the script params, making it very fragile.

    2. How to pass parameters to a process defined in a library file, e.g. memory and cpus settings? It could be done using the config file as usual, but I expect there could still be a need to parametrise the process definition and specify the parameters at invocation time.

    3. Should a namespace be used when defining processes in a library? What if two or more processes have the same name in different library files?

    4. One or many processes per library file? Currently any number of processes can be defined, but I'm starting to think that it would be better to allow the definition of only one process per file. This would simplify reuse across different pipelines and the import in tools such as dockstore, and it would make the dependencies of the pipeline more intelligible.

    5. Remote library file? Not sure it's a good idea to be able to import remotely hosted files, e.g. http://somewhere/script.nf. Remote paths tend to change over time.

    6. Should a version number be associated with the process definition? How to use or enforce it?

    7. How to test process components? Ideally it should be possible to include the required container in the process definition and unit test each process independently.

    8. How to chain a process returning multiple channels?

    kind/feature lang/dsl2 
    opened by pditommaso 114
  • Nextflow parameter description scheme

    TL;DR

    A naming scheme to enable meta-data annotation for workflow parameters.

    Details

    Usually, workflow-specific execution parameters for the single processes are defined in the params scope, a DSL feature that Nextflow provides for accessing parameter variables from the workflow script during workflow execution.

    Currently, there is no naming scheme / convention / language feature for annotating parameters with description text, mandatory/optional flags or similar.

    This could be useful though for upstream applications in order to build graphical user interfaces and configure a workflow correctly before execution in a dynamic, user-friendly way.

    I am very happy for any input here and design suggestions :)

    Best, Sven

    kind/feature pri/moderate 
    opened by sven1103 80
  • wr as new Nextflow backend

    New feature

    I develop wr which is a workflow runner like Nextflow, but can also just be used as a backend scheduler. It can schedule to LSF and OpenStack right now.

    The benefits to Nextflow users of going via wr instead of using Nextflow’s existing LSF or Kubernetes support are:

    1. wr makes more efficient use of LSF: it can pick an appropriate queue, use job arrays, and “reuse” job slots. In a simple test I did, Nextflow using wr in LSF mode was 2 times faster than Nextflow using its own LSF scheduler.
    2. wr’s OpenStack support is incredibly easy to use and set up (basically a single command to run), and provides auto scaling up and down. Kubernetes, by comparison, is really quite complex to get working on OpenStack, doesn’t auto scale, and wastes resources with multiple nodes needed even while no workflows are being operated on. I was able to get Nextflow to work with wr in OpenStack mode (but the shared disk requirement for Nextflow’s state remains a concern).

    Usage scenario

    Users with access to LSF or OpenStack clusters who want to run their Nextflow workflows efficiently and easily.

    Suggest implementation

    Since I don’t know Java well enough to understand how to implement this “correctly”, I wrote a simple bsub emulator in wr, which is what my tests so far have been based on. I submit the Nextflow command as a job to wr, turning on the bsub emulation, and configure Nextflow to use its existing LSF scheduler. While running under the emulation, Nextflow’s bsub calls actually call wr.

    Of course, the proper way to do this would be to have Nextflow call wr directly (either the wr command line or its REST API). The possibly tricky thing with regard to having it work in OpenStack mode is having it tell wr about OpenStack-specific things like what image to use, what hardware flavour to use, how to mount S3, etc. (the bsub emulation handles all of this).

    Here's what I did for my LSF test...

    echo_1000_sleep.nf:

    #!/usr/bin/env nextflow
    
    num = Channel.from(1..1000)
    
    process echo_sleep {
      input:
      val x from num
    
      output:
      stdout result
    
      "echo $x && sleep 1"
    }
    
    result.subscribe { println it }
    
    workflow.onComplete {
        println "Pipeline completed at: $workflow.complete"
        println "Execution status: ${ workflow.success ? 'OK' : 'failed' }"
    }
    

    nextflow.config:

    process {
      executor='lsf'
      queue='normal'
      memory='100MB'
    }
    

    install wr:

    wget https://github.com/VertebrateResequencing/wr/releases/download/v0.17.0/wr-linux-x86-64.zip
    unzip wr-linux-x86-64.zip
    mv wr /to/somewhere/in/my/PATH/wr
    

    run:

    wr manager start -s lsf
    echo "nextflow run ./echo_1000_sleep.nf" | wr add --bsub -r 0 -i nextflow --cwd_matters --memory 1GB
    

    Here's what I did to get it to work in OpenStack...

    nextflow_install.sh:

    sudo apt-get update
    sudo apt-get install openjdk-8-jre-headless -y
    wget -qO- https://get.nextflow.io | bash
    sudo mv nextflow /usr/bin/nextflow
    

    put input files in S3:

    s3cmd put nextflow.config s3://sb10/nextflow/nextflow.config
    s3cmd put echo_1000_sleep.nf s3://sb10/nextflow/echo_1000_sleep.nf
    

    ~/.openstack_rc:

    [your rc file containing OpenStack environment variables downloaded from Horizon]
    

    run:

    source ~/.openstack_rc
    wr cloud deploy --os 'Ubuntu Xenial' --username ubuntu
    echo "cp echo_1000_sleep.nf /shared/echo_1000_sleep.nf && cp nextflow.config /shared/nextflow.config && cd /shared && nextflow run echo_1000_sleep.nf" | wr add --bsub -r 0 -o 2 -i nextflow --memory 1GB --mounts 'ur:sb10/nextflow' --cloud_script nextflow_install.sh --cloud_shared
    

    The NFS share at /shared created by the --cloud_shared option is slow and limited in size; a better solution would be to set up your own high performance shared filesystem in OpenStack (eg. GlusterFS), then add to nextflow_install.sh to mount this share. Or even better, is there a way to have Nextflow not store state on disk? If it could just query wr for job completion status, that would be better.

    kind/feature pri/low 
    opened by sb10 79
  • Introduce HTTP POST feature and broadcast workflow process runtime information

    Motivation: I thought it would be super cool to have a mechanism to let Nextflow send trace reports from the workflow processes during workflow execution, so one can monitor the workflow status and process on remote target sites (web portals, API web services with database logging, etc.).

    This idea was already mentioned here https://github.com/nextflow-io/nextflow/pull/454 by @mes5k, but using websockets instead. Websockets are a lot more complex as they are stateful, and one would need to be very careful with the implementation not to break workflow execution. So I decided to go for simple HTTP POST requests, and it is the task of the user to provide a web server that consumes the information (JSON).

    Mechanism: I introduced a -with-messages option that triggers this functionality. You can specify the URL in a messages scope:

    messages {
       // example URL
       url = "http://api.myserver.com/workflow/monitor"
    }
    

    The logic is contained in the MessageObserver class analogous to the other observers, implementing the interface TraceObserver. HTTP POST requests will send a JSON object with information during the following Nextflow execution steps:

    • onFlowStart() - when the workflow starts
    • onFlowComplete() - when the workflow is completed
    • onProcessSubmit() - when a process is submitted
    • onProcessStart() - when a process starts
    • onProcessComplete() - when a process is completed

    and new:

    • onFlowError() - which is invoked now in all observers when the Nextflow session catches an error.

    For the latter, I observed that the TaskHandler object was always null, even when I changed the error strategy to 'finish', which should call Session.cancel(handler) and not Session.abort(task.error). To fix that, I had to change two lines of code https://github.com/nextflow-io/nextflow/commit/8104f6d5ccde600396dfa260eeda0e73a9e69b87, hope that is OK.

    Information sent via HTTP: a JSON object with the following structure:

    {
       "runName": "<Nextflow run name>",
       "runID": "<Nextflow run ID>",
       "runStatus": "<started|running|error|completed>",
       "trace": "<Nextflows trace record>"
    }
    
    

    Note: The "trace": "<Nextflows trace record>" entry is NOT present when onFlowStart() and onFlowComplete() are invoked. When present, "trace" contains all the information shown in the Nextflow documentation.
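To make the shape of these messages concrete, here is a small sketch that assembles the kind of payload sent at workflow start, i.e. one without the "trace" entry. The function name flow_payload and the sample values are made up for illustration, and the values are assumed to contain no characters that need JSON escaping.

```shell
# flow_payload NAME ID STATUS: print a start/complete-style message,
# i.e. one without the "trace" entry. Hypothetical helper; the arguments
# are assumed to contain no quotes or backslashes (no JSON escaping is done).
flow_payload() {
  printf '{"runName": "%s", "runID": "%s", "runStatus": "%s"}\n' "$1" "$2" "$3"
}

# Invented sample values
payload=$(flow_payload "sample_run" "sample-run-id" "started")
echo "$payload"

# A receiving server could be exercised by hand with curl, using the
# example URL from the messages scope:
#   curl -X POST -H 'Content-Type: application/json' \
#        -d "$payload" http://api.myserver.com/workflow/monitor
```

A consumer of these messages should therefore treat "trace" as optional rather than required.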

    As this is my first PR here, please review my code critically; I am happy to receive feedback!

    Sven

    opened by sven1103 66
  • Workflow report should warn if some task executions were ignored

    It would be great if the Workflow Report (and other things) could warn if there were tasks that failed but were ignored. To do this, it would be great to have variables with the counts of the different task outcomes. For example:

    workflow.task_counts.success
    workflow.task_counts.cached
    workflow.task_counts.failed
    workflow.task_counts.ignored
    

    ..or whatever makes sense.

    There can be some complication with tasks that fail but are resubmitted and succeed, but I guess if we can just count the ignored ones then we should be fine (we will get a pipeline error if something fails properly).
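Until such counters exist, a rough approximation is to tally the status column of a trace file written with -with-trace. The sketch below assumes a tab-separated file whose header row contains a column named status. The helper name count_status and the sample rows are invented for illustration, and the tally does not by itself distinguish ignored failures from other failed tasks.

```shell
# count_status FILE: print "<status> <count>" for each distinct value in
# the "status" column of a tab-separated trace file (hypothetical helper).
count_status() {
  awk -F '\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
    col     { n[$col]++ }
    END     { for (s in n) print s, n[s] }
  ' "$1" | sort
}

# Invented sample data with three columns: task_id, name, status
trace=$(mktemp)
printf 'task_id\tname\tstatus\n1\tfoo\tCOMPLETED\n2\tbar\tFAILED\n3\tbaz\tCACHED\n4\tqux\tCOMPLETED\n' > "$trace"
count_status "$trace"   # one line per status: CACHED 1, COMPLETED 2, FAILED 1
rm -f "$trace"
```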

    Thanks!

    Phil

    opened by ewels 53
  • Initial version of K8s Jobs

    Preview of the initial version of K8s Jobs. Feel free to comment on what to change and rework. Tests have not been updated yet and not all error cases are handled right now.

    platform/k8s 
    opened by xhejtman 52
  • Queue status command fails on LSF version 8

    Bug report

    Hi! After upgrading to nextflow 18.10.1 from 0.32.0, I started seeing this message repeatedly in nextflow output:

    WARN: [LSF] queue status cannot be fetched > exit status: 255
    
    WARN: [LSF] queue status cannot be fetched > exit status: 255
    
    WARN: [LSF] queue status cannot be fetched > exit status: 255
    
    WARN: [LSF] queue status cannot be fetched > exit status: 255
    
    WARN: [LSF] queue status cannot be fetched > exit status: 255
    
    WARN: [LSF] queue status cannot be fetched > exit status: 255
    
    WARN: [LSF] queue status cannot be fetched > exit status: 255
    

    All cluster jobs, however, seem to be working fine, and the nextflow pipeline is producing all files normally.

    This is my nextflow.config file:

    process.executor = "lsf"
    executor.queueSize = 1000
    
    env.PATH = "/lab/solexa_weng/testtube/trinityrnaseq-Trinity-v2.8.4:/lab/solexa_weng/testtube/TransDecoder-TransDecoder-v5.0.2:/lab/solexa_weng/testtube/transrate-1.0.3-linux-x86_64:/lab/solexa_weng/testtube/signalp-4.1:/lab/solexa_weng/testtube/tmhmm-2.0c/bin:/lab/solexa_weng/testtube/ncbi-blast-2.7.1+/bin:/lab/solexa_weng/testtube/bowtie2-2.3.4.3-linux-x86_64:$PATH"
    
    report.enabled = true
    
    • Nextflow version: 18.10.1 build 5003
    • Java version: 1.8.0_161
    • Operating system: Linux
    kind/enhancement platform/lsf 
    opened by tomas-pluskal 50
  • Completed jobs are detected with a big delay

    Bug report

    After a job's status transitions to SUCCEEDED (maybe also FAILED) within AWS Batch, the status update only becomes available to nextflow as COMPLETED after a long delay (hours).

    Expected behavior and actual behavior

    nextflow job status update should be very close to the AWS Batch job transition.

    Program output

    See attached screenshot of the S3 workdir bucket contents.

    This is the matching nextflow log output.

    $ time ~/nextflow/nextflow log loving_avogadro -f name,submit,start,complete,duration,realtime,task_id,workdir -F "name =~ '/.*pattern.*/ '"
    (pattern)        2018-07-10 01:05:19.143 -       2018-07-10 06:26:22.165 5h 21m 3s       31m 24s 121090  s3://test/work/work/e4/cedf1aa5e367f871b12572b0f4be4e
    
    real    0m38.435s
    user    0m38.599s
    sys     0m1.378s
    
    

    Comparing the S3 screenshot and the nextflow output, there is a major delay between the job completing and nextflow updating that status.

    Steps to reproduce the problem

    Submit (maybe hundreds) of jobs to AWS Batch.

    Environment

    • Nextflow version: 0.31.0-SNAPSHOT build 4911
    • Java version: Java HotSpot(TM) 64-Bit Server VM 1.8.0_171-b11
    • Operating system: Linux
    platform/aws-batch 
    opened by tbugfinder 47
  • slow job submission with awsbatch executor

    I would like to submit >20000 jobs (one per file in an S3 bucket) for parallel processing.

    On average, nextflow submission to awsbatch only creates ~1 job per second, which is too slow for this kind of large-scale scenario. Each file is processed within 5-30min. This means that, with horizontal scaling, the overall processing could be completed in ~30min (plus overhead for instance creation and reporting).

    nextflow trace:

    May-18 14:43:52.559 [Task submitter] TRACE n.executor.AwsBatchFileCopyStrategy - [AWS BATCH] Unstaging file path: [RESULT.gz]
    
    May-18 14:43:52.718 [Task submitter] TRACE n.executor.AwsBatchTaskHandler - [AWS BATCH] new job request > {JobName: myjobname,JobQueue: myjobqueue,JobDefinition: myjobdefinition,ContainerOverrides: {Command: [bash, -o, pipefail, -c, trap "{ ret=$?; aws s3 cp --only-show-errors .command.log s3://testbucket123/work/sum-97eccecc-d7e9-431a-a3bc-20440c5c77e7/b0/f3781d20e330ae13258b7c15157b64/.command.log||true; exit $ret; }" EXIT; aws s3 cp --only-show-errors s3://testbucket123/work/sum-97eccecc-d7e9-431a-a3bc-20440c5c77e7/b0/f3781d20e330ae13258b7c15157b64/.command.run - | bash 2>&1 | tee .command.log],},RetryStrategy: {Attempts: 4},Timeout: {AttemptDurationSeconds: 108000}}
    
    May-18 14:43:52.748 [Task submitter] DEBUG n.executor.AwsBatchTaskHandler - [AWS BATCH] submitted > job=78fd5324-1fee-4503-be6f-676d6606f631; work-dir=s3://testbucket123/work/sum-97eccecc-d7e9-431a-a3bc-20440c5c77e7/b0/f3781d20e330ae13258b7c15157b64
    
    May-18 14:43:52.748 [Task submitter] INFO  nextflow.Session - [b0/f3781d] Submitted process > myjobname
    

    using later version:

      Version: 0.30.0-SNAPSHOT build 4813
      Modified: 18-05-2018 15:51 UTC
      System: Linux 4.9.85-38.58.amzn1.x86_64
      Runtime: Groovy 2.4.15 on OpenJDK 64-Bit Server VM 1.8.0_171-b10
      Encoding: UTF-8 (UTF-8)
      Process: [email protected] [10.4.13.234]
      CPUs: 8 - Mem: 14.7 GB (8.7 GB) - Swap: 0 (0)
    

    .........

    #to be added#

    AWS Batch itself doesn't throttle the submission. Test:

    i="0"
    time (while [ $i -lt 30000 ]
    do
    date
    time  (aws batch submit-job \
                    --region eu-west-1 \
                    --job-name test_test_test \
                    --job-queue spot40-queue \
                    --job-definition myjobdefinition:1 \
                    --retry-strategy attempts=3 \
                    --parameters cmd=/bin/ls,batch_run_id=test > /dev/null )
    let i=i+1
    done
    )
    

    ==> finished within 20 minutes.
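To put the two rates side by side, the following back-of-the-envelope sketch uses only the figures quoted in this report: 20000 jobs at roughly one submission per second, versus the 30000 direct submit-job calls that finished within 20 minutes. The helper names are illustrative.

```shell
# submit_time JOBS RATE: time needed to submit JOBS at RATE jobs per second
submit_time() {
  secs=$(( $1 / $2 ))
  printf '%dh %dm %ds\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# jobs_per_second JOBS MINUTES: average submission rate (integer division)
jobs_per_second() {
  echo $(( $1 / ($2 * 60) ))
}

submit_time 20000 1        # Nextflow at ~1 job/s    -> 5h 33m 20s
jobs_per_second 30000 20   # direct CLI loop, 20 min -> 25
```

At about one job per second, submission alone would take over five and a half hours, far longer than the 5-30 minutes each file needs for processing.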

    platform/aws-batch 
    opened by tbugfinder 46
  • Inconsistency of memory usage values within the report and timeline

    Bug report

    Expected behavior and actual behavior

    expected behavior

    Whether we look at the Tasks table in Raw values mode or Human readable mode, we expect to read the exact same values for the memory usage if the decimal system is used. We also expect these values to be the same in the Timeline.

    actual behavior

    • in the Memory usage plot from the report, the memory value reported is 1051574272 (raw value extracted from Plotly) and displayed as 1.052G in the boxplot

    • in the Tasks table from the report in the Raw values mode, the memory value reported in the field vmem is 1051574272, which is consistent with the previous value in the plot

    • in the Tasks table from the report in the Human readable mode, the memory value reported in the field vmem is 1.1GB, which seems to be ~ 1051574272 x (1.024 x 1.024). This value should be 0.97935GB = 1051574272/(1024 x 1024 x 1024) if the binary unit is used. The same behavior holds for the peak_vmem and allocated memory fields but not for rss and peak_rss.

    • in the Timeline, the memory displayed is 1002.9 MB which is ~ 1051574272/(1024 x 1024) = 1002.859375. This is correct if the binary unit is used.

    The report should state whether decimal or binary units are used to display the memory usage.
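The decimal and binary readings of the raw value can be reproduced with a few divisions. The sketch below is only a worked check of the arithmetic in this report, not Nextflow's formatting code. The helper to_unit is hypothetical, and awk is assumed to be available.

```shell
# to_unit BYTES DIVISOR FORMAT: divide BYTES by DIVISOR and print the
# result using the printf-style FORMAT (hypothetical helper).
to_unit() {
  awk -v b="$1" -v d="$2" -v f="$3" 'BEGIN { printf(f "\n", b / d) }'
}

bytes=1051574272
to_unit "$bytes" 1000000000 '%.3f GB'    # decimal gigabytes -> 1.052 GB
to_unit "$bytes" 1073741824 '%.3f GiB'   # binary gibibytes  -> 0.979 GiB
to_unit "$bytes" 1048576    '%.1f MiB'   # binary mebibytes  -> 1002.9 MiB
```

The decimal result matches the plot (1.052G) and the mebibyte result matches the Timeline (1002.9 MB).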

    Steps to reproduce the problem

    Install the program stress: sudo apt-get install stress or sudo yum install stress

    Create the following nextflow.nf script:

    #!/usr/bin/env nextflow
    
    process TwoCpus4mn {
    
        cpus 1
        memory '1 GB'
    
        """
        /usr/bin/time -v stress -c 2 -t 15 -m 1 --vm-bytes 1000000000
        """
    }
    

    Launch:

    nextflow nextflow.nf -with-report report.html -with-timeline timeline.html

    Environment

    • Nextflow version: [18.10.1]
    • Java version: [OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode)]
    • Operating system: [Linux 4.15.0-43-generic x86_64]
    kind/bug 
    opened by phupe 44
  • Enhance process metrics to avoid usage of ps tool

    Hi,

    I am using Nextflow with Biocontainers, which are gaining increasing traction (link1, link2). Biocontainers use a minimal BusyBox. That caused issues with coreutils that were previously fixed in Issue #321. I am now noticing another error in the .command.err log for any process started by Nextflow that leverages a Biocontainer:

    ps: bad -o argument 'state', supported arguments: user,group,comm,args,pid,ppid,pgid,tty,vsz,stat,rss
    

    It doesn't seem to be critical as processes still complete successfully, but I wonder whether this could still be fixed to get full BusyBox/Biocontainers support?

    Thanks in advance.

    known issue pri/moderate 
    opened by rspreafico 43
  • add support for AWS_SESSION_TOKEN

    There's a few issues floating around about this:

    • https://github.com/nextflow-io/nextflow/issues/1724
    • https://github.com/nextflow-io/nextflow/issues/2839
    • https://github.com/nextflow-io/nextflow/pull/1265

    Figured I'd take a stab at implementing. Wrote a test, but also ran locally and verified that the session token actually works.

    I did combine the AWS_ACCESS_KEY/AWS_SECRET_KEY and AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY creds cases. Sure, technically, it means someone could supply an invalid config and it might work. But also, it unifies the session token logic.

    I didn't add this to the config options, because who's putting an ephemeral token in their nextflow config??

    Truthfully, I believe we're on the wrong path, and that we should just let the AWS Java SDK resolve credentials. It is opinionated in how it resolves credentials. If we delegated, we wouldn't have to account for every weird possible way of specifying aws creds. Perhaps there's a good reason not to though.

    Anyway, let me know what you think. I'd love to have this in.

    opened by jchorl 0
  • Add support for Fusion file system to Slurm and LSF executors

    This PR adds support for Fusion file system to Slurm and LSF grid executors.

    The PR implements the following changes

    • Mark the GridTaskHandler as FusionAwareTask
    • Use the NXF_CHDIR variable as the common pattern to specify the job work directory instead of relying on grid-specific directives
    • When fusion execution is required, submit the job execution by creating an inline job wrapper piped via stdin special file, instead of creating a temporary launcher file

    For production usage, this feature requires #3513 or #3514.

    opened by pditommaso 0
  • Add support for Fusion file system for Sarus containerised task

    Context

    The Fusion file system allows using S3-compatible object storage as the task work directory.

    This feature requires mounting the Fusion driver in the container process via a Fuse device.

    When using Docker or Podman, this can be achieved using the following options:

    --device /dev/fuse --cap-add SYS_ADMIN 
    

    See #3337 for details.

    Goal

    The goal of this feature is to allow the use of Fusion driver in containers run via the Sarus container engine.

    opened by pditommaso 1
  • Add support for Fusion file system for Singularity containerised task

    Context

    The Fusion file system allows using S3-compatible object storage as the task work directory.

    This feature requires mounting the Fusion driver in the container process via a Fuse device.

    When using Docker or Podman, this can be achieved using the following options:

    --device /dev/fuse --cap-add SYS_ADMIN 
    

    See #3337 for details.

    Goal

    The goal of this feature is to allow the use of Fusion driver in containers run via the Singularity and Apptainer engines.

    opened by pditommaso 1
  • xvfb-run waits forever when used in a nextflow process

    Bug report

    Expected behavior and actual behavior

    We wanted to use xvfb-run igv !{batch_file} to allow igv to create a snapshot of an event on a headless AWS instance. Expected behavior is that xvfb-run will create a virtual framebuffer and igv will use it to paint the snapshot. Actual behavior is that xvfb-run waits forever after creating the Xvfb server.

    The reason is that Xvfb signals readiness by sending SIGUSR1 to its parent, which should interrupt the wait in xvfb-run. But nextflow traps (and ignores) USR1 in .command.run, so the signal never arrives and the wait lasts forever.

    Steps to reproduce the problem

    This process file produces the problem, and it should do so with pretty much any input.

    Program output

    Under normal circumstances there is NO output to stderr or stdout. If I call xvfb-run via bash -x, the last executed command is the exec command from this block:

        trap : USR1
        (trap '' USR1; exec Xvfb ":$SERVERNUM" $XVFBARGS $LISTENTCP -auth $AUTHFILE >>"$ERRORFILE" 2>&1) &
        XVFBPID=$!
    
        wait || :
    

    The wait command never ends.
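
    The effect can be reproduced outside Nextflow with a small sketch. This is not taken from the report: the function name `usr1_trap_fires` is hypothetical, and the sketch assumes a POSIX `sh` on the PATH. It relies on the POSIX rule that a non-interactive shell cannot trap a signal that was already ignored when the shell started, which is one plausible explanation for why the `trap : USR1` in xvfb-run has no effect under a wrapper that ignores USR1.

```python
import subprocess

# Inner script, standing in for xvfb-run: install a USR1 trap, send USR1
# to itself, then finish. "trapped" is printed only if the trap took effect.
INNER = 'trap "echo trapped" USR1; kill -USR1 $$; echo done'


def usr1_trap_fires(parent_ignores_usr1: bool) -> bool:
    """Return True if the inner shell's USR1 trap actually runs."""
    if parent_ignores_usr1:
        # Outer shell, standing in for a wrapper script that ignores USR1
        # before exec'ing the tool. "$0" expands to the INNER script, which
        # is passed as the argument following the -c command string.
        argv = ["sh", "-c", "trap '' USR1; exec sh -c \"$0\"", INNER]
    else:
        argv = ["sh", "-c", INNER]
    result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    return "trapped" in result.stdout
```

    On a POSIX shell, `usr1_trap_fires(False)` is expected to be true and `usr1_trap_fires(True)` false: once the parent ignores USR1, the child can no longer install a handler for it, so a `wait` that depends on that signal is never interrupted.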

    Environment

    • Nextflow version: 22.04.0
    • Java version: 1.8.2
    • Operating system: ubuntu 18.04
    • Bash version: GNU bash, version 5.1.4(1)-release (x86_64-pc-linux-gnu)
    opened by TedBrookings 1
Releases(v22.12.0-edge)
  • v22.12.0-edge(Dec 13, 2022)

    • Add fair process directive [60d34cfd]
    • Add support for singularity registry setting [37c1aeb9]
    • Add AWS profile config setting [66f4669f]
    • Add support for AWS profile when resolving region [d8947707]
    • Add support for Sarus container engine (#3470) [54673f18]
    • Add support for Fusion ARM64 client [d073c538]
    • Add allowedLocations option to google batch (#3453) [c619eb81]
    • Add support for AWS config profile in NF config [37112672]
    • Add warning on Google Logs failure [bdbcdde9]
    • Add possible values of status in trace.txt to the documentation (#3451) [2425fcfb]
    • Add support for AWS Glacier restore [b6110766]
    • Add support for S3 storageClass to publishDir [066f9203]
    • Add MathHelper utility class [7eecb266]
    • Fix Wave layer invalid checksum due to not closed stream [e188bbf9]
    • Fix Fusion test [2245a1c7]
    • Fix Run fails when home is a symlink [9ff820f4]
    • Fix math overflow when copying large AWS S3 files [f32ea0ba]
    • Fix Quote the logName in the Cloud Logging filter (#3464) [b3975063]
    • Fix Google Batch cloud logging (#3443) [e2bbcf15]
    • Fix Tower plugin min nextflow requirement [1713a1cd]
    • Fix TowerArchiver resolve envar paths relative to baseDir (#3438) [46af18e5]
    • Error & info messages, code comments language fixes (#3475) [29ae36ca]
    • Replace egrep with grep -E (#3485) [ac0c3035]
    • Gradle build optimizations (#3483) [19182a57]
    • Refactor virtual FS schemes to XPath class [fd59b943]
    • Update concat operator description (#3426) [e8d8c3b5]
    • Clarify usage of additional options for path qualifier (#3405) [0b70acb1]
    • Clarify limitation of -with-docker in the docs (#3408) [79afc85d]
    • Expose process queue as K8s pod label [4df8c8d2]
    • Prefix nextflow K8s labels with nextflow.io prefix [9951fcd9]
    • Remove deprecated code [c0b164f2]
    • Rewrite fetchIamRole and fetchRegion to use AWS SDK (#3425) [e350f319]
    • Improve Wave config error reporting [ae502668]
    • Improve K8s retry on transient failures [d86ddc36]
    • Remove DSL1 output mode [fa400d5f]
    • Remove support for DSL1 multi into [f664af45]
    • Bump [email protected] [ccaab713]
    • Bump [email protected] [c07dcec2]
    • Bump [email protected]
    • Bump [email protected] [652d0880]
    • Bump fusion version URLs 0.6 [a160a8b1]
    • Bump AWS sdk version 1.12.351 [4dd82b66]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.40 KB)
    nextflow-22.12.0-edge-all(86.44 MB)
  • v22.10.4(Dec 9, 2022)

  • v22.11.1-edge(Nov 29, 2022)

    • Fix TowerArchiver resolve envar paths relative to baseDir (#3438) [53e6348c]
    • Fix tower plugin min nextflow requirement [103dbf74]
    • Fix typos in the documentation [ci skip] (#3441) [ae95d90d]
    • Add support for Java 19 [811e7ca8]
    • Add support for custom Conda channels (#3435) [ci fast] [0884e80e]
    • Add time directive to AWS Batch, clean language (#3436) [ci skip] [1ed2640a]
    • Update err message [ci fast] [ab5bd81b]
    • Fix Flux executor config (#3432) [ci fast] [68b45c92]
    • Bump [email protected] [fe669152]
    • Bump [email protected] [2dbf9906]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.40 KB)
    nextflow-22.11.1-edge-all(85.82 MB)
  • v22.11.0-edge(Nov 23, 2022)

    • Add support for Apptainer container engine (#3345) [29f98975]
    • Add Flux executor to nextflow (#3412) [cc9fc3f0] [3711cef0]
    • Add support for Wave containerPlatform [10d56ca1]
    • Add CSI ephemeral volume for K8s (#2988) [f18f6e81]
    • Add support for disk directive and emptyDir to K8s (#2998) [b548e1c7]
    • Add Fusion support for custom S3 endpoints [fba9b649]
    • Add support for Tower refresh token for dataset (#3366) [a19e055a]
    • Prevent infinite loop while fetching git tags and branches [aa974d44]
    • Improve file porter logging [626420b6]
    • Improve script err logging [2714770e]
    • Extend onFilePublish notification adding source path (#3284) [81acc3ef]
    • Remove cpu limits from K8s pod spec builder (#3338) [dc7f78bf]
    • Improve task name logging [5ddb7e3f]
    • Add tower endpoint to wave [ci fast] [b725ddc4]
    • Add Azure SAS token validation [e2244b48]
    • Use cpus-shares for container resources (#3383) [b38c3880]
    • Report full path scheme on error [4089ba65]
    • Allow identity based authentication on Azure Batch (#3132) [a08611be]
    • Fix support for remote file mail attachment (#3384) [6b496bb9]
    • Fix task cache logging [ed37c4fd]
    • Fix unexpected error on task resume [1c3f4685]
    • Fix stripIndent failure with java 17 (#3377) [2b115c50]
    • Fix -dockerized execution #3137 (#3148) [64a81a58]
    • Improve default value in cli help of nextflow log -s (#3371) [2141f96e]
    • upgrade jsoup and snakeyaml version (#3374) [6e2ca454]
    • Bump Java 17 lang version + Java 11 as target [34f133e2]
    • Bump [email protected] [e307912e]
    • Bump [email protected] [07391d96]
    • Bump [email protected] [4d787561]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.39 KB)
    nextflow-22.11.0-edge-all(85.82 MB)
  • v22.10.3(Nov 21, 2022)

  • v22.10.2(Nov 13, 2022)

    • Fix initialize the plugin once it's defined (#3360) [dd150b92]
    • Fix tags typo in docs (#3355) [b82df4e0]
    • Fix unexpected error on task resume [e02e8c27]
    • Fix template script in trace record [cf828a68]
    • Fix IPv6 support for K8s executor [53af5a7c]
    • Fix refresh token for tower served resources [9dec2b66] #3366
    • Fix full path scheme on error [1399f451]
    • Add note to some process implicit variables (#3373) [0374f63a]
    • Add retry policy on plugin download failure [e8dbec3f]
    • Add examples of when dynamic output filenames are important (#3275) [72a17306]
    • Update google batch java sdk, add serviceAccountEmail and installGpuDrivers (#3324) [7f7007a8]
    • Update github actions to v3 (#3376) [d3b4a837]
    • Update error messages and docs with new report filename behavior [f5725480]
    • Bump [email protected] [164edf7c]
    • Bump [email protected] [30cb118d]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.37 KB)
    nextflow-22.10.2-all(85.79 MB)
  • v22.10.1(Oct 27, 2022)

    • Fix mount pwd in the container when work dir is a symlink [ca397181] [b5b7d3cd]
    • Fix secrets command name in the CLI (#3320) [ci fast] [321486df]
    • Fix ver num rendering [ci fast] #3226 [5312a25e]
    • Fix K8s config namespace is not applied [b3d33e3b]
    • Fix log fetching from remote storage [be356939] [3efa1a20]
    • Update docs about default mail ssl protocol [ci skip] (#3299) [15ffffc1]
    • Update docs repeated words from documentation (#3311) [ci skip] [d59ea186]
    • Update docs to clarify the difference between collect and toList (#3276) [7ee2b008]
    • Update docs [516a7441]
    • Update docs adding Fusion [11eac707]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.37 KB)
    nextflow-22.10.1-all(86.30 MB)
  • v22.10.0(Oct 13, 2022)

    • Fix timestamp encoding [47a3a3c4]
    • Minor type change in Bridge executor [1f446ee1]
    • Bump [email protected] [326803ff]

    Included in previous RC and edge releases:

    22.10.0-RC3 - 7 Oct 2022

    • Fix K8s context selection [58b354e6]

    22.10.0-RC2 - 7 Oct 2022

    • Improve K8s labels/annotation validation [a569afdf]
    • Bump fusion final url [80398880]
    • Bump [email protected] [a2b44c4d]
    • Update docs

    22.10.0-RC1 - 3 Oct 2022

    • Add module binaries enabling flag + docs [c50e178f]
    • Add timestamp and fingerprint to wave request [a5a7e138]
    • Add missing inputs to the incremental task "test" (#1442) [ci fast] [f85d59a6]
    • Add support for refresh token to Wave [ed9f25f1]
    • Add pretty option to dump operator [ci fast] [4218b299]
    • Add support for custom S3 content type [02afa332]
    • Get rid of file name rolling for report files [a762ed59]
    • Ignore JGit warning when missing git tool [a94fa9c1]
    • Remove jobname limit to HyperQueue executor (#3251) [99604ccb]
    • Rename baseImage to mambaImage [ci fast] [50086028]
    • Fix failing test [ci fast] [e6790003]
    • Fix K8s cluster token when using serviceAccount [c3364d0f]
    • Fix hanging test [44c04874]
    • Improve docs (#3212) [ci skip] [5d80388c]
    • Bump fusion snapshot [ci skip] [8e03f655]
    • Bump wave endpoint [a044cc6a]
    • Bump [email protected] [7424dc4b]
    • Bump fusion config v0.5.1 [4dbdf112]
    • Bump [email protected]
    • Bump [email protected]
    • Bump [email protected]

    22.09.7-edge - 28 Sep 2022

    • Fix Issue copying file bigger than 5gb to S3 [18fd9a44]
    • Fix chmod command to accommodate hidden files in bindir (or empty bindir) (#3247) [a0fcc7b0]
    • Bump [email protected] [f7f96e6f]

    22.09.6-edge - 26 Sep 2022

    • Add SocketTimeoutException to k8s client request retry [527e0d5d]
    • Add MaxErrorRetry to K8s config [58be2128]
    • Add tags propagation to AWS Batch [d64eeffc]
    • Fix task resume when updating fusion layer [f38fd2db]
    • Fix Channel merge still deprecated for DSL2 (#3220) [d27384d2]
    • Apply GCP resourceLabels to the VirtualMachine (#3234) [2275c03c]
    • Update Google Batch mount point with the requirements [5aec28ac]
    • Improve wave error reporting [73842215]
    • Bump fusion 0.4.x [26f1f896]

    22.09.5-edge - 21 Sep 2022

    • Use default wave strategy [abbfa7f4]
    • Handle errors reported by tower report writer [0e814647]
    • Fix AWS S3 copy object [b3b90d23]

    22.09.4-edge - 19 Sep 2022

    • Add Fusion display name [f789d457]
    • Add container cleanup [cd2ae7dc]
    • Add Wave interactive debug session [ce7fa651]
    • Add support for wave build and cache repositories [692043ff]
    • Add shutdown to Google Batch client [8f413cf7]
    • Add native_id to Google Batch handler [352b4239]
    • Add java sts library to enable use of IRSA in k8s (#3207) [62df42c3]
    • Add support for module custom bin dirs [77f55262]
    • Add support for tower token to wave client [928d5b04]
    • Update CLI docs (#3200) [8acebee6]
    • Fix issue with empty report file [9cc4f079]
    • Do not return resource bundle for root module [775c7ed9]
    • Improve tower config [ee03c243]
    • Bump groovy 3.0.13 [4a17e198]

    22.09.3-edge - 10 Sep 2022

    • Add fusion support to K8s executor (#3142) [6bb27b32]
    • Fix shutdown/cleanup hooks invocation [f4185070]
    • Fix Use smaller buffer size for S3 stream uploader [8c643074] [9926d15d]
    • Fix Azure NPE on missing pool opts [d5c0aabd]
    • Fix handling of targetDir when using Fusion fs [2091b272]
    • Document aws.batch.retryMode config (#3195) [56f75e0c]

    22.09.2-edge - 7 Sep 2022

    • Fix thread pool race condition on shutdown [8d2b0587]
    • Fix Intermediate multipart upload requires a minimum size (#3193) [0b66aed6]
    • Fix fusion enable detection [3ef91512]
    • Add before-afterScript warning to docs (#3167) [09464590]
    • Add httpReadTimeout and httpConnectTimeout config to K8s client [064f9bc4]
    • Add support for Wave build & cache repos [98a275ba]
    • Finalise secrets feature (#3161) [49021b82]
    • Update executor retry config docs (#3001) [aed6c234]
    • Change Azure test pool name [0c724504]
    • Improve Wave error reporting [b11d0f11]
    • Remove unneeded launcher file remapping [a255118d]
    • Update Azure vm types [80f5fbe4]
    • Update docs logos (#3174) [529bad81]

    22.09.1-edge - 1 Sep 2022

    • Add support for Charliecloud v0.28 (#3116) [84f43a33] <Patrick Hüther>
    • Add Support for EC-encrypted keys for K8s client [fd759d09]
    • Add support for Bridge batch scheduler (#3106) [343c17e6]
    • Add fusion support to local executor [17160bb0] [6cfb51e7]
    • Add getTags and getBranches to BitBucketServer [53bd89cd]
    • Add retry strategy to httpfs client [55f9c87b]
    • Add support for project resources [c2ad6566]
    • Add mamba build options [987a13cb]
    • Fix Do not override tower endpoint in the config [41fb1ad0]
    • Fix Hyperqueue job names must be less than 40 chars #3151 [8e43670b]
    • Fix typo in ConfigBuilder.groovy (#3143) [659e6108]
    • Fix Resume for dynamically resolved containers [13483ff2]
    • Improve fusion env handling [10f35b60]
    • Improve foreign file(s) cache detection logic [3a9352c8]
    • Rename ModuleBundle to ResourcesBundle [0e51dc0f]
    • Use quiet output mode for hyperqueue executor (#3103) [70a91fdf]
    • Wave improve conda settings [6f087fec]
    • Improve secrets cmd (#3158) [115b2f3d]
    • Improve Wave resolution strategy [2eb700c6]
    • Improve Az Batch err handling and testing [85d31e8d]
    • Bump google-cloud-batch 0.2.2
    • Bump spock 2.2

    22.08.2-edge - 16 Aug 2022

    • Fix queueSize setting is not honoured by AWS Batch executor (again) #3117 [1733bb2e]
    • Add files() method to docs (#3123) [00bb8896]
    • Refactor wave packing [bc876986]
    • Improve logging [aa380d5f]
    • Update dockerfile [e6329282]

    22.08.1-edge - 11 Aug 2022

    • Add support for disabling config include [e0859a12]
    • Add experimental fusion support [1854f1f2]
    • Add support for plugin provided function extension (#3079) [16230c2b]
    • Add support for AWS Batch logs group (#3092) [4ef043ac]
    • Add share identifier to Aws Batch (#3089) [c0253aba]
    • Improve Tower cache manager [0091afc5]
    • Improve S3 copy via xfer manager [02d2beae]
    • Reports a warning when using NXF vars in the config [009ec256]
    • Make wake token cache duration config [5f955fc9]
    • Patch unable to start non-core plugin [a55f58ff]
    • Increase S3 upload chunk size to 100 MB [9c94a080]
    • Change Google Batch disk directive to override boot disk size (#3097) [7e1c0686]
    • Fix queueSize setting is not honoured by AWS Batch executor (#3093) [d07bb52b]
    • Fix Allow disabling scratch with Google Batch [e8e5c721]
    • Fix Emit relative path when relative parameter is set (#3072) [39797759]
    • Bump [email protected] [e46d341d]
    • Bump [email protected] [cdc2be53]
    • Bump [email protected] [c39935a5]
    • Bump [email protected] [ccdf62d0]

    22.08.0-edge - 1 Aug 2022

    • Add warning to env config docs (#3083) [ca933c16]
    • Add -with-conda CLI option (#3073) [98b2ac80]
    • Add simple wave plugin cli commands [8888b866]
    • Add default wave plugin [7793a0ec]
    • Add boot disk, cpu platform to google batch (#3058) [17a8483d]
    • Add support for GPU accelerator to Google Batch (#3056) [f34ad7f6]
    • Add support for archive dir to tower plugin [c234681a]
    • Add support tower cache backup/restore [bc2f9d13]
    • Add disk directive to google batch (#3057) [ec6e290c]
    • Add retry when Azure submit fails with OperationTimedOut [6a3f9742]
    • Add warning when Google Batch quota is exceeded (#3066) [6b9c52ad]
    • Allow fully disabling history file [0a45f858]
    • Allow function overloading and default parameters (#3011) [042d3857]
    • Improve S3 file upload/download via Transfer manager [7e8d2a5a]
    • Prevent overriding container entrypoint [b3a4bf85]
    • Update FileTransfer pool settings [503aafce]
    • Remove deprecated commands [93228b4b]
    • Prevent nextflow config to break tower launch [e059a724]
    • Refactor Google Batch executor to use Java API (#3044) [31a6e85c]
    • Fix Unable to disable scratch attribute with AWS Batch [1770f73a]
    • Fix unit test setting explicit permissions for test files [1c821139]
    • Fix Default plugins are overridden by config plugins [46cf3bfa]
    • Fix S3 transfer download directory [b7bf9fe5]
    • Fix NPE while setting S3 ObjectMetadata #3031 [d6163431]
    • Fix Unable to retrieve AWS batch instance type #1658 [3c4d4d3b]
    • Fix AWS Batch job definition conflict (#3048) [e5084418]
    • Fix JAVA_TOOL_OPTIONS breaking launch #1716 [0e7b416d]
    • Fix add ps shared objects to Dockerfile (#3033) [1c23b40a]
    • Parallelize build integration tests [807800a3]
    • Bump google-cloud-nio:0.124.8 [dfaa9d19]
    • Bump groovy 3.0.12 [5c900b91]
    • Bump Moment.js 2.29.4 [a9ced868]
    • Bump [email protected] [12f17176]
    • Bump [email protected]
    • Bump [email protected]
    • Bump [email protected]
    • Bump [email protected]
    • Bump [email protected]

    22.07.1-edge - 13 Jul 2022

    • Add support for Google Batch API v1 [4c116d58] [e85d87ee]
    • Add time directive support for K8s executor (#2948) [2b6f70a8]
    • Add docs aws.client.s3PathStyleAccess config (#3000) [20005500]
    • Allow to override lsf.conf settings with nextflow config #2862 [dae191a1]
    • Allow hybrid containers execution [0af1bcb3]
    • Improve error msg when script file cannot be read [52c2780e]
    • Improve error reporting for custom function [877c7931]
    • Improve error message for missing plugin extension [4a43db84]
    • Improve test #3019 [7c37e0be]
    • Rename kuberun -pod-image to -head-image [2576ba62]
    • Externalise sqldb plugin source code [17e80b4f]
    • Fix escape unstage outputs with double quotes #2912 #2904 #2790 [49ff02a6]
    • Fix Exception when setting AWS Batch containerOptions #3019 [89312ad8]
    • Fix Missing query param in http file (#2918) [43cc8511]
    • Fix Publish copy mode for S3 based path [085f6b2b]
    • Fix Fail fast uploads to S3 (#2969) [7fd1a6e1]
    • Fix null script name in launch info [7118849f]
    • Bump [email protected] [a06b4442]
    • Bump [email protected] [3331826f]
    • Bump [email protected] [de62fd3f]
    • Bump [email protected] [3234ddd5]

    22.07.0-edge - [SKIPPED]

    22.06.1-edge - 17 Jun 2022

    • Fix CodeCommit creds handling + [email protected] [70fc0745]
    • Fix typo in log message [a8f8529d]
    • Add more scientists to the list of random names [8d5b36a2]

    22.06.0-edge - 9 Jun 2022

    • Add AWS CodeCommit initial support [80fba6e9]
    • Add support for 307 and 308 HTTP redirection [92382012]
    • Add DirWatcher v2 [209c82cd]
    • Add Moriondo in the list of random names [e0abca58]
    • Add preview CLI option (#2914) [aa8f1aa4]
    • Fix Git config resolution error [64436697]
    • Fix StackOverflowError when dump all profiles (#2922) [28cd11a2]
    • Fix gradle warning message in nf-sqldb (#2921) [b09ceabe]
    • Fix log for LsfExecutor perTaskReserve attribute [7c3ec874]
    • Fix external pod deletion for jobs (#2915) [4dd1af7a]
    • Prevent function overloading in module definition [c0b522ab]
    • Improve error message of nonsensical include (#2623) [285fe49c]
    • Mount PWD path only when scratch is used [9b3c6e31]
    • Strip sensitive data from strings (#2908) [7fa4c86c]
    • Dump scm content when trace is enabled [c3117ada]

    22.05.0-edge - 25 May 2022

    • Add Hyperqueue executor (#2896) [ffa5712e]
    • Add support for K8s Job resource [c70eb12d]
    • Add support for time process directive in GLS executor (#2880) [1402e183]
    • Add support for privileged option for K8s containers [7ffe3a02]
    • Add DSL1 option to docs (#2836) [d30841a5]
    • Add support for container options to Azure Batch [3f4f00f9]
    • Add support for move operation to AWS S3 [8c0ddfd5]
    • Add K8s execution hostname in the trace file (#2828) [ebaef92a]
    • Add support for AWS S3 encryption using a custom KMS key [c1e45aa9]
    • Add support for Micromamba [383e023f]
    • Add jaxb-api dependency to nf-amazon [c1a09f87]
    • Add strict mode config setting [ci fast] [696e70b5]
    • Add -head-prescript option to kuberun (#2830) [9e387055]
    • Fix missing err message on submit failure [233e67f0] (#2899)
    • Fix resolve azure devops repositories when projectId is present [2500ff01]
    • Fix AthenaJdbc into distribution zip [853a1f2a] [4b3579d5] [70ef7ee3]
    • Fix Inconsistent bool parsing #2881 [40bf2b2a]
    • Fix Unable to pull pipeline if config file is not in default branch (#2876) [4ee5b04f]
    • Fix Prevent crash when scratch dir does not exist (#2888) [9ef44ae5]
    • Fix DSL1 detection to invalid workflow keyword matching [fe0700b0] (#2879)
    • Fix Aws Batch retry policy on spot reclaim [6e029b79]
    • Fix 'false' string in config interpreted as true (#2865) [079a18ce]
    • Improve Git Provider config logging [d7dbca8ec]
    • Improve K8s task handler [1822b2ca]
    • Improve missing workflow err message [da101e8f] (#2871)
    • Include revision in the Azure Repos provider when specified (#2861) [3342c767]
    • Remove unnecessary change dir echo [372d1f47]
    • Abort execution when accessing undefined params with strict mode [93836081]
    • Update docker base image [50cd7956]
    • Update default SKU for Azure Batch (#2868) [9ea09dba]
    • Update dependencies [405d9545]
    • Refactoring to prevent name conflict [aba2671b]
    • New DSL syntax for explicit declaration of plugin extensions (#2820) [bfc4a067]
    • Sanitize k8s label and annotation keys, don't sanitize annotation value (#2843) [5287a984]
    • Docs improvement (#2835) [09e5bca3]
    • Bump Jgit 6.1 [7186348c]
    • Bump Spock 2.1 [51100d16]
    • Bump capsule 1.1.1 [20ec1697]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.37 KB)
    nextflow-22.10.0-all(86.30 MB)
  • v22.10.0-RC3(Oct 7, 2022)

  • v22.10.0-RC2(Oct 7, 2022)

  • v22.10.0-RC1(Oct 3, 2022)

    • Add module binaries enabling flag + docs [c50e178f]
    • Add timestamp and fingerprint to wave request [a5a7e138]
    • Add missing inputs to the incremental task "test" (#1442) [ci fast] [f85d59a6]
    • Add support for refresh token to Wave [ed9f25f1]
    • Add pretty option to dump operator [ci fast] [4218b299]
    • Add support for custom S3 content type [02afa332]
    • Get rid of file name rolling for report files [a762ed59]
    • Ignore JGit warning when missing git tool [a94fa9c1]
    • Remove jobname limit to HyperQueue executor (#3251) [99604ccb]
    • Rename baseImage to mambaImage [ci fast] [50086028]
    • Fix failing test [ci fast] [e6790003]
    • Fix K8s cluster token when using serviceAccount [c3364d0f]
    • Fix hanging test [44c04874]
    • Improve docs (#3212) [ci skip] [5d80388c]
    • Bump fusion snapshot [ci skip] [8e03f655]
    • Bump wave endpoint [a044cc6a]
    • Bump [email protected] [7424dc4b]
    • Bump fusion config v0.5.1 [4dbdf112]
    • Bump [email protected]
    • Bump [email protected]
    • Bump [email protected]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.37 KB)
    nextflow-22.10.0-RC1-all(86.30 MB)
  • v22.09.7-edge(Sep 28, 2022)

  • v22.09.6-edge(Sep 26, 2022)

    • Add SocketTimeoutException to k8s client request retry [527e0d5d]
    • Add MaxErrorRetry to K8s config [58be2128]
    • Add tags propagation to AWS Batch [d64eeffc]
    • Fix task resume when updating fusion layer [f38fd2db]
    • Fix Channel merge still deprecated for DSL2 (#3220) [d27384d2]
    • Apply GCP resourceLabels to the VirtualMachine (#3234) [2275c03c]
    • Update Google Batch mount point with the requirements [5aec28ac]
    • Improve wave error reporting [73842215]
    • Bump fusion 0.4.x [26f1f896]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.38 KB)
    nextflow-22.09.6-edge-all(86.29 MB)
  • v22.09.5-edge(Sep 21, 2022)

  • v22.09.4-edge(Sep 19, 2022)

    • Add Fusion display name [f789d457]
    • Add container cleanup [cd2ae7dc]
    • Add Wave interactive debug session [ce7fa651]
    • Add support for wave build and cache repositories [692043ff]
    • Add shutdown to Google Batch client [8f413cf7]
    • Add native_id to Google Batch handler [352b4239]
    • Add java sts library to enable use of IRSA in k8s (#3207) [62df42c3]
    • Add support for module custom bin dirs [77f55262]
    • Add support for tower token to wave client [928d5b04]
    • Update CLI docs (#3200) [8acebee6]
    • Fix issue with empty report file [9cc4f079]
    • Do not return resource bundle for root module [775c7ed9]
    • Improve tower config [ee03c243]
    • Bump groovy 3.0.13 [4a17e198]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.38 KB)
    nextflow-22.09.4-edge-all(86.29 MB)
  • v22.09.3-edge(Sep 10, 2022)

  • v22.09.2-edge(Sep 7, 2022)

    • Fix thread pool race condition on shutdown [8d2b0587]
    • Fix Intermediate multipart upload requires a minimum size (#3193) [0b66aed6]
    • Fix fusion enable detection [3ef91512]
    • Add before-afterScript warning to docs (#3167) [09464590]
    • Add httpReadTimeout and httpConnectTimeout config to K8s client [064f9bc4]
    • Add support for Wave build & cache repos [98a275ba]
    • Finalise secrets feature (#3161) [49021b82]
    • Update executor retry config docs (#3001) [aed6c234]
    • Change Azure test pool name [0c724504]
    • Improve Wave error reporting [b11d0f11]
    • Remove unneeded launcher file remapping [a255118d]
    • Update Azure vm types [80f5fbe4]
    • Update docs logos (#3174) [529bad81]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.38 KB)
    nextflow-22.09.2-edge-all(86.16 MB)
  • v22.09.1-edge(Sep 1, 2022)

    • Add support for Charliecloud v0.28 (#3116) [84f43a33] <Patrick Hüther>
    • Add Support for EC-encrypted keys for K8s client [fd759d09]
    • Add support for Bridge batch scheduler (#3106) [343c17e6]
    • Add fusion support to local executor [17160bb0] [6cfb51e7]
    • Add getTags and getBranches to BitBucketServer [53bd89cd]
    • Add retry strategy to httpfs client [55f9c87b]
    • Add support for project resources [c2ad6566]
    • Add mamba build options [987a13cb]
    • Fix Do not override tower endpoint in the config [41fb1ad0]
    • Fix Hyperqueue job names must be less than 40 chars #3151 [8e43670b]
    • Fix typo in ConfigBuilder.groovy (#3143) [659e6108]
    • Fix Resume for dynamically resolved containers [13483ff2]
    • Improve fusion env handling [10f35b60]
    • Improve foreign file(s) cache detection logic [3a9352c8]
    • Rename ModuleBundle to ResourcesBundle [0e51dc0f]
    • Use quiet output mode for Hyperqueue executor (#3103) [70a91fdf]
    • Wave improve Conda settings [6f087fec]
    • Improve secrets cmd (#3158) [115b2f3d]
    • Improve Wave resolution strategy [2eb700c6]
    • Improve Az Batch err handling and testing [85d31e8d]
    • Bump google-cloud-batch 0.2.2
    • Bump Spock 2.2
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.38 KB)
    nextflow-22.09.1-edge-all(86.16 MB)
  • v22.08.2-edge(Aug 16, 2022)

  • v22.08.1-edge(Aug 11, 2022)

    • Add support for disabling config include [e0859a12]
    • Add experimental fusion support [1854f1f2]
    • Add support for plugin provided function extension (#3079) [16230c2b]
    • Add support for AWS Batch logs group (#3092) [4ef043ac]
    • Add share identifier to Aws Batch (#3089) [c0253aba]
    • Improve Tower cache manager [0091afc5]
    • Improve S3 copy via xfer manager [02d2beae]
    • Reports a warning when using NXF vars in the config [009ec256]
    • Make wake token cache duration config [5f955fc9]
    • Patch unable to start non-core plugin [a55f58ff]
    • Increase S3 upload chunk size to 100 MB [9c94a080]
    • Change Google Batch disk directive to override boot disk size (#3097) [7e1c0686]
    • Fix queueSize setting is not honoured by AWS Batch executor (#3093) [d07bb52b]
    • Fix Allow disabling scratch with Google Batch [e8e5c721]
    • Fix Emit relative path when relative parameter is set (#3072) [39797759]
    • Bump [email protected] [e46d341d]
    • Bump [email protected] [cdc2be53]
    • Bump [email protected] [c39935a5]
    • Bump [email protected] [ccdf62d0]
    Source code(tar.gz)
    Source code(zip)
    nextflow(14.38 KB)
    nextflow-22.08.1-edge-all(85.71 MB)
  • v22.08.0-edge(Aug 1, 2022)

    22.08.0-edge - 1 Aug 2022

    • Add warning to env config docs (#3083) [ca933c16]
    • Add -with-conda CLI option (#3073) [98b2ac80]
    • Add simple wave plugin cli commands [8888b866]
    • Add default wave plugin [7793a0ec]
    • Add boot disk, cpu platform to Google Batch (#3058) [17a8483d]
    • Add support for GPU accelerator to Google Batch (#3056) [f34ad7f6]
    • Add support for archive dir to tower plugin [c234681a]
    • Add support tower cache backup/restore [bc2f9d13]
    • Add disk directive to google batch (#3057) [ec6e290c]
    • Add retry when Azure submit fails with OperationTimedOut [6a3f9742]
    • Add warning when Google Batch quota is exceeded (#3066) [6b9c52ad]
    • Allow fully disabling history file [0a45f858]
    • Allow function overloading and default parameters (#3011) [042d3857]
    • Improve S3 file upload/download via Transfer manager [7e8d2a5a]
    • Prevent overriding container entrypoint [b3a4bf85]
    • Update FileTransfer pool settings [503aafce]
    • Remove deprecated commands [93228b4b]
    • Prevent nextflow config to break tower launch [e059a724]
    • Refactor Google Batch executor to use Java API (#3044) [31a6e85c]
    • Fix Unable to disable scratch attribute with AWS Batch [1770f73a]
    • Fix unit test setting explicit permissions for test files [1c821139]
    • Fix Default plugins are overridden by config plugins [46cf3bfa]
    • Fix S3 transfer download directory [b7bf9fe5]
    • Fix NPE while setting S3 ObjectMetadata #3031 [d6163431]
    • Fix Unable to retrieve AWS batch instance type #1658 [3c4d4d3b]
    • Fix AWS Batch job definition conflict (#3048) [e5084418]
    • Fix JAVA_TOOL_OPTIONS breaking launch #1716 [0e7b416d]
    • Fix add ps shared objects to Dockerfile (#3033) [1c23b40a]
    • Parallelize build integration tests [807800a3]
    • Bump google-cloud-nio:0.124.8 [dfaa9d19]
    • Bump groovy 3.0.12 [5c900b91]
    • Bump Moment.js 2.29.4 [a9ced868]
    • Bump [email protected] [12f17176]
    • Bump [email protected]
    • Bump [email protected]
    • Bump [email protected]
    • Bump [email protected]
    • Bump [email protected]

    Breaking changes

    • Nextflow no longer overrides the container entrypoint with /bin/bash when using the Local, Kubernetes and batch scheduler executors. This change was made for consistency with the AWS, Google and Azure Batch executors, which do not set it either. Make sure the containers used in your pipeline use sh or bash as the default entrypoint. To keep the old behaviour, set the variable NXF_CONTAINER_ENTRYPOINT_OVERRIDE=true in the launch environment.
    • Conda environments defined in a process via the conda directive must now be enabled explicitly, using the CLI option -with-conda, the config setting conda.enabled=true, or the environment variable NXF_CONDA_ENABLED=true. See https://github.com/nextflow-io/nextflow/pull/3073 for details.
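As a rough sketch of how a launch environment might opt back in to both behaviours described above, the two environment variables can be exported before starting a run. Only the variable names, the -with-conda option and the conda.enabled setting come from the notes above. The `<pipeline>` placeholder in the comments is hypothetical.

```shell
# Opt in to Conda environments declared via the `conda` process directive.
export NXF_CONDA_ENABLED=true

# Restore the old behaviour of overriding the container entrypoint.
export NXF_CONTAINER_ENTRYPOINT_OVERRIDE=true

# Alternatives for Conda, per the notes above (pick one):
#   nextflow run <pipeline> -with-conda     # <pipeline> is a placeholder
#   conda.enabled=true                      # as a setting in the Nextflow config
```

Setting the variables in the shell affects every run launched from that session, whereas the CLI option applies to a single run.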
  • v22.04.5(Jul 15, 2022)

  • v22.07.1-edge(Jul 13, 2022)

    • Add support for Google Batch API v1 [4c116d58] [e85d87ee]
    • Add time directive support for K8s executor (#2948) [2b6f70a8]
    • Add docs for aws.client.s3PathStyleAccess config (#3000) [20005500]
    • Allow to override lsf.conf settings with nextflow config #2862 [dae191a1]
    • Allow hybrid containers execution [0af1bcb3]
    • Improve error msg when script file cannot be read [52c2780e]
    • Improve error reporting for custom function [877c7931]
    • Improve error message for missing plugin extension [4a43db84]
    • Improve test #3019 [7c37e0be]
    • Rename kuberun -pod-image to -head-image [2576ba62]
    • Externalise sqldb plugin source code [17e80b4f]
    • Fix escape unstage outputs with double quotes #2912 #2904 #2790 [49ff02a6]
    • Fix Exception when setting AWS Batch containerOptions #3019 [89312ad8]
    • Fix Missing query param in http file (#2918) [43cc8511]
    • Fix Publish copy mode for S3 based path [085f6b2b]
    • Fix Fail fast uploads to S3 (#2969) [7fd1a6e1]
    • Fix null script name in launch info [7118849f]
    • Bump [email protected] [a06b4442]
    • Bump [email protected] [3331826f]
    • Bump [email protected] [de62fd3f]
    • Bump [email protected] [3234ddd5]
  • v22.04.4(Jun 19, 2022)

  • v22.06.0-edge(Jun 9, 2022)

    • Add AWS CodeCommit initial support [80fba6e9]
    • Add support for 307 and 308 HTTP redirection [92382012]
    • Add DirWatcher v2 [209c82cd]
    • Add Moriondo in the list of random names [e0abca58]
    • Add preview CLI option (#2914) [aa8f1aa4]
    • Fix Git config resolution error [64436697]
    • Fix StackOverflowError when dump all profiles (#2922) [28cd11a2]
    • Fix gradle warning message in nf-sqldb (#2921) [b09ceabe]
    • Fix log for LsfExecutor perTaskReserve attribute [7c3ec874]
    • Fix external pod deletion for jobs (#2915) [4dd1af7a]
    • Prevent function overloading in module definition [c0b522ab]
    • Improve error message for nonsensical include (#2623) [285fe49c]
    • Mount PWD path only when scratch is used [9b3c6e31]
    • Strip sensitive data from strings (#2908) [7fa4c86c]
    • Dump scm content when trace is enabled [c3117ada]
  • v22.05.0-edge(May 25, 2022)

    • Add Hyperqueue executor (#2896) [ffa5712e]
    • Add support for K8s Job resource [c70eb12d]
    • Add support for time process directive in GLS executor (#2880) [1402e183]
    • Add support for privileged option for K8s containers [7ffe3a02]
    • Add DSL1 option to docs (#2836) [d30841a5]
    • Add support for container options to Azure Batch [3f4f00f9]
    • Add support for move operation to AWS S3 [8c0ddfd5]
    • Add K8s execution hostname in the trace file (#2828) [ebaef92a]
    • Add support for AWS S3 encryption using a custom KMS key [c1e45aa9]
    • Add support for Micromamba [383e023f]
    • Add jaxb-api dependency to nf-amazon [c1a09f87]
    • Add strict mode config setting [ci fast] [696e70b5]
    • Add -head-prescript option to kuberun (#2830) [9e387055]
    • Fix missing err message on submit failure [233e67f0] (#2899)
    • Fix resolve azure devops repositories when projectId is present [2500ff01]
    • Fix AthenaJdbc into distribution zip [853a1f2a] [4b3579d5] [70ef7ee3]
    • Fix Inconsistent bool parsing #2881 [40bf2b2a]
    • Fix Unable to pull pipeline if config file is not in default branch (#2876) [4ee5b04f]
    • Fix Prevent crash when scratch dir does not exist (#2888) [9ef44ae5]
    • Fix DSL1 detection to invalid workflow keyword matching [fe0700b0] (#2879)
    • Fix Aws Batch retry policy on spot reclaim [6e029b79]
    • Fix 'false' string in config interpreted as true (#2865) [079a18ce]
    • Improve Git Provider config logging [d7dbca8ec]
    • Improve K8s task handler [1822b2ca]
    • Improve missing workflow err message [da101e8f] (#2871)
    • Include revision in the Azure Repos provider when specified (#2861) [3342c767]
    • Remove unnecessary change dir echo [372d1f47]
    • Abort execution when accessing undefined params with strict mode [93836081]
    • Update docker base image [50cd7956]
    • Update default SKU for Azure Batch (#2868) [9ea09dba]
    • Update dependencies [405d9545]
    • Refactoring to prevent name conflict [aba2671b]
    • New DSL syntax for explicit declaration of plugin extensions (#2820) [bfc4a067]
    • Sanitize k8s label and annotation keys, don't sanitize annotation value (#2843) [5287a984]
    • Docs improvement (#2835) [09e5bca3]
    • Bump Jgit 6.1 [7186348c]
    • Bump Spock 2.1 [51100d16]
    • Bump capsule 1.1.1 [20ec1697]
  • v22.04.3(May 18, 2022)

  • v22.04.2(May 16, 2022)

  • v22.04.1(May 15, 2022)
