Overview

DaCe - Data-Centric Parallel Programming

Decoupling domain science from performance optimization.

DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages, and maps it to high-performance CPU, GPU, and FPGA programs, which can be optimized to achieve state-of-the-art performance. Internally, DaCe uses the Stateful DataFlow multiGraph (SDFG) data-centric intermediate representation: a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. Moreover, transformations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.

DaCe generates high-performance programs for:

  • Multi-core CPUs (tested on Intel and IBM POWER9)
  • NVIDIA GPUs
  • AMD GPUs (with HIP)
  • Xilinx FPGAs
  • Intel FPGAs

DaCe programs can be written inline in Python and transformed from the command line or in Jupyter Notebooks, or SDFGs can be interactively modified using the Data-centric Interactive Optimization Development Environment (DIODE, currently experimental).

For more information, see our paper.

See an example SDFG in the standalone viewer (SDFV).

Tutorials

Installation and Dependencies

To install: pip install dace

Runtime dependencies:

  • A C++14-capable compiler (e.g., gcc 5.3+)
  • Python 3.6 or newer
  • CMake 3.15 or newer

Running

Python scripts: Run DaCe programs (in implicit or explicit syntax) using Python directly.

SDFV (standalone SDFG viewer): To view SDFGs separately, run the installed sdfv script with the .sdfg file as an argument. Alternatively, you can use the link or open diode/sdfv.html directly and choose a file in the browser.

Visual Studio Code plugin: Install from the VSCode marketplace or open an .sdfg file for interactive SDFG viewing and transformation.

DIODE interactive development (experimental): Either run the installed script diode, or call python3 -m diode from the shell. Then, follow the printed instructions to enter the web interface.

The sdfgcc tool: Compile .sdfg files with sdfgcc program.sdfg. Interactive command-line optimization is possible with the --optimize flag.

Jupyter Notebooks: DaCe is Jupyter-compatible. If a result is an SDFG or a state, it will show up directly in the notebook. See the tutorials for examples.

Octave scripts (experimental): .m files can be run using the installed script dacelab, which will create the appropriate SDFG file.

Note for Windows/Visual C++ users: If compilation fails in the linkage phase, try setting the following environment variable to force Visual C++ to use Multi-Threaded linkage:

X:\path\to\dace> set _CL_=/MT

Publication

If you use DaCe, cite us:

@inproceedings{dace,
  author    = {Ben-Nun, Tal and de~Fine~Licht, Johannes and Ziogas, Alexandros Nikolaos and Schneider, Timo and Hoefler, Torsten},
  title     = {Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures},
  year      = {2019},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  series = {SC '19}
}

Configuration

DaCe creates a file called .dace.conf in the user's home directory. It provides useful settings that can be modified directly in the file (YAML), within DIODE, or overridden on a case-by-case basis using environment variables that begin with DACE_ and specify the setting (where categories are separated by underscores). The full configuration schema is located here.
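
The following minimal Python sketch illustrates how an environment variable name can be mapped onto a nested setting. It is not DaCe's implementation: the SCHEMA dictionary and the helpers resolve_setting, convert, and env_overrides are hypothetical. Setting names such as use_cache themselves contain underscores, so the sketch splits the variable name by matching it against the schema keys instead of splitting at every underscore.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple

# Hypothetical excerpt of a nested settings schema (name -> default value).
# It mirrors a few settings listed below; it is NOT DaCe's real schema.
SCHEMA: Dict[str, Any] = {
    "debugprint": False,
    "compiler": {
        "use_cache": False,
        "default_data_types": "Python",
        "cuda": {"backend": "cuda", "syncdebug": False},
    },
    "treps": 100,
}


def resolve_setting(path: str, schema: Dict[str, Any]) -> Optional[List[str]]:
    """Split e.g. 'compiler_cuda_syncdebug' into ['compiler', 'cuda', 'syncdebug'].

    Setting names may contain underscores ('use_cache'), so the split is
    guided by the schema keys, trying longer keys first.
    """
    for key in sorted(schema, key=len, reverse=True):
        value = schema[key]
        if path == key and not isinstance(value, dict):
            return [key]  # reached a leaf setting
        if isinstance(value, dict) and path.startswith(key + "_"):
            rest = resolve_setting(path[len(key) + 1:], value)
            if rest is not None:
                return [key] + rest
    return None  # no setting matches this name


def convert(raw: str, default: Any) -> Any:
    """Parse the environment string using the type of the default value."""
    if isinstance(default, bool):  # check bool before int (bool subclasses int)
        return raw.strip().lower() in ("1", "true", "yes")
    if isinstance(default, int):
        return int(raw)
    return raw


def env_overrides(schema: Dict[str, Any], environ: Mapping[str, str]) -> Dict[Tuple[str, ...], Any]:
    """Collect DACE_* variables that name a known setting."""
    overrides: Dict[Tuple[str, ...], Any] = {}
    for name, raw in environ.items():
        if not name.startswith("DACE_") or name == "DACE_CONFIG":
            continue  # DACE_CONFIG selects the file itself, not a setting
        keys = resolve_setting(name[len("DACE_"):], schema)
        if keys is None:
            continue
        default: Any = schema
        for k in keys:
            default = default[k]
        overrides[tuple(keys)] = convert(raw, default)
    return overrides
```

In a real program, the mapping passed to env_overrides would be os.environ.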

Useful environment variable configurations include:

  • DACE_CONFIG (default: ~/.dace.conf): Override DaCe configuration file choice.

General configuration:

  • DACE_debugprint (default: False): Print debugging information.
  • DACE_compiler_use_cache (default: False): Uses DaCe program cache instead of re-optimizing and compiling programs.
  • DACE_compiler_default_data_types (default: Python): Chooses default types for integer and floating-point values. If Python is chosen, int and float are both 64-bit wide. If C is chosen, int and float are 32-bit wide.

GPU programming and debugging:

  • DACE_compiler_cuda_backend (default: cuda): Chooses the GPU backend to use (can be cuda for NVIDIA GPUs or hip for AMD GPUs).
  • DACE_compiler_cuda_syncdebug (default: False): If True, calls device-synchronization after every GPU kernel and checks for errors. Good for checking crashes or invalid memory accesses.

FPGA programming:

  • DACE_compiler_fpga_vendor (default: xilinx): Can be xilinx for Xilinx FPGAs, or intel_fpga for Intel FPGAs.

SDFG interactive transformation:

  • DACE_optimizer_transform_on_call (default: False): Uses the transformation command line interface every time a @dace function is called.
  • DACE_optimizer_interface (default: dace.transformation.optimizer.SDFGOptimizer): Controls the SDFG optimization process if transform_on_call is enabled. By default, uses the transformation command line interface.
  • DACE_optimizer_automatic_simplification (default: True): If False, skips automatic simplification in the Python frontend (see transformations tutorial for more information).

Profiling:

  • DACE_profiling (default: False): Enables profiling measurement of the DaCe program runtime in milliseconds. Produces a log file and prints out median runtime.
  • DACE_treps (default: 100): Number of repetitions to run a DaCe program when profiling is enabled.
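
The profiling settings above report a median over DACE_treps repetitions. The following plain-Python sketch shows that measurement scheme on its own. It times an arbitrary callable rather than a compiled DaCe program, and profile is a hypothetical helper, not part of DaCe.

```python
import statistics
import time
from typing import Callable, List, Tuple


def profile(fn: Callable[[], object], treps: int = 100) -> Tuple[float, List[float]]:
    """Run fn() treps times; return (median runtime in ms, all runtimes in ms)."""
    times_ms: List[float] = []
    for _ in range(treps):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to outliers such as a slow first run
    return statistics.median(times_ms), times_ms
```

For example, median_ms, _ = profile(lambda: sum(range(10_000)), treps=10) times ten runs of the lambda.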

Contributing

DaCe is an open-source project. We are happy to accept Pull Requests with your contributions! Please follow the contribution guidelines before submitting a pull request.

License

DaCe is published under the New BSD license, see LICENSE.

Comments
  • Variable shadowing issue after applying FPGA transform in implicit notation

    Running this code:

    import dace
    import numpy as np
    
    
    n = dace.symbol("n")
    
    @dace.program
    def dot(x: dace.float32[n], y: dace.float32[n], result: dace.float32[1]):
    
        @dace.map(_[0:n])
        def product(i):
            x_in << x[i]
            y_in << y[i]
    
            result_out >> result(1, lambda a, b: a + b)
            result_out = x_in * y_in
    
    # ----------
    # MAIN
    # ----------
    if __name__ == "__main__":
        a = np.array([1,2,3,4,5,6], dtype=np.float32)
        b = np.array([1,2,3,4,5,6], dtype=np.float32)
        c = np.array([0], dtype=np.float32)
    
        dot_sdfg = dot.to_sdfg()
    
        dot_sdfg(x=a, y=b, result=c, n=a.shape[0])
        print("Vec a: ", a)
        print("Vec b: ", b)
        print(c)
    

    After applying FPGATransformSDFG, the tasklet's input connector and the source memlet in the inner state have the same name, which causes a variable-shadowing error. See also the attached image of the SDFG generated by the code after applying the FPGA transformation.

    Last lines of error output:

      File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 464, in _emit_copy
        "    " + self.memlet_definition(sdfg, memlet, False, vconn),
      File "/home/burgerm/dace/dace/codegen/targets/cpu.py", line 975, in memlet_definition
        allow_shadowing=allow_shadowing)
      File "/home/burgerm/dace/dace/codegen/targets/target.py", line 226, in add
        raise dace.codegen.codegen.CodegenError(err_str)
    dace.codegen.codegen.CodegenError: Shadowing variable x_in from type DefinedType.Pointer to DefinedType.Scalar
    
    bug transformations 
    opened by manuelburger 10
  • Fix stream allocation scoping

    There was an issue where streams would be allocated globally (to a state) and locally. This should not happen. The expected behaviour is to never allocate streams locally to a scope.

    This PR fixes this issue by never including streams in the scope transient analysis.

    opened by komplexon3 9
  • Parallelize Xilinx tests

    Translate xilinx_test.sh into xilinx_test.py so we can run multiprocessing.starmap on our tests.

    Time for running Xilinx tests reduced from ~27 minutes to ~11 minutes.
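
    The contents of xilinx_test.py are not shown here. The sketch below illustrates the approach and assumes each test can run as an independent Python process. The test names and arguments are hypothetical. starmap unpacks each (name, args) tuple into a call to the worker function. The sketch uses multiprocessing.pool.ThreadPool, which has the same starmap interface as multiprocessing.Pool. Threads are enough here because the actual work runs in child processes.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool
from typing import List, Sequence, Tuple


def run_test(name: str, args: Sequence[str]) -> Tuple[str, bool]:
    """Run one test as a separate Python process and report whether it passed."""
    completed = subprocess.run([sys.executable, *args], capture_output=True)
    return name, completed.returncode == 0


def run_all(tests: List[Tuple[str, Sequence[str]]], workers: int = 4) -> List[Tuple[str, bool]]:
    """Run all tests concurrently; results come back in the order of `tests`."""
    with ThreadPool(workers) as pool:
        # starmap unpacks each (name, args) tuple into run_test(name, args)
        return pool.starmap(run_test, tests)
```

    If the workers themselves do CPU-heavy Python work, you can swap in multiprocessing.Pool. In that case, on platforms that spawn worker processes, the calling code should sit under an if __name__ == "__main__": guard.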

    opened by definelicht 9
  • Unroll PEs in FPGA Codegen

    Unroll maps with schedule Unrolled as part of the FPGA codegen in order to detect them as processing elements.

    This is potentially fishy, since we are applying a transformation during code generation (!!) that modifies the SDFG. It is, however, a neat way of avoiding manually handling this in the FPGA codegen when detecting and generating modules.

    opened by definelicht 8
  • General unroller

    The Unrolled scheduler now supports unrolling maps anywhere in the SDFG, including maps that contain nested SDFGs. This PR adds two tests: one that checks nested unrolling with nested SDFGs, and a much simpler one that unrolls a map containing a single tasklet. The general approach is to back up all fields that might be affected by calls to replace, then replace all map parameters and generate the scope, and finally restore the saved fields.
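
    The backup/replace/restore cycle described above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the PR's code. The Tasklet class is a hypothetical stand-in whose only mutable field is its code string. "Generating the scope" is reduced to collecting the substituted code of each unrolled copy.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Tasklet:
    """Hypothetical stand-in for a node whose code mentions the map parameter."""
    code: str

    def replace(self, name: str, value: object) -> None:
        # Substitute whole-word occurrences only, so "i" does not match inside "idx"
        self.code = re.sub(rf"\b{re.escape(name)}\b", str(value), self.code)


def unroll(body: List[Tasklet], param: str, values: Iterable[int]) -> List[str]:
    generated: List[str] = []
    for value in values:
        backup = [t.code for t in body]  # 1. back up fields that replace() mutates
        for t in body:
            t.replace(param, value)  # 2. replace the map parameter with a constant
        generated.extend(t.code for t in body)  # 3. "generate" this copy of the scope
        for t, saved in zip(body, backup):
            t.code = saved  # 4. restore the original fields for the next iteration
    return generated
```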

    opened by jnice-81 8
  • Allow transforming @dace.program

    Currently, in order to apply transformations to a @dace.program, you have to first convert it to an SDFG.

    This is sometimes suboptimal because it changes the arguments required to call the program. For example:

    matmul(A, B, C)
    sdfg = matmul.to_sdfg()
    sdfg(A=A, B=B, C=C, N=N, K=K, M=M)
    

    It would be convenient to not have to change the program arguments like this when converting to an SDFG, perhaps by allowing transformations to be applied to the underlying SDFG of a @dace.program while maintaining the program interface/API.

    frontend 
    opened by definelicht 8
  • Generate Duplicated NestedSDFGs only once

    PR for #392

    First commit, so that we can iterate.

    Goal: avoid generating the same code two or more times for a nested SDFG that is used multiple times (this also includes library nodes after expansion).

    Current implementation:

    • We need to uniquely identify each SDFG. For the moment, I've added an additional property to the SDFG, unique_name (type string, default empty).
    • This property is used in the (CPU) code generator to keep track of the nested SDFGs that have already been generated. If we try to generate a nested SDFG that was already seen, it is skipped.
    • There are two additional tests that check that it works on CPU and FPGA (assuming that the topmost SDFG is scheduled on the CPU).
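
    The following is a minimal sketch of the skip-if-seen logic. It assumes the code generator keeps a set of unique names it has already emitted. NestedSDFG and CodeGenerator here are hypothetical stand-ins, not DaCe classes. An empty unique_name (the default described above) always generates code, so unnamed nested SDFGs keep the current behavior.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class NestedSDFG:
    """Hypothetical stand-in: only the two fields the deduplication needs."""
    body: str
    unique_name: str = ""  # empty string (the default) means "do not deduplicate"


@dataclass
class CodeGenerator:
    functions: List[str] = field(default_factory=list)
    _generated: Set[str] = field(default_factory=set)

    def generate_nested(self, nsdfg: NestedSDFG) -> None:
        if nsdfg.unique_name:
            if nsdfg.unique_name in self._generated:
                return  # already emitted once; later call sites reuse that function
            self._generated.add(nsdfg.unique_name)
        self.functions.append(nsdfg.body)
```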

    So far, it is up to the user to specify the unique name of the SDFG. We would probably need an option (in the configuration file?) to disable this.

    I don't know why the code coverage check fails: the relative difference is 100%.

    opened by TizianoDeMatteis 8
  • Calling A @ B as a dace.program inside a function gives the wrong result

    Describe the bug: Computing A @ B inside a dace.program returns a different result than computing A @ B directly. This does not seem to happen when I call it from the main method, but when it is nested inside a more complex function, the results are wrong.

    To Reproduce: I have a method that performs a matrix-matrix multiplication like so:

    C[m1:m, 0:B_dim] += A[m1:m, 0:m1] @ B[0:m1, 0:B_dim]

    and I attempted to replace it with:

    C[m1:m, 0:B_dim] += matmul_lib(A[m1:m, 0:m1], B[0:m1, 0:B_dim])

    where:

    @dace.program
    def matmul_lib(A: dtype[M, K], B: dtype[K, N]):
        return A @ B

    Expected behavior: Both should return the same numerical result, but for some reason they do not. In the output, the first row matches, but all subsequent rows are wrong. For example, this is the correct result I get from calling A @ B directly:

    0.99134411 2.37927192 0.6220935 1.92701032 0.96556958 1.42080484 0.83607334 0.86882378
    1.82525592 2.89202545 1.35001469 2.37230364 1.44825839 1.81533188 1.29120714 1.11907193
    1.80647289 1.72074369 1.32760667 1.88492459 1.67942782 1.52714228 1.53037621 0.79207197
    1.88622792 2.82863798 1.24272828 2.70113389 2.19127038 2.11175294 1.71630729 1.28087929

    This is what I get from calling it through matmul_lib:

    0.99134411 2.37927192 0.6220935 1.92701032 0.96556958 1.42080484 0.83607334 0.86882378
    0.80634312 1.64045609 0.5430692 1.04235601 0.9985718 1.39916937 0.73351313 0.86935447
    1.95304681 2.74536479 1.56036938 2.40742907 1.69156976 1.96556436 1.48281472 1.27695084
    0.90018296 2.24500806 0.64027737 1.62140514 1.39162286 1.60093125 0.99937141 0.96864924

    Desktop:

    • OS: Windows 10
    • DaCe on commit: 4f36b20e602a0320ce8303aafcfd9d430d1614e7
    • Python 3.7.9
    bug frontend 
    opened by Simeonedef 7
  • Code style issue: decorators

    Google style guide

    Use decorators judiciously when there is a clear advantage. https://google.github.io/styleguide/pyguide.html#217-function-and-method-decorators

    Examples

    https://github.com/spcl/dace/blob/22fd2d20b896b58a39f5dedc698d0807296767a5/dace/transformation/transformation.py#L15 https://github.com/spcl/dace/blob/22fd2d20b896b58a39f5dedc698d0807296767a5/dace/transformation/dataflow/tiling.py#L12

    Cons

    Decorators require the reader to keep in mind that something is being done to the class or function. The language provides more conventional features for implementing a registry, which people can understand faster.

    Possible solution

    Keep the registry inside its own class. Define a classmethod on this class that creates an instance of the class and fills it with the default transformations; this classmethod should import each transformation itself. The advantage of this design is that users can extend it with their own instance of the class, filled with user-provided transformations. Another advantage is that there is no global registry.
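
    The proposed design could look roughly like the sketch below. All class names are hypothetical placeholders. The proposal says a real classmethod would import each transformation itself, but in this sketch the transformations are defined inline.

```python
from typing import Dict, Iterator, Type


class Transformation:
    """Hypothetical base class standing in for a DaCe transformation."""


class MapTiling(Transformation):
    pass


class MapFusion(Transformation):
    pass


class TransformationRegistry:
    """Explicit registry object instead of a decorator-filled global one."""

    def __init__(self) -> None:
        self._entries: Dict[str, Type[Transformation]] = {}

    def register(self, xform: Type[Transformation]) -> None:
        self._entries[xform.__name__] = xform

    def __contains__(self, name: str) -> bool:
        return name in self._entries

    def __iter__(self) -> Iterator[str]:
        return iter(self._entries)

    @classmethod
    def with_defaults(cls) -> "TransformationRegistry":
        registry = cls()
        # A real implementation would import each default transformation
        # module here; the classes above stand in for those imports.
        for xform in (MapTiling, MapFusion):
            registry.register(xform)
        return registry
```

    Each registry is a separate instance. Registering a user transformation in one registry therefore leaves every other registry unchanged, including a freshly created default one.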

    Good uses

    For annotations in the Python/NumPy interface.

    opened by and-ivanov 7
  • Serialize patch

    • [x] As a start, don't catch all exceptions, and don't fail silently if a field is missing.
    • [x] Tolerate string inputs in set_properties_from_json
    • [x] There's a monstrous list of string-to-type mappings in dace.properties.known_types. This seems brittle and hideous. We can replace this by calling an optionally implemented method.
    • [x] Naming scheme is off (should be to_json, not toJSON)
    • [x] Move set_properties_from_json out of Property class
    • [x] Right now, every implementation of toJSON/fromJSON needs to call json.dumps and json.loads. Change this so that they receive the JSON object directly.
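
    The last three items can be sketched together in plain Python. Range, serialize, and deserialize are hypothetical names used only for illustration. to_json returns a JSON-compatible object rather than a string. Deserialization calls an optionally implemented from_json classmethod instead of consulting a string-to-type table. json.dumps and json.loads are called only once, at the document boundary.

```python
import json
from typing import Any, Dict


class Range:
    """Hypothetical property type that implements the optional hooks."""

    def __init__(self, start: int, stop: int) -> None:
        self.start, self.stop = start, stop

    def to_json(self) -> Dict[str, Any]:
        # Returns a JSON-compatible object, not a string: no json.dumps here
        return {"start": self.start, "stop": self.stop}

    @classmethod
    def from_json(cls, obj: Dict[str, Any]) -> "Range":
        return cls(obj["start"], obj["stop"])


def serialize(value: Any) -> Any:
    to_json = getattr(value, "to_json", None)
    return to_json() if callable(to_json) else value  # plain values pass through


def deserialize(obj: Any, expected_type: type) -> Any:
    # Call the optionally implemented hook instead of a string-to-type table
    from_json = getattr(expected_type, "from_json", None)
    return from_json(obj) if callable(from_json) else obj


# json.dumps / json.loads happen once, at the document boundary
text = json.dumps({"range": serialize(Range(0, 10)), "name": serialize("A")})
loaded = json.loads(text)
r = deserialize(loaded["range"], Range)
```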
    opened by definelicht 7
  • Reductions are broken on Xilinx FPGAs

    Describe the bug: When using a reduction (either an explicit dace.reduce or one detected from, e.g., np.max), DaCe generates incorrect FPGA code that fails to compile with the error:

    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp: In function 'void broken_reduction_sym_0_0_0(const double*, double&, int)':
    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp:40:34: error: invalid initialization of reference of type 'double*&' from expression of type 'double'
       40 |         reduce_1_0_2(&__A_in[0], __result_out, N);
          |                                  ^~~~~~~~~~~~
    /lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp:10:47: note: in passing argument 2 of 'void reduce_1_0_2(const double*, double*&, int)'
       10 | void reduce_1_0_2(const double* _in, double*& _out, int N) {
          |                                      ~~~~~~~~~^~~~
    gmake[2]: *** [CMakeFiles/broken_reduction_sym_1.dir/lustre/home/nx08/nx08/jquinn/dace_iterative_solvers/.dacecache/broken_reduction_sym_1/src/xilinx/device/broken_reduction_sym_0_0.cpp.o] Error 1
    gmake[1]: *** [CMakeFiles/broken_reduction_sym_1.dir/all] Error 2
    gmake: *** [all] Error 2
    

    To Reproduce: Minimal example:

    import dace
    import numpy as np
    from dace.transformation.interstate import FPGATransformSDFG
    
    N = dace.symbol("N")
    
    @dace.program
    def broken_reduction_sym(A: dace.float64[N]):
        # result = np.min(A)
        result = dace.reduce(lambda a, b: a+b, A)
    
    broken_reduction_sdfg = broken_reduction_sym.to_sdfg()
    broken_reduction_sdfg.apply_transformations(FPGATransformSDFG)
    broken_reduction = broken_reduction_sdfg.compile()
    

    Expected behavior: Reductions should produce code that compiles.

    Additional context:

    • DaCe version: 0.13.2
    • Vitis version: 2021.2
    • XRT version: 2.11.634
    • Python version: 3.9.7
    • CMake version: 3.19.3
    • G++ version: 10.2.0

    bug 
    opened by JamieJQuinn 6
  • auto_optimize now properly chooses GPUAuto expansion for reduce nodes

    This PR adds a small amount of code to auto_optimize.py to ensure that the GPUAuto expansion is used for reduce nodes when auto_optimize is used to optimize the produced SDFG.

    Previously, it always chose the CUDA (device) expansion, even though GPUAuto has a higher priority in implementation_prio.
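
    The intended selection rule picks the first implementation, in priority order, that the node supports. It can be sketched as follows. choose_expansion is a hypothetical helper, and the priority list in the example is an assumption, not the actual implementation_prio configuration value.

```python
from typing import Iterable, Sequence


def choose_expansion(available: Iterable[str], priority: Sequence[str]) -> str:
    """Return the first implementation in priority order that the node supports."""
    supported = set(available)
    for name in priority:
        if name in supported:
            return name
    raise ValueError("no supported implementation in the priority list")
```

    For example, with the priority ["GPUAuto", "CUDA (device)", "pure"], a reduce node that supports all three gets GPUAuto, which is the behavior this PR aims for.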

    opened by hodelcl 0
  • Warn user when calling `to_sdfg` on a function that shouldn't be reparsed

    When using the reparse_sdfg or recompile keyword arguments:

    1. Sometimes, recompile shows up as a constant expression.
    2. If to_sdfg() is called, the user should be warned that these arguments will be ignored.
    opened by tbennun 0
  • csdfg can not handle torch.rand() tensor in getting_started.ipynb

    Describe the bug: In cell [12], I replaced tester = np.random.rand(2000, 4000) with

    import torch
    tester = torch.rand(2000,4000)
    tester
    

    and replaced %timeit csdfg(A=tester, N=np.int32(2000)) with %timeit csdfg(A=tester, N=2000). In other words, I used a torch tensor instead of a NumPy array as input. However, we almost always get a "Kernel Restarting" error.

    To Reproduce: Change the code as described above, then run all cells.

    Expected behavior: The time of csdfg(A=tester, N=2000) is printed.

    Screenshots: In most runs, we get the error. Sometimes (without any code changes), we get the expected result.

    Desktop:

    • OS: Linux
    • Browser: Chrome
    • Version: 106.0.52

    Additional context: The error also occurs on Windows.

    opened by Weigaa 0
  • Python `with` statement code generation identifier is not unique enough

    Describe the bug: For each with statement, the generated C code contains a pair of __with_XXX___enter and __with_XXX___exit statements, where XXX is the line number in the original source. This is also how those symbols are labeled in the SDFG. Unfortunately, this can cause nasty clashes when:

    • with statements from two different files are on the same line
    • code changes outside the DaCe-handled code path shift the line of a with statement that DaCe processes. This invalidates the compiled .so (bad symbol), even though nothing DaCe should care about has changed.

    To Reproduce

    A reproducer would consist of two files with with statements on the same line number.

    Expected behavior

    The Python frontend's handling of with statements is a very good feature and shouldn't be discarded. Instead, a more robust sanitization of the with statement should be found.

    Proposal: sanitize with util.timer.clock("mainloop") as __with_util_timer_clock_mainloop_X___enter, where X is a global counter over with statements that keeps the ordering consistent.
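
    The following is a minimal Python sketch of the proposed naming, assuming the with statement's source text is available. WithNamer is a hypothetical helper, not part of DaCe. It uses ast.unparse (Python 3.9+) and, for brevity, handles only the first context manager of a with statement. The counter follows the order in which statements are named, so shifting line numbers elsewhere in a file no longer changes the generated identifiers.

```python
import ast
import itertools
import re
from typing import Tuple


class WithNamer:
    """Derive __with_..._enter/_exit identifiers from the with-expression text."""

    def __init__(self) -> None:
        self._counter = itertools.count()  # one counter per parsed program

    def names(self, with_source: str) -> Tuple[str, str]:
        stmt = ast.parse(with_source).body[0]
        if not isinstance(stmt, ast.With):
            raise ValueError("expected a with statement")
        # Only the first context manager is used here, for brevity
        expr = ast.unparse(stmt.items[0].context_expr)  # requires Python 3.9+
        # Collapse every run of non-identifier characters into one underscore
        name = re.sub(r"\W+", "_", expr).strip("_")
        base = f"__with_{name}_{next(self._counter)}"
        return f"{base}___enter", f"{base}___exit"
```

    For example, the first with util.timer.clock("mainloop"): statement that is named yields __with_util_timer_clock_mainloop_0___enter and __with_util_timer_clock_mainloop_0___exit.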

    opened by FlorianDeconinck 0
Releases(v0.14.1)
  • v0.14.1(Oct 14, 2022)

    This release of DaCe offers mostly stability fixes for the Python frontend, transformations, and callbacks.

    Full Changelog: https://github.com/spcl/dace/compare/v0.14...v0.14.1

  • v0.14(Aug 26, 2022)

    What's Changed

    This release brings forth a major change to how SDFGs are simplified in DaCe, using the Simplify pass pipeline. This both improves the performance of DaCe's transformations and introduces new types of simplification, such as dead dataflow elimination.

    Please let us know if there are any regressions with this new release.

    Features

    • Breaking change: The experimental dace.constant type hint has now achieved stable status and was renamed to dace.compiletime
    • Major change: Only modified configuration entries are now stored in ~/.dace.conf. The SDFG build folders still include the full configuration file. Old .dace.conf files are detected and migrated automatically.
    • Detailed, multi-platform performance counters are now available via native LIKWID instrumentation (by @lukastruemper in https://github.com/spcl/dace/pull/1063). To use, set .instrument to dace.InstrumentationType.LIKWID_Counters
    • GPU Memory Pools are now supported through CUDA's cudaMallocAsync API. To enable, set desc.pool = True on any GPU data descriptor.
    • Map schedule and array storage types can now be annotated directly in Python code (by @orausch in https://github.com/spcl/dace/pull/1088). For example:
    import dace
    from dace.dtypes import StorageType, ScheduleType
    
    N = dace.symbol('N')
    
    @dace
    def add_on_gpu(a: dace.float64[N] @ StorageType.GPU_Global,
                   b: dace.float64[N] @ StorageType.GPU_Global):
      # This map will become a GPU kernel
      for i in dace.map[0:N] @ ScheduleType.GPU_Device:
        b[i] = a[i] + 1.0
    
    • Customizing GPU block dimension and OpenMP threading properties per map is now supported
    • Optional arrays (i.e., arrays that can be None) can now be annotated in the code. The simplification pipeline also infers non-optional arrays from their use and can optimize code by eliminating branches. For example:
    from typing import Optional
    
    import dace
    
    @dace
    def optional(maybe: Optional[dace.float64[20]], always: dace.float64[20]):
      always += 1  # "always" is always used, so it will not be optional
      if maybe is None:  # This condition will stay in the code
        return 1
      if always is None:  # This condition will be eliminated in simplify
        return 2
      return 3
    

    Minor changes

    • Miscellaneous fixes to transformations and passes
    • Fixes for string literal ("string") use in the Python frontend
    • einsum is now a library node
    • If CMake is already installed, it is now detected and will not be installed through pip
    • Add kernel detection flag by @TizianoDeMatteis in https://github.com/spcl/dace/pull/1061
    • Better support for __array_interface__ objects by @gronerl in https://github.com/spcl/dace/pull/1071
    • Replacements look up base classes by @tbennun in https://github.com/spcl/dace/pull/1080

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.3...v0.14

  • v0.13.3(Jun 30, 2022)

    What's Changed

    • Better integration with Visual Studio Code: Calling sdfg.view() inside a VSCode console or debug session will open the file directly in the editor!
    • Code generator for the Snitch RISC-V architecture (by @noah95 and @am-ivanov)
    • Minor hotfixes to Python frontend, transformations, and code generation (with @orausch)

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.2...v0.13.3

  • v0.13.2(Jun 22, 2022)

    What's Changed

    • New API for SDFG manipulation: Passes and Pipelines. More about that in the next major release!
    • Various fixes to frontend, type inference, and code generation.
    • Support for more numpy and Python functions: arange, round, etc.
    • Better callback support:
      • Support callbacks with keyword arguments
      • Support literal lists, tuples, sets, and dictionaries in callbacks
    • New transformations: move loop into map, on-the-fly-recomputation map fusion
    • Performance improvements to frontend
    • Better Docker container compatibility via fixes for config files without a home directory
    • Add interface to check whether in a DaCe parsing context in https://github.com/spcl/dace/pull/998
    import dace
    
    def potentially_parsed_by_dace():
        if not dace.in_program():
            print('Called by Python interpreter!')
        else:
            print('Compiled with DaCe!')
    
    • Support compressed (gzipped) SDFGs. Loads normally, saves with:
    sdfg.save('myprogram.sdfgz', compress=True)  # or just run gzip on your old SDFGs
    
    • SDFV: Add web serving capability by @orausch in https://github.com/spcl/dace/pull/1013. Use for interactively debugging SDFGs on remote nodes with: sdfg.view(8080) (or any other port)
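
    Compressed SDFGs are gzip files, so an existing .sdfg file can also be compressed with Python's standard library. The sketch below relies on the statement above that such files load normally. compress_sdfg is a hypothetical helper. sdfg.save(..., compress=True) remains the direct way to produce such files from an SDFG object.

```python
import gzip
import shutil
from pathlib import Path


def compress_sdfg(path: str) -> Path:
    """Write a gzip-compressed copy of an existing .sdfg file next to it as .sdfgz."""
    src = Path(path)
    dst = src.with_suffix(".sdfgz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream the file instead of loading it whole
    return dst
```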

    Full Changelog: https://github.com/spcl/dace/compare/v0.13.1...v0.13.2

  • v0.13.1(Apr 26, 2022)

    What's Changed

    • Python frontend: Bug fixes for closures and callbacks in nested scopes
    • Bug fixes for several transformations (StateFusion, RedundantSecondArray)
    • Fixes for issues with FORTRAN ordering of numpy arrays
    • Python object duplicate reference checks in SDFG validation

    Full Changelog: https://github.com/spcl/dace/compare/v0.13...v0.13.1

  • v0.13(Feb 28, 2022)

    New Features

    Cutout:

    Cutout allows developers to take large DaCe programs and reliably cut out subgraphs to create a runnable sub-program. This sub-program can then be used to check correctness, benchmark, and transform part of a program without having to run the full application. Example usage from Python:

    import dace
    from dace.sdfg.analysis import cutout
    
    def my_method(sdfg: dace.SDFG, state: dace.SDFGState):
        nodes = [n for n in state if isinstance(n, dace.nodes.LibraryNode)]  # Cut every library node
        cut_sdfg: dace.SDFG = cutout.cutout_state(state, *nodes)
        # The cut SDFG now includes each library node and all the necessary arrays to call it with
    

    Also available in the SDFG editor:

    Data Instrumentation:

    Just like node instrumentation for performance analysis, data instrumentation allows users to mark access nodes to be saved to an instrumented data report and loaded later for exactly reproducible runs.

    • Data instrumentation natively works with CPU and GPU global memory, so there is no need to copy data back
    • Combined with Cutout, this is a powerful interface to perform local optimizations in large applications with ease!
    • Example use:

    import dace
    import numpy as np
    from dace import nodes
    
    @dace.program
    def tester(A: dace.float64[20, 20]):
        tmp = A + 1
        return tmp + 5
    
    sdfg = tester.to_sdfg()
    for node, _ in sdfg.all_nodes_recursive():  # Instrument every access node
        if isinstance(node, nodes.AccessNode):
            node.instrument = dace.DataInstrumentationType.Save
    
    A = np.random.rand(20, 20)
    result = sdfg(A)
    
    # Get instrumented data from report
    dreport = sdfg.get_instrumented_data()
    assert np.allclose(dreport['A'], A)
    assert np.allclose(dreport['tmp'], A + 1)
    assert np.allclose(dreport['__return'], A + 6)
    

    Logical Groups:

    SDFG elements can now be grouped by any criteria, and they will be colored during visualization by default (by @phschaad). See example in action:

    Changes and Bug Fixes

    • Samples and tutorials have now been updated to reflect the latest API
    • Constants (added with sdfg.add_constant) can now be used as access nodes in SDFGs. The constants are hard-coded into the generated program, so you can run code with the best performance possible.
    • View nodes can now use the views connector to disambiguate which access node is being viewed
    • Python frontend: else clause is now handled in for and while loops
    • Scalars have been removed from the __dace_init generated function signature (by @orausch)
    • Multiple clock signals in the RTL codegen (by @carljohnsen)
    • Various fixes to frontends, transformations, and code generators

    Full Changelog available at https://github.com/spcl/dace/compare/v0.12...v0.13

  • v0.12(Jan 22, 2022)

    API Changes

    Important: Pattern-matching transformation API has been significantly simplified. Transformations using the old API must be ported! Summary of changes:

    • Transformations now extend either the SingleStateTransformation or MultiStateTransformation classes instead of using decorators
    • Patterns must be registered as class variables called PatternNodes
    • Nodes in matched patterns can be then accessed in can_be_applied and apply directly using self.nodename
    • The name strict is now replaced with permissive (False by default). Permissive mode allows transformations to match in more cases, but may be dangerous to apply (e.g., create race conditions).
    • can_be_applied is now a method of the transformation
    • The apply method accepts a graph and the SDFG.

    Example of using the new API:

    import dace
    from dace import nodes
    from dace.sdfg import utils as sdutil
    from dace.transformation import transformation as xf
    
    class ExampleTransformation(xf.SingleStateTransformation):
        # Define pattern nodes
        map_entry = xf.PatternNode(nodes.MapEntry)
        access = xf.PatternNode(nodes.AccessNode)
    
        # Define matching subgraphs
        @classmethod
        def expressions(cls):
            # MapEntry -> Access
            return [sdutil.node_path_graph(cls.map_entry, cls.access)]
    
        def can_be_applied(self, graph: dace.SDFGState, expr_index: int, sdfg: dace.SDFG, permissive: bool = False) -> bool:
            # Returns True if the transformation can be applied on a subgraph
            if permissive:  # In permissive mode, we will always apply this transformation
                return True
            return self.map_entry.schedule == dace.ScheduleType.CPU_Multicore
    
        def apply(self, graph: dace.SDFGState, sdfg: dace.SDFG):
            # Apply the transformation using the SDFG API
            pass
    

    Simplifying SDFGs is renamed from sdfg.apply_strict_transformations() to sdfg.simplify()

    AccessNodes no longer have an AccessType field.

    Other changes

    • More nested SDFG inlining opportunities by default with the multi-state inline transformation
    • Performance optimizations of the DaCe framework (parsing, transformations, code generation) for large graphs
    • Support for Xilinx Vitis 2021.2
    • Minor fixes to transformations and deserialization

    Full Changelog: https://github.com/spcl/dace/compare/v0.11.4...v0.12

  • v0.11.4(Dec 17, 2021)

    What's Changed

    • If a Python call cannot be parsed into a data-centric program, DaCe will automatically generate a callback into Python. Supports CPU arrays and GPU arrays (via CuPy) without copying!
    • Python 3.10 support
    • CuPy arrays are supported when calling @dace.programs in JIT mode
    • Various fixes in the Python frontend and code generation

    Full Changelog: https://github.com/spcl/dace/compare/v0.11.3...v0.11.4

  • v0.11.3 (Nov 23, 2021)

  • v0.11.2 (Nov 12, 2021)

  • v0.11.1 (Oct 18, 2021)

    What's Changed

    • More flexible Python frontend: you can now call functions and object methods, and use fields and globals, in @dace programs. Some examples:
      • There is no need to annotate called functions
      • @dataclass and general object field support
      • Loop unrolling: implicit and explicit (with the dace.unroll generator)
      • Constant folding and explicit constant arguments (with dace.constant as a type hint)
      • Debuggability: all functions (e.g. dace.map, dace.tasklet) work in pure Python as well
      • and many more features
    • NumPy semantics are followed more closely, e.g., subscripts create array views
    • Direct CuPy and torch.tensor integration in @dace program arguments
    • Auto-optimization (preview): use @dace.program(auto_optimize=True, device=dace.DeviceType.CPU) to automatically run some transformations, such as turning loops into parallel maps.
    • ARM SVE code generation support by @sscholbe (#705)
    • Support for MLIR tasklets by @Berke-Ates (#747)
    • Source mapping by @benibenj (#756)
    • Support for HBM on Xilinx FPGAs by @jnice-81 (#762)
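    For reference, the NumPy behavior that the frontend now follows more closely is that basic slicing returns a view that shares memory with the original array, so writes through the slice are visible in the original. Advanced (integer-array) indexing returns a copy instead. The snippet uses plain NumPy only and does not invoke DaCe. The function name scale_first_row_inplace is illustrative.

```python
import numpy as np


def scale_first_row_inplace(a: np.ndarray, factor: float) -> None:
    """Scale row 0 of `a` through a slice; this works because slicing yields a view."""
    row = a[0, :]  # basic slicing: a view, not a copy
    row *= factor  # in-place update writes into a's memory


if __name__ == "__main__":
    a = np.arange(6, dtype=np.float64).reshape(2, 3)
    scale_first_row_inplace(a, 10.0)
    print(a.tolist())                    # [[0.0, 10.0, 20.0], [3.0, 4.0, 5.0]]
    print(np.shares_memory(a, a[0, :]))  # True

    # Advanced (integer-array) indexing, by contrast, returns a copy.
    b = a[[0], :]
    b *= 0.0
    print(a[0, 1])  # 10.0 (the original is unchanged)
```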

    Miscellaneous:

    • Various performance optimizations to calling @dace programs
    • Various bug fixes to transformations, code generator, and frontends

    Full Changelog: https://github.com/spcl/dace/compare/v0.10.8...v0.11.1

  • v0.10.8 (Apr 14, 2021)

    What's New?

    • Various bug fixes and more stable Python/NumPy frontend
    • Support for running DaCe programs within the Python interpreter
    • (experimental) Support for automatic optimization passes (more coming soon!)
  • v0.10.0 (Oct 4, 2020)

    What's New?

    • Python frontend improvements: More Python features are supported, such as return values, tuples, and numpy broadcasting. @dace.programs can now call other programs or SDFGs.
    • AMD GPU (HIP) Support: AMD GPUs are now fully supported with HIP code generation.
    • Easy-to-use transformation APIs: Apply transformation compositions with one call, enumerate subgraph matches manually, and many more functions now available as part of the dace API. See the new tutorial for examples.
    • Faster code generation: Backends now generate lower-level code that is more compiler-friendly.
    • Instrumentation interface: Setting the instrument property for SDFG nodes and states enables easy-to-use, localized performance reporting with timers, GPU events, and PAPI performance counters.
    • DaCe VSCode plugin: Interactive SDFG viewer and optimizer as part of Visual Studio Code. Download the plugin here.
    • Type inference and connector types: In addition to automatic type inference, connectors on nodes can now be defined with explicit types, giving more fine-grained control over type reinterpreting and vector types.
    • Subgraph transformations: New transformation type that can work on arbitrary subgraphs. For example, fuse any computation within a state with SubgraphFusion.
    • Persistent GPU kernel schedule: Launch persistent kernels by changing a single property! The proportion of GPU multiprocessors used is configurable.
    • More transformations: Loop manipulation and other new transformations are now available in DaCe. Some transformations (such as Vectorization) have been made more robust to corner cases.
    • More tools: Use sdfgcc to quickly compile and optimize .sdfg files from the command line, generating header and library files. Great for interoperability and Makefiles.
    • Short DaCe annotation: Data-centric functions can now be annotated with @dace.
    • Many minor fixes and additions: More library nodes (such as einsum) and new properties added, enabling faster performance and more productive high-performance coding than ever.
  • v0.9.5 (Jan 6, 2020)

    What's New?

    • Intel FPGA backend: Generates and compiles Intel FPGA OpenCL code from SDFGs.
    • Renderer: Many improvements to the scalability of drawing large SDFGs, touch/mobile support, and code view upon zooming into Tasklets.
    • SDFV: Now includes a sidebar with information about clicked nodes/edges/states.
    • GPU reduction: Now supports Reduce nodes where the output array has multiple dimensions (if contiguous). In other cases, use the ReduceExpansion transformation.
    • Faster compilation: Improved CMake usage to speed up compilation time if files were not changed.
    • Stability: Various fixes to the Python frontend, transformations, code generation, and DIODE (on Linux and Windows).
    • Generated programs now include a header (.h) file and an example C program that invokes the compiled SDFG.
  • v0.9.0 (Oct 22, 2019)

    What's New

    • NumPy syntax for Python: Wrap Python functions that work on numpy arrays with @dace.program and create SDFGs from implicit dataflow.
    • DIODE 2.0: DIODE has been reworked to operate in the browser, and works natively on Windows. Note that it is currently experimental, and some features may cause errors. We are happy to fix bugs if you find and report issues!
    • Standalone SDFG renderer (SDFV) and improved Jupyter support: Contextual, optimized SDFG drawing with collapsible scopes (double-click a map, a state, or a nested SDFG). Fully integrated into Jupyter notebooks.
    • Transformations: Improvements to scalability of subgraph pattern matching and memlet propagation.
    • Improvements to the TensorFlow frontend.
    • Many minor bug fixes and several API improvements.
  • v0.8.1 (Aug 24, 2019)
