Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files

Overview

asammdf is a fast parser and editor for ASAM (Association for Standardization of Automation and Measuring Systems) MDF (Measurement Data Format) files.

asammdf supports MDF versions 2 (.dat), 3 (.mdf) and 4 (.mf4).

asammdf works on Python >= 3.7 (for Python 2.7, 3.4 and 3.5 see the 4.x.y releases)

Status

[badges: continuous integration status, coverage (Coveralls), Codacy grade, ReadTheDocs documentation status, PyPI version, conda-forge version]

Project goals

The main goals for this library are:

  • to be faster than the other Python-based mdf libraries
  • to have a clean and easy-to-understand code base
  • to have minimal third-party dependencies

Features

  • create new mdf files from scratch

  • append new channels

  • read unsorted MDF v3 and v4 files

  • read CAN and LIN bus logging files

  • extract CAN and LIN signals from anonymous bus logging measurements

  • filter a subset of channels from original mdf file

  • cut measurement to specified time interval

  • convert to different mdf version

  • export to HDF5, Matlab (v4, v5 and v7.3), CSV and parquet

  • merge multiple files sharing the same internal structure

  • read and save mdf version 4.10 files containing zipped data blocks

  • space optimizations for saved files (no duplicated blocks)

  • split large data blocks (configurable size) for mdf version 4

  • full support (read, append, save) for the following map types (multidimensional array channels):

    • mdf version 3 channels with CDBLOCK

    • mdf version 4 structure channel composition

    • mdf version 4 channel arrays with CNTemplate storage and one of the array types:

      • 0 - array
      • 1 - scaling axis
      • 2 - look-up
  • add and extract attachments for mdf version 4

  • handle large files (for example merging two files, each with 14000 channels and 5GB size, on a Raspberry Pi)

  • extract channel data, master channel and extra channel information as Signal objects for unified operations with v3 and v4 files

  • time domain operations using the Signal class (see the sketch after this list)

    • Pandas data frames are good if all the channels have the same time base
    • a measurement will usually have channels from different sources at different rates
    • the Signal class facilitates operations with such channels
  • graphical interface to visualize channels and perform operations with the files
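For example, a minimal sketch of combining two channels recorded at different rates (channel names and data below are illustrative, not from a real measurement):

import numpy as np
from asammdf import Signal

s1 = Signal(samples=np.array([1.0, 2.0, 3.0]), timestamps=np.array([0.0, 1.0, 2.0]), name='Slow')
s2 = Signal(samples=np.ones(21), timestamps=np.linspace(0.0, 2.0, 21), name='Fast')

# bring both signals to a shared time base before combining them
common = np.union1d(s1.timestamps, s2.timestamps)
total = s1.interp(common).samples + s2.interp(common).samples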

Major features not implemented (yet)

  • for version 3

    • functionality related to sample reduction blocks: the sample reduction blocks are simply ignored
  • for version 4

    • experimental support for MDF v4.20 column-oriented storage
    • functionality related to sample reduction blocks: the sample reduction blocks are simply ignored
    • handling of channel hierarchy: channel hierarchy is ignored
    • full handling of bus logging measurements: currently only CAN and LIN bus logging are implemented, with the ability to get signals defined in the attached CAN/LIN database (.arxml or .dbc). Signals can also be extracted from an anonymous bus logging measurement by providing a CAN or LIN database (.dbc or .arxml); see the sketch after this list
    • handling of unfinished measurements (mdf 4): finalization is attempted when the file is loaded, however not all the finalization steps are supported
    • full support for the remaining mdf 4 channel array types
    • xml schema for MDBLOCK: most metadata stored in the comment blocks will not be available
    • full handling of event blocks: events are transferred to the new files (in case of calling methods that return new MDF objects) but no new events can be created
    • channels with default X axis: the default X axis is ignored and the channel group's master channel is used
    • attachment encryption/decryption using user-provided encryption/decryption functions; this is not part of the MDF v4 spec and is only supported by this library
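For instance, a minimal sketch of decoding an anonymous CAN logging measurement by supplying a DBC database (file names are illustrative; the dict layout mirrors the extract_bus_logging call used in the comments further down):

from asammdf import MDF

mdf = MDF('can_logging.mf4')

# map each bus type to a list of (database file, CAN channel) pairs
database_files = {"CAN": [("vehicle.dbc", 0)]}

decoded = mdf.extract_bus_logging(database_files)
decoded.save('decoded_signals.mf4')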

Usage

from asammdf import MDF

mdf = MDF('sample.mdf')
speed = mdf.get('WheelSpeed')
speed.plot()

important_signals = ['WheelSpeed', 'VehicleSpeed', 'VehicleAcceleration']
# get short measurement with a subset of channels from 10s to 12s
short = mdf.filter(important_signals).cut(start=10, stop=12)

# convert to version 4.10 and save to disk
short.convert('4.10').save('important signals.mf4')

# plot some channels from a huge file
efficient = MDF('huge.mf4')
for signal in efficient.select(['Sensor1', 'Voltage3']):
    signal.plot()
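
A hedged sketch of the DataFrame and export helpers from the feature list (output file names are illustrative; the exports need the optional dependencies listed below):

# resample all channels to a common 0.1 s raster and get a pandas DataFrame
df = mdf.to_dataframe(raster=0.1)

# export to other formats
mdf.export(fmt='hdf5', filename='sample.h5')
mdf.export(fmt='parquet', filename='sample.parquet')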

Check the examples folder for extended usage demos, or the documentation examples at http://asammdf.readthedocs.io/en/master/examples.html

Documentation

http://asammdf.readthedocs.io/en/master

And a nicely written tutorial on the CSS Electronics site:
https://canlogger.csselectronics.com/canedge-getting-started/log-file-tools/asammdf-api/

Contributing & Support

Please have a look over the contributing guidelines

If you enjoy this library please consider making a donation to the numpy project or to danielhrisca using Liberapay.

Contributors

Thanks to all who contributed with commits to asammdf.

Installation

asammdf is available on PyPI and conda-forge:

pip install asammdf
# for the GUI 
pip install asammdf[gui]
# or for anaconda
conda install -c conda-forge asammdf

In case a wheel is not present for your OS/Python version and you lack the proper compiler setup to compile the C extension code, you can simply copy-paste the package code to your site-packages. In this way the Python fallback code will be used instead of the compiled C extension code.

Dependencies

asammdf uses the following libraries

  • numpy : the heart that makes all tick
  • numexpr : for algebraic and rational channel conversions
  • wheel : for installation in virtual environments
  • pandas : for DataFrame export
  • canmatrix : to handle CAN/LIN bus logging measurements
  • natsort
  • lxml : for canmatrix arxml support
  • lz4 : to speed up the disk IO performance

optional dependencies needed for exports

  • h5py : for HDF5 export
  • scipy : for Matlab v4 and v5 .mat export
  • hdf5storage : for Matlab v7.3 .mat export
  • fastparquet : for parquet export

other optional dependencies

  • PyQt5 : for GUI tool
  • pyqtgraph : for GUI tool and Signal plotting
  • matplotlib : as fallback for Signal plotting
  • cChardet : to detect non-standard Unicode encodings
  • chardet : to detect non-standard Unicode encodings
  • pyqtlet : for GPS window
  • isal : for faster zlib compression/decompression

Benchmarks

http://asammdf.readthedocs.io/en/master/benchmarks.html

Comments
  • Fix Mat file Export

    Fix Mat file Export

    Python version

    Please run the following snippet and write the output here

    python=2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)]'
    'os=Windows-7-6.1.7601-SP1'
    

    Code

    for file_name in os.listdir(dir_mdf_file):
        mdf_parser = asammdf.mdf.MDF(name=file_name, memory='low', version='4.00')
        mdf_file = mdf_parser.export('mat', file_name)
    
    

    Traceback

    (<type 'exceptions.TypeError'>, TypeError("'generator' object has no attribute '__getitem__'",), <traceback object at 0x000000000E247A88>)
    

    Error 2: (<type 'exceptions.ValueError'>, ValueError('array-shape mismatch in array 7',), <traceback object at 0x0000000014D9B0C8>)

    Description

    While trying to convert MDF files to .mat files using the MDF module, I am getting these two errors. I am assuming that Error 2 may be because of numpy. Can anyone help with why these errors come up while exporting?

    bug 
    opened by bhagivinni 122
  • Converting large files

    Converting large files

    When converting large files to parquet or a pandas DataFrame, I always get a memory error. I was wondering if it is possible to have some kind of low-memory mode, or even better a streaming mode.

    Essentially I want parquet files, but I saw that asammdf converts the .dat, .mf4 etc. files to a pandas DataFrame under the hood anyway and uses the result to export to parquet.

    So I was playing around with the code, trying to cast the columns to more appropriate dtypes.

    import numpy as np
    import pandas as pd

    def _downcast(self, src_series):
        # shrink numeric dtypes where possible; everything else becomes categorical
        if np.issubdtype(src_series.dtype, np.unsignedinteger):
            res_series = pd.to_numeric(src_series, downcast='unsigned')
        elif np.issubdtype(src_series.dtype, np.signedinteger):
            if src_series.min() < 0:
                res_series = pd.to_numeric(src_series, downcast='signed')
            else:
                res_series = pd.to_numeric(src_series, downcast='unsigned')
        elif np.issubdtype(src_series.dtype, np.floating):
            res_series = pd.to_numeric(src_series, downcast='float')
        else:
            res_series = src_series.astype('category')

        return res_series
    

    It saves some memory, but unfortunately this is not enough. I have some files that are 5GB or larger; when converted to a DataFrame they get inflated to beyond 20GB.

    Any help is appreciated.
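
    A possible direction, sketched under the assumption that the file can be processed one channel group at a time with get_group (the per-group parquet layout is illustrative, not an asammdf streaming API):

    import pandas as pd
    from asammdf import MDF

    with MDF('huge.mf4') as mdf:
        for index in range(len(mdf.groups)):
            df = mdf.get_group(index)  # one channel group at a time
            for name in df.columns:
                # downcast float columns before writing to keep the peak RAM lower
                if pd.api.types.is_float_dtype(df[name]):
                    df[name] = pd.to_numeric(df[name], downcast='float')
            df.to_parquet(f'group_{index}.parquet')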

    enhancement 
    opened by Nimi42 44
  • 'right_shift' not supported

    'right_shift' not supported

    Hello, I met an issue when loading an MDF file; it seems to come from numpy, and this issue does not exist in version 2.5.3. Below is the error information:

    site-packages\asammdf\mdf4.py", line 2912, in get
        vals = vals >> bit_offset

    ufunc 'right_shift' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

    at this time vals is an ndarray that contains a list

    opened by lznzhou33zzz 34
  • Reading Channel Causes Crash

    Reading Channel Causes Crash

    Python version

    ('python=3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit ' '(AMD64)]') 'os=Windows-10-10.0.19043-SP0' 'numpy=1.22.3' ldf is not supported xls is not supported xlsx is not supported yaml is not supported 'asammdf=7.0.7'

    Code

    MDF version

    4.10

    Code snippet

        data = MDF(mf4_file)
        sig = data.get('MIPI_CSI2_Frame.DataBytes')  # type: Signal Error occurs here
        real_sam = sig.samples  # type: ndarray
        offset = real_sam[0].size - 25
    

    OR

        it_data = data.iter_get(name='MIPI_CSI2_Frame.DataBytes', samples_only=True, raw=True)
        next(it_data) # error occurs here
    

    Traceback

    Traceback (most recent call last):
      File "C:\Users\xxxx\PycharmProjects\MF4Convert\venv\lib\site-packages\asammdf\blocks\mdf_v4.py", line 7442, in _get_scalar
        vals = extract(signal_data, 1, vals - vals[0])
    SystemError: <built-in function __import__> returned a result with an error set
    
    Process finished with exit code -1073741819 (0xC0000005) 
    

    Description

    The file is very large (~3.5GB); trying to read data from the mentioned channel using either the iteration method or the Signal methods causes an error. A similar file, ~1.7GB in size, reads fine. Both load into the asammdf tool just fine, including the file causing the crash; however, that file produces the same error as above whenever I try to export data with the GUI.

    Reading a different channel in this file does not result in the error.

    The machine I'm using has 32GB of RAM and plenty of storage space.

    opened by lkuza2 31
  • In MDF files samples and timestamps length mismatch

    In MDF files samples and timestamps length mismatch

    I am using this tool for analyzing MDF files and I am getting this error:

    asammdf.utils.MdfException: <SIGNAL_NAME> samples and timestamps length mismatch (58 vs 67)

    I have used ETAS tools to check the signal and there everything seemed fine; what is going wrong here?

    Fixed - waiting for next release 
    opened by AkbarAlam 31
  • multiple actions for file handling

    multiple actions for file handling

    Hello, too many actions are needed to handle files.

    Example: I've done an acquisition on a vehicle and want to do post-processing with Matlab on a part of the file. Currently, I first have to cut the file to keep a temporal part and create a new file. Then I have to filter into another file to keep only a short list of channels. And finally I have to export to a last .mat file. The same applies if we need to transmit data to a supplier who works with other tools.

    It could be easier and faster to merge the 'filter', 'export', 'convert', 'cut' and 'resample' functions. This would let the user directly choose, in a single tab, which channels to handle and what kind of action to apply to them (see the sketch at the end of this comment).

    1. Choose channels: the idea is to have a channel selection list with 'clear', 'load' & 'save' functions in order to select easily. It would also be great to add a selected-channels list beside it, as you do for the 'search' function, to have an easy view of what is selected (to be fast, no specific selection could mean that the whole file is used).

    2. Choose action(s) and output format: on the right, we could find all temporal or sampling management options and all output formats.

    By this way, handling files could be easy and fast.

    I tried to summarize in an example (see the attached image).

    Your opinion?
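
    For reference, in the Python API these steps already chain (a sketch mirroring the README usage; file and channel names are illustrative):

    from asammdf import MDF

    mdf = MDF('acquisition.mf4')
    subset = mdf.filter(['Channel_1', 'Channel_2']).cut(start=10, stop=60)
    subset.export(fmt='mat', filename='for_supplier.mat')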

    opened by gaetanrocq 30
  • Bug in reading .dat file

    Bug in reading .dat file

    Python version: 3.6.1

    from asammdf import MDF
    
    def merge_mdfs(files):
        return MDF.merge(files).resample(0.1)
    
    def main():
        # Here I have code to read in a series of '.dat' files as a list into the variable 'files'
        merge_file = merge_mdfs(files)
    
        # The variable 'merge_file' contains channel names with the corrupt data
    

    Code

    MDF version

    4.7.8

    Code snippet

    return MDF.merge(files).resample(0.1)

    Traceback

    This code doesn't produce an error or a traceback

    Description

    Let me preface by saying that I am relatively new to python and even newer to this library, but I understand the importance of collaboration so I will try my best to help. Since I work for a big company, I can't share exactly everything I am working on, especially the data files I am using. However, I will do my best to supply all available information I can to resolve this issue.

    The issue I'm having is that when I read in a series of '.dat' files, sometimes the data gets read in perfectly, but other times the data gets all messed up and values that were not in the original data find their way in.

    Example: I am reading in acceleration data from an accelerometer. The min and max values of this data trace are confirmed by one of the other tools my company uses to plot data: max ~6.5, min ~-1.5 (units = m/s^2). When I read a series of these same files in, I get the same max value but a min of ~-13 m/s^2. When I go in and look at the data, there are more data points than there should be, and the data doesn't flow like what I would expect to see (i.e. a lot of repeating values).

    Please let me know if anyone needs more information to help solve this issue. I will try my best to supply any additional information requested.

    Thanks for supporting this awesome library! :)

    bug 
    opened by drhannah94 30
  • MDF4 get_group() breaks data

    MDF4 get_group() breaks data

    Python version

    ('python=3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit '
     '(Intel)]')
    'os=Windows-10-10.0.18362-SP0'
    'numpy=1.16.1'
    'asammdf=5.10.4.dev0'
    

    Code

    MDF version

    4.00

    Code snippet

    from asammdf import MDF
    
    if __name__ == "__main__":
       mdf = MDF("path_to_file.MF4")
       df = mdf.get_group(5)
    

    Traceback

    Traceback (most recent call last):
    File "C:/myPath/main.py", line 5, in <module>
      df = mdf.get_group(5)
    File "myPath\venv\lib\site-packages\asammdf\mdf.py", line 3352, in get_group
      ignore_value2text_conversions=ignore_value2text_conversions,
    File "myPath\venv\lib\site-packages\asammdf\mdf.py", line 3423, in to_dataframe
      mdf = self.filter(channels)
    File "myPath\venv\lib\site-packages\asammdf\mdf.py", line 1699, in filter
      copy_master=False,
    File "myPath\venv\lib\site-packages\asammdf\blocks\mdf_v4.py", line 3954, in get
      _dtype = dtype(channel.dtype_fmt)
    ValueError: field '_level_1_structure._level_2_structure' occurs more than once
    

    So I added the following code to MDF4 -> _read_channels() -> while ch_addr: https://github.com/danielhrisca/asammdf/blob/88e2e67a18a77c4ee437907a3f67603397b6eac0/asammdf/blocks/mdf_v4.py#L798

                if channel.component_addr:
                    channel.name += "_struct"
    

    I mentioned this in my email to you. This fixes the duplicate channel names. Now my Traceback is the following:

    C:\myPath\venv\lib\site-packages\pandas\core\indexes\numeric.py:443: RuntimeWarning: invalid value encountered in equal
      return ((left == right) | (self._isnan & other._isnan)).all()
    

    I was looking for the reason for this. It looks like you create a new filtered mdf file here when using get_group(): https://github.com/danielhrisca/asammdf/blob/88e2e67a18a77c4ee437907a3f67603397b6eac0/asammdf/mdf.py#L3423

    This new mdf has a fragmented data block. The first data_block fragment has a different total size and also a different samples_size than the following fragments.

    When I check the returned pandas dataframe, the data is correct for the first ~6500 rows, then garbage follows. If I use the get method to retrieve a single Signal from the original MDF object, the data is okay.

    The file is rather large (160MB). Please tell me if you need it to find the cause.

    opened by zariiii9003 28
  • Slow reading from External Hard Drive

    Slow reading from External Hard Drive

    I am working on a data analysis of multiple files stored on an external hard drive. The files are large (1-7GB). I was using mdfreader before and have recently migrated to asammdf; behaviour is similar when working with files on the local hard drive, but there is a huge difference when loading from the external drive.

    For instance, just loading a 5GB file and getting the samples for one signal:

    asammdf: 329.76s
    mdfreader (with no_data_loading=True): 6.42s

    However, running with the file on the local hard drive:

    asammdf: 14.22s
    mdfreader: 3.79s

    Is there any way to improve this difference?
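
    A hedged sketch of the read_fragment_size tuning mentioned in the documentation (the value below is an illustrative assumption, not a project recommendation); fewer, larger reads tend to matter most on external or network drives:

    from asammdf import MDF

    mdf = MDF('file_on_external_drive.mf4')
    mdf.configure(read_fragment_size=256 * 1024 * 1024)  # read the data in larger fragments
    samples = mdf.get('Signal_name').samples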

    opened by sirlagg 27
  • Loading Data is slow since asammdf 5.x.x

    Loading Data is slow since asammdf 5.x.x

    Hi Daniel,

    since I updated asammdf to 5.x.x (used 5.0.3, 5.0.4 and 5.1.0dev so far), loading data is dramatically slower than before!

    My general procedure for loading data into a pandas dataframe is:

    • loading the file
    • fetching (asammdf.get()) the samples, unit, comment and conversion of all channels using group and index information from channels_db
    • interpolation to the same raster (0.1s)
    • creating a pandas dataframe for the samples, units, comment

    For an mdf3 file with a size of 1.7GB this needs 12 minutes to load! Doing the same with mdfreader only needs 45s.

    With asammdf 4.7.11 this only needs 26s. Obviously a huge difference.

    I think this is due to the removed memory option. As mentioned in asammdf's documentation, I can speed up data loading by using configure and tuning the read_fragment_size parameter. Is there a way to have the same loading speed as with the old memory='full' option?

    Regards, legout

    opened by legout 27
  • cannot fit 'int' into an index-sized integer

    cannot fit 'int' into an index-sized integer

    Python version

    'python=3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44) \n[GCC 7.3.0]' 'os=Linux-4.20.12-arch1-1-ARCH-x86_64-with-arch' 'numpy=1.16.2' 'asammdf=5.0.2'

    Code

    MDF version

    4.10

    Code snippet

    Any list of valid channels will result in that error.

    with MDF(file) as mdf:
        mdf.filter(channels=["Ams_Mp"])
    

    Traceback

    ---------------------------------------------------------------------------
    OverflowError                             Traceback (most recent call last)
    <ipython-input-63-cb54d2207089> in <module>
          1 with MDF(file) as mdf:
    ----> 2     mdf.filter(channels=["Ams_Mp"])
    
    ~/.conda/envs/datatools/lib/python3.6/site-packages/asammdf/mdf.py in filter(self, channels, version)
       1587                 info=None
       1588                 comment="">
    -> 1589         <Signal SIG:
       1590                 samples=[ 12.  12.  12.  12.  12.]
       1591                 timestamps=[0 1 2 3 4]
    
    ~/.conda/envs/datatools/lib/python3.6/site-packages/asammdf/blocks/mdf_v4.py in _load_data(self, group, record_offset, record_count)
       1230 
       1231         Returns
    -> 1232         -------
       1233         data : bytes
       1234             aggregated raw data
    
    OverflowError: cannot fit 'int' into an index-sized integer
    

    Description

    Hello!

    I recently discovered an issue with the new version of asammdf (5.0.2); in version 4.7.9 it worked fine for me. As soon as I try to filter or select certain channels, the aforementioned error is raised.

    opened by ofesseler 27
  • Add support for loading data partially according to integer or boolean arrays

    Add support for loading data partially according to integer or boolean arrays

    Python version

    Please run the following snippet and write the output here

    ('python=3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:50:36) [MSC '
     'v.1929 64 bit (AMD64)]')
    'os=Windows-10'
    'numpy=1.23.4'
    'asammdf=7.1.0'
    

    Description

    It is currently possible to load partial data using the mdf.get method: https://github.com/danielhrisca/asammdf/blob/fda5d791ac0a78943eb3dcf8899811cdcda34b82/asammdf/blocks/mdf_v4.py#L6427-L6428

    This allows loading a range of data in a similar fashion to np.array([1,2,3,4])[record_offset:record_count], which saves some RAM if you only need a certain range.

    It would be nice to extend the function to also support filtering the data using:

    • integer arrays, np.array([0, 1, 2, 8, 9], dtype=int).
    • boolean arrays, np.array([0, 1, 0, 1], dtype=bool).

    This would allow quite advanced filtering without having to load all the data to RAM.

    For inspiration, h5py supports these and quite a bit of numpy's fancy indexing: https://docs.h5py.org/en/stable/high/dataset.html?highlight=indexing#fancy-indexing
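
    For contrast, what the linked code already makes possible today (file and channel names below are illustrative):

    from asammdf import MDF

    with MDF('measurement.mf4') as mdf:
        # load only records 1000..1499 of this channel
        sig = mdf.get('EngineSpeed', record_offset=1000, record_count=500)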

    opened by Illviljan 2
  • Create a MF4 file from CSV and AVI files

    Create a MF4 file from CSV and AVI files

    Python version

    ('python=3.10.8 | packaged by conda-forge | (main, Nov 24 2022, 14:07:00) [MSC ' 'v.1916 64 bit (AMD64)]') 'os=Windows-10-10.0.19044-SP0' 'numpy=1.23.5' 'asammdf=7.2.0'

    Code

    MDF version

    4.00

    Code snippet

    #%% Imports
    import pandas as pd
    import logging
    from pathlib import Path  # needed for the attachment file name handling below
    from asammdf import MDF
    from os import listdir
    from os.path import isfile, join
    
    #%% Directories
    csv_directory = "in/csvfile.csv"
    avi_directory = "in/avifile.avi"
    mf4_directory = "out/mf4file.mf4"
    
    #%% Create empty mdf object
    mdf  = MDF()
    
    #%% Append data from csv file 
    df_csv = pd.read_csv(csv_directory )
    mdf.append(df_csv)
    
    #%% Attach video 
    with open(avi_directory , 'rb') as f:
            data = f.read()
            index = mdf.attach(data, file_name='front_camera.avi', compression=False, embedded=False)
            mdf.groups[-1].channels[-1].data_block_addr = index
            mdf.attachments[0].file_name = Path(str(mdf.attachments[0].file_name).replace("FROM_", "", 1))
    
    #%% Save to mf4
    mdf.save(mf4_directory)
    

    Description

    Hi all!

    I have some vehicle data from CAN bus written in csv and a synchronized video with a matching number of frames. I would like to display both the signals and the associated video frames in a tool which requires mf4 as input. Therefore, I have set up the code above to convert csv files and avi files into mf4 format by recycling a similar issue #316. The code does not trigger an error and appends the csv flawlessly to the mdf object, but it doesn't work for the avi attachment (neither the size nor the channels of the mf4 output change; the mf4 only contains the signals from the csv input). In debug mode I can see that the avi is successfully read into a binary object (data = b'RIFFX\xceM\x01AVI LIST\xec\x11\x00\x00hdrlavih8\x00\x00\x005\x82 ... ).

    I'm quite new to this topic and am struggling to find documentation and threads for this kind of task. Any help is highly appreciated!

    opened by LinusUng 1
  • No .mat file created using export function, no errors aswell

    No .mat file created using export function, no errors aswell

    Python version

    Please run the following snippet and write the output here

    ('python=3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 ' 'bit (AMD64)]') 'os=Windows-10-10.0.19045-SP0' 'numpy=1.23.1' ldf is not supported xls is not supported xlsx is not supported yaml is not supported 'asammdf=7.1.1'

    Code

    from asammdf import MDF
    from pathlib import Path

    import tkinter as tk
    from tkinter import filedialog

    mdf_extension = ".MF4"
    input_folder = "input"
    output_folder = "output"

    path = Path(__file__).parent.absolute()

    root = tk.Tk()
    root.withdraw()

    mf4_path = tuple(input("MF4_File_Path_Apaix").split())

    root = tk.Tk()
    root.withdraw()

    dbc_path = filedialog.askopenfilenames(parent=root, title='Choose DBC file(s)', filetypes=[("dbc files", ".dbc")])

    if not dbc_path:
        exit()

    logfiles = list(mf4_path)
    dbc_files = {"CAN": [(dbc, 0) for dbc in dbc_path]}

    mdf = MDF.concatenate(logfiles)

    mdf_scaled = mdf.extract_bus_logging(dbc_files)
    mdf_scaled.export(fmt='mat', filename=Path(r'C:\Temp\APAIX\MDF4\matlab.mat'))

    MDF version

    4.11

    Code snippet

    no error

    Traceback

    no traceback

    Description

    Hi, I am trying to convert CAN data in MF4 format from a CSS logger to a mat file. Currently it works fine with CSV, but the files are too heavy, and it will be easier to load them into our data analysis SW as .mat. Unfortunately, the export function doesn't output any mat file. I made sure to have scipy installed and checked the other issues on GitHub.

    opened by FlorentTKS 6
  • Feature request: Use enums instead of constant definitions

    Feature request: Use enums instead of constant definitions

    I wanted to ask if you have thought about reorganizing the code to use enums instead of constant integers. For example, instead of:

    CONVERSION_TYPE_NON = 0
    CONVERSION_TYPE_LIN = 1
    CONVERSION_TYPE_RAT = 2
    CONVERSION_TYPE_ALG = 3
    CONVERSION_TYPE_TABI = 4
    CONVERSION_TYPE_TAB = 5
    CONVERSION_TYPE_RTAB = 6
    CONVERSION_TYPE_TABX = 7
    CONVERSION_TYPE_RTABX = 8
    CONVERSION_TYPE_TTAB = 9
    CONVERSION_TYPE_TRANS = 10
    CONVERSION_TYPE_BITFIELD = 11
    
    CONVERSION_TYPE_TO_STRING = dict(
        enumerate(
            [
                "NON",
                "LIN",
                "RAT",
                "ALG",
                "TABI",
                "TAB",
                "RTAB",
                "TABX",
                "RTABX",
                "TTAB",
                "TRANS",
                "BITFIELD",
            ]
        )
    )
    

    Maybe something like:

    class ConversionType(Enum):
        NON = 0
        LIN = 1
        RAT = 2
        ALG = 3
        TABI = 4
        TAB = 5
        RTAB = 6
        TABX = 7
        RTABX = 8
        TTAB = 9
        TRANS = 10
        BITFIELD = 11
    
    
    CONVERSION_TYPE_TO_STRING = dict(
        enumerate(e.name for e in ConversionType)
    )
    

    I think this would have the benefit of the user seeing the "pretty" conversion type (like TABX, LIN) instead of a number whose meaning they may not know. It should also help with checking and comparing the values against the known enum values.

    The same could be done with a lot of the constants in the v4_constants.py file (and probably v2 and v3 as well).

    opened by eblis 2
  • Support for loading files from Azure/ any other fsspec compatible file stores

    Support for loading files from Azure/ any other fsspec compatible file stores

    Python version

    ('python=3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC ' '10.4.0]') 'os=Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.35' 'numpy=1.23.4'

    Code

    MDF version

    3.20

    Code snippet 1

       import adlfs
       fs = adlfs.AzureBlobFileSystem(account_name="account_name", sas_token="sas_token")
       MDF(fs)
    

    Traceback.

    TypeError                                 Traceback (most recent call last)
    Cell In [9], line 2
          1 file = fs.open('test/mdf/test.mdf', "rb")
    ----> 2 MDF(fs)
    
    File /opt/conda/lib/python3.10/site-packages/asammdf/mdf.py:292, in MDF.__init__(self, name, version, channels, **kwargs)
        289     do_close = True
        291 else:
    --> 292     name = original_name = Path(name)
        293     if not name.is_file() or not name.exists():
        294         raise MdfException(f'File "{name}" does not exist')
    
    File /opt/conda/lib/python3.10/pathlib.py:960, in Path.__new__(cls, *args, **kwargs)
        958 if cls is Path:
        959     cls = WindowsPath if os.name == 'nt' else PosixPath
    --> 960 self = cls._from_parts(args)
        961 if not self._flavour.is_supported:
        962     raise NotImplementedError("cannot instantiate %r on your system"
        963                               % (cls.__name__,))
    
    File /opt/conda/lib/python3.10/pathlib.py:594, in PurePath._from_parts(cls, args)
        589 @classmethod
        590 def _from_parts(cls, args):
        591     # We need to call _parse_args on the instance, so as to get the
        592     # right flavour.
        593     self = object.__new__(cls)
    --> 594     drv, root, parts = self._parse_args(args)
        595     self._drv = drv
        596     self._root = root
    
    File /opt/conda/lib/python3.10/pathlib.py:578, in PurePath._parse_args(cls, args)
        576     parts += a._parts
        577 else:
    --> 578     a = os.fspath(a)
        579     if isinstance(a, str):
        580         # Force-cast str subclasses to str (issue #21127)
        581         parts.append(str(a))
    
    TypeError: expected str, bytes or os.PathLike object, not AzureBlobFileSystem
    
    

    Code snippet 2

       import adlfs
       fs = adlfs.AzureBlobFileSystem(account_name="account_name", sas_token="sas_token")
       file = fs.open('apitest/mdf/test.mdf', "rb")
       MDF(file)
    

    Traceback.

    ---------------------------------------------------------------------------
    MdfException                              Traceback (most recent call last)
    Cell In [11], line 2
          1 file = fs.open('test/mdf/test.mdf', "rb")
    ----> 2 MDF(file)
    
    File /opt/conda/lib/python3.10/site-packages/asammdf/mdf.py:265, in MDF.__init__(self, name, version, channels, **kwargs)
        262         do_close = True
        264     else:
    --> 265         raise MdfException(
        266             f"{type(name)} is not supported as input for the MDF class"
        267         )
        269 elif isinstance(name, zipfile.ZipFile):
        271     archive = name
    
    MdfException: <class 'adlfs.spec.AzureBlobFile'> is not supported as input for the MDF class
    
    

    Description

    The MDF class fails to identify files streamed from cloud stores as files. I've tested this with a file on Azure blob storage.

    A simple fix that works on my fork of this repo is adding the code below, as in https://github.com/neerajd12/asammdf/commit/3bff61a84c9a9764310a0b332738c97d5e1d36aa

    from fsspec.spec import AbstractBufferedFile

    if isinstance(name, AbstractBufferedFile):
        original_name = None
        file_stream = name
        do_close = False
    if isinstance(name, BytesIO):  # the existing BytesIO branch in mdf.py follows
    
    

    This works for any/all file systems supported by fsspec.

    Hope this helps anyone using Azure/AWS etc.

    opened by neerajd12 3