Convert CSV files into a SQLite database

Overview

csvs-to-sqlite


Convert CSV files into a SQLite database. Browse and publish that SQLite database with Datasette.

Basic usage:

csvs-to-sqlite myfile.csv mydatabase.db

This will create a new SQLite database called mydatabase.db with a single table, myfile, populated with the CSV content.

You can provide multiple CSV files:

csvs-to-sqlite one.csv two.csv bundle.db

The bundle.db database will contain two tables, one and two.

This means you can use wildcards:

csvs-to-sqlite ~/Downloads/*.csv my-downloads.db

If you pass a path to one or more directories, the script will recursively search those directories for CSV files and create tables for each one.

csvs-to-sqlite ~/path/to/directory all-my-csvs.db

Handling TSV (tab-separated values)

You can use the -s option to specify a different delimiter. If you want to use a tab character you'll need to apply shell escaping like so:

csvs-to-sqlite my-file.tsv my-file.db -s $'\t'

Refactoring columns into separate lookup tables

Let's say you have a CSV file that looks like this:

county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7

(Real example taken from the Open Elections project)

You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

csvs-to-sqlite openelections-data-*/*.csv \
    -c county:County:name \
    -c precinct:Precinct:name \
    -c office -c district -c party -c candidate \
    openelections.db

The format is as follows:

column_name:optional_table_name:optional_table_value_column_name

If you just specify the column name e.g. -c office, the following table will be created:

CREATE TABLE "office" (
    "id" INTEGER PRIMARY KEY,
    "value" TEXT
);

If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

CREATE TABLE "Precinct" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);

The original tables will be created like this:

CREATE TABLE "ca__primary__san_francisco__precinct" (
    "county" INTEGER,
    "precinct" INTEGER,
    "office" INTEGER,
    "district" INTEGER,
    "party" INTEGER,
    "candidate" INTEGER,
    "votes" INTEGER,
    FOREIGN KEY (county) REFERENCES County(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES Precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id)
);

They will be populated with IDs that reference the new derived tables.
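The resulting schema can be sketched with Python's built-in sqlite3 module. This is a minimal illustration of what the extraction produces, not csvs-to-sqlite's actual implementation (which uses pandas); the table names, column names, and sample rows here are hypothetical:

```python
import sqlite3

# Sample rows standing in for CSV data (hypothetical, for illustration only)
rows = [("Clark", "REP"), ("Clark", "REP"), ("Washoe", "DEM")]

conn = sqlite3.connect(":memory:")

# The lookup table that `-c party` would create
conn.execute('CREATE TABLE "party" ("id" INTEGER PRIMARY KEY, "value" TEXT)')

# The original table, with the extracted column as an integer foreign key
conn.execute(
    'CREATE TABLE "results" ("county" TEXT, "party" INTEGER, '
    'FOREIGN KEY (party) REFERENCES party(id))'
)

ids = {}  # unique value -> lookup table id, in first-seen order
for county, party in rows:
    if party not in ids:
        cur = conn.execute('INSERT INTO "party" ("value") VALUES (?)', (party,))
        ids[party] = cur.lastrowid
    conn.execute('INSERT INTO "results" VALUES (?, ?)', (county, ids[party]))

print(conn.execute('SELECT * FROM "party"').fetchall())    # [(1, 'REP'), (2, 'DEM')]
print(conn.execute('SELECT * FROM "results"').fetchall())  # [('Clark', 1), ('Clark', 1), ('Washoe', 2)]
```

Querying the original values back requires a join through the lookup table, which is exactly what Datasette does automatically for foreign key columns.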

Installation

$ pip install csvs-to-sqlite

csvs-to-sqlite now requires Python 3. If you are running Python 2 you can install the last version to support Python 2:

$ pip install csvs-to-sqlite==0.9.2

csvs-to-sqlite --help

Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME

  PATHS: paths to individual .csv files or to directories containing .csvs

  DBNAME: name of the SQLite database file to create

Options:
  -s, --separator TEXT            Field separator in input .csv
  -q, --quoting INTEGER           Control field quoting behavior per csv.QUOTE_*
                                  constants. Use one of QUOTE_MINIMAL (0),
                                  QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or
                                  QUOTE_NONE (3).

  --skip-errors                   Skip lines with too many fields instead of
                                  stopping the import

  --replace-tables                Replace tables if they already exist
  -t, --table TEXT                Table to use (instead of using CSV filename)
  -c, --extract-column TEXT       One or more columns to 'extract' into a
                                  separate lookup table. If you pass a simple
                                  column name that column will be replaced with
                                  integer foreign key references to a new table
                                  of that name. You can customize the name of
                                  the table like so:     state:States:state_name
                                  
                                  This will pull unique values from the 'state'
                                  column and use them to populate a new 'States'
                                  table, with an id column primary key and a
                                  state_name column containing the strings from
                                  the original column.

  -d, --date TEXT                 One or more columns to parse into ISO
                                  formatted dates

  -dt, --datetime TEXT            One or more columns to parse into ISO
                                  formatted datetimes

  -df, --datetime-format TEXT     One or more custom date format strings to try
                                  when parsing dates/datetimes

  -pk, --primary-key TEXT         One or more columns to use as the primary key
  -f, --fts TEXT                  One or more columns to use to populate a full-
                                  text index

  -i, --index TEXT                Add index on this column (or a compound index
                                  with -i col1,col2)

  --shape TEXT                    Custom shape for the DB table - format is
                                  csvcol:dbcol(TYPE),...

  --filename-column TEXT          Add a column with this name and populate with
                                  CSV file name

  --fixed-column <TEXT TEXT>...   Populate column with a fixed string
  --fixed-column-int <TEXT INTEGER>...
                                  Populate column with a fixed integer
  --fixed-column-float <TEXT FLOAT>...
                                  Populate column with a fixed float
  --no-index-fks                  Skip adding index to foreign key columns
                                  created using --extract-column (default is to
                                  add them)

  --no-fulltext-fks               Skip adding full-text index on values
                                  extracted using --extract-column (default is
                                  to add them)

  --just-strings                  Import all columns as text strings by default
                                  (and, if specified, still obey --shape,
                                  --date/datetime, and --datetime-format)

  --version                       Show the version and exit.
  --help                          Show this message and exit.


Comments
  • No create index option - please create new release

    Hi Simon,

    Love this wonderful tool! Thanks a million for sharing it and promoting SQLite!

    The README shows an option for creating an index: https://github.com/simonw/csvs-to-sqlite/blob/master/README.md

    -i, --index TEXT Add index on this column (or a compound index with -i col1,col2)

    I don't seem to have that version in my csvs-to-sqlite:

    % csvs-to-sqlite --help
    Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME
    
      PATHS: paths to individual .csv files or to directories containing .csvs
    
      DBNAME: name of the SQLite database file to create
    
    Options:
      -s, --separator TEXT       Field separator in input .csv
      --replace-tables           Replace tables if they already exist
      -c, --extract-column TEXT  One or more columns to 'extract' into a separate
                                 lookup table. If you pass a simple column name
                                 that column will be replaced with integer foreign
                                 key references to a new table of that name. You
                                 can customize the name of the table like so:
    
                                     --extract-column state:States:state_name
    
                                 This will pull unique values from the 'state'
                                 column and use them to populate a new 'States'
                                 table, with an id column primary key and a
                                 state_name column containing the strings from the
                                 original column.
      -f, --fts TEXT             One or more columns to use to populate a full-
                                 text index
      --version                  Show the version and exit.
      --help                     Show this message and exit.
    
    % csvs-to-sqlite --version
    csvs-to-sqlite, version 0.7
    

    Is this an option in an unreleased build of csvs-to-sqlite? If so, when do you think it would be ready?

    Thanks!

    opened by jungle-boogie 7
  • Add support for loading CSVs directly from URLs

    Closes #4

    This adds a new parameter type to support paths or URLs on the CLI.

    Something like csvs-to-sqlite https://data.stadt-zuerich.ch/dataset/wirtschaft_preise_ziw_basis2010/resource/54c7ae15-f673-41af-a7cd-b861b8c6744e/download/ziwbasis2010apr2010apr2016.csv blah.db now works.

    opened by betatim 6
  • Ability to merge multiple CSV files into a single table (and store CSV filename in a column)

    If you feed the tool a number of CSVs there should be an option to have them all go into the same database table, optionally with a column that gets populated with the name of the CSV file:

    csvs-to-sqlite ~/openelections-data/ --table=combined_table --filename-column=source
    
    opened by simonw 5
  • Allow appending files to an existing SQLite database

    Tool currently quits with an error if you try this.

    Should it be possible to update existing tables with new data? Not sure about this. We certainly can’t handle schema changes. We would need to be told a column to treat as a “primary key”.

    Given how cheap it is to recreate the database from scratch I’m inclined to say it’s not worth bothering with table updates.

    opened by simonw 5
  • add "fixed-column" option

    For colname:value, add a column colname and populate with value.

    The use case here is that I have a snakemake workflow that produces a lot of CSV files with a structure like samples/<sample_id>/<condition_name>/this.csv. I want to import them all into the same table, where sample_id and condition_name are columns. I tried to match how --filename-column works, but allowing for multiple new columns to be added.

    I haven't yet added any tests, but if you think this option would be a useful addition, I will add some tests to cover the basic use case and the combination with shape.

    opened by williamrowell 4
  • Interpret all columns as TEXT data type

    Is it possible to interpret all columns as the TEXT data type (through a flag, maybe)? I think the column values are sampled and then the column data type is guessed; if there is an incompatible value in some row, that row seems to be skipped. Instead, is it possible to load everything as TEXT? I just need the data in some format (for data comparison purposes), but all the CSV data must go into the table without skipping rows.

    Thanks Thyag

    opened by tsundara 4
  • csvs-to-sqlite has a hard requirement on package versions.

    Currently version required is hardcoded:

    https://github.com/simonw/csvs-to-sqlite/blob/dccbf65b37bc9eed50e9edb80a42f257e93edb1f/setup.py#L22-L28

    The latest Pandas version is 0.21.0, this makes csvs-to-sqlite unusable with it by default and unusable with any package that requires the latest pandas.

    Is there a reason why a specific version is enforced and not "0.20.3 or later"?

    opened by mratsim 4
  • ".\" added to table name in SQLITE database file

    Hello Simon, thanks a lot for your great tool and for sharing it.

    I am using it to convert a bunch of CSV files in a folder. I get a .db file as output; however, my table names are marked with "." characters, which does not allow me to perform further SQLite queries. Could you please advise me how to avoid this issue? Below are my cmd input and output:

    C:~Documents\Profesional\2018-02-15>csvs-to-sqlite C:~Documents\Profesional\2018-02-15 output.db
    extract_columns=()
    Loaded 4 dataframes
    c:\users\joel0\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\generic.py:1362: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
      chunksize=chunksize, dtype=dtype)
    Created output.db from 4 CSV files

    thanks in advance

    opened by joel0977 4
  • --fts list-of-columns option

    Creates a FTS5 table populated with the content of the specified columns: https://sqlite.org/fts5.html (falls back on FTS4 if FTS5 is not supported)

    Would make it trivial to add fulltext search to data from a CSV file.

    opened by simonw 4
  • Columns are typed "REAL" if they are integers with some NaN/blanks

    This is bad. If a column has all integers and some blanks it should result in an INTEGER.

    Example: this CSV https://github.com/openelections/openelections-data-ca/blob/master/2016/20161108__ca__general__yolo__precinct.csv produces this SQL table:

    CREATE TABLE "2016/20161108__ca__general__yolo__precinct" (
        "county" TEXT,
        "precinct" INTEGER,
        "office" INTEGER,
        "district" REAL,
        "party" REAL,
        "candidate" INTEGER,
        "votes" INTEGER,
        FOREIGN KEY (county) REFERENCES county(id),
        FOREIGN KEY (party) REFERENCES party(id),
        FOREIGN KEY (precinct) REFERENCES precinct(id),
        FOREIGN KEY (office) REFERENCES office(id),
        FOREIGN KEY (candidate) REFERENCES candidate(id)
    )
    
    opened by simonw 4
  • --shape option for specifying the "shape" of the resulting table

    This option will allow you to tell the command exactly what columns should be created in the new table and what their types should be.

    For example:

    csvs-to-sqlite votes.csv votes.db --shape "county:Cty,votes:Vts(REAL)"
    

    This will produce a table with just two columns: Cty and Vts. Those columns will correspond to the county and votes columns in the original CSV.

    The Cty column will use the default type detected by pandas - but the Vts column will be forced to be a REAL column instead.

    opened by simonw 3
  • Column `filing_id` not found despite its manifest existence

    I am trying to load 18 CSV files from CorpWatch into a SQLite database using csvs-to-sqlite. I will spare you the deprecation warnings for error_bad_lines.

    michael$ time /Users/michael/Library/Python/3.8/bin/csvs-to-sqlite ~/Downloads/corpwatch_api_tables_csv ~/Temp/corpwatch.db -s $'\t'
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (8,10) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (17) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.
    
      return pd.read_csv(
    /Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (3,6,7) have mixed types. Specify dtype option on import or set low_memory=False.
      return pd.read_csv(
    Loaded 18 dataframes
    Traceback (most recent call last):
      File "/Users/michael/Library/Python/3.8/bin/csvs-to-sqlite", line 8, in <module>
        sys.exit(cli())
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/cli.py", line 251, in cli
        df.to_sql(df.table_name, conn, if_exists="append", index=False)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/core/generic.py", line 2951, in to_sql
        return sql.to_sql(
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 697, in to_sql
        return pandas_sql.to_sql(
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 2190, in to_sql
        return table.insert(chunksize, method)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 950, in insert
        num_inserted = exec_insert(conn, keys, chunk_iter)
      File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 1896, in _execute_insert
        conn.executemany(self.insert_statement(num_rows=1), data_list)
    sqlite3.OperationalError: table ./company_filings has no column named filing_id
    
    real	1m30.052s
    user	1m12.763s
    sys	0m16.640s
    

    So the complaint is about company_filings.csv not having a filing_id. Yet:

    michael$ head ~/Downloads/corpwatch_api_tables_csv/company_filings.csv
    filing_id cik year quarter period_of_report filing_date form_10k_url sec_21_url
    47 1000180 2008 1 20071230 2008-02-25 http://www.sec.gov/Archives/edgar/data/1000180/0000950134-08-003259.txt http://www.sec.gov/Archives/edgar/data/1000180/000095013408003259/f38194exv21w1.htm
    104 1000209 2008 1 20071231 2008-03-13 http://www.sec.gov/Archives/edgar/data/1000209/0001193125-08-055644.txt http://www.sec.gov/Archives/edgar/data/1000209/000119312508055644/dex211.htm
    121 1000228 2008 1 20071229 2008-02-26 http://www.sec.gov/Archives/edgar/data/1000228/0000950123-08-002119.txt http://www.sec.gov/Archives/edgar/data/1000228/000095012308002119/y50229exv21w1.htm
    152 1000229 2008 1 0 2008-02-22 http://www.sec.gov/Archives/edgar/data/1000229/0001000229-08-000005.txt NULL
    174 1000230 2008 1 0 2008-01-29 http://www.sec.gov/Archives/edgar/data/1000230/0001193125-08-014289.txt NULL
    186 1000232 2008 1 20071231 2008-03-31 http://www.sec.gov/Archives/edgar/data/1000232/0001000232-08-000005.txt http://www.sec.gov/Archives/edgar/data/1000232/000100023208000005/exhibit21subsidiaries.txt
    213 1000234 2008 1 20070924 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000003.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000003/ex21.htm
    214 1000234 2008 1 20071231 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000004.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000004/ex21.htm
    462 1000623 2008 1 20071231 2008-03-07 http://www.sec.gov/Archives/edgar/data/1000623/0001047469-08-002365.txt http://www.sec.gov/Archives/edgar/data/1000623/000104746908002365/a2183428zex-21.htm

    
    Why is this column not being found?
    opened by MichaelTiemannOSC 1
  • low_memory=False not a valid option

    Hi, csvs-to-sqlite suggests using low_memory=False, but there is no way to pass that option from the command line, and it is not mentioned in the documentation.

    /home/user/.local/pipx/venvs/csvs-to-sqlite/lib/python3.8/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (14) have mixed types. Specify dtype option on import or set low_memory=False.
      return pd.read_csv(
    

    https://github.com/simonw/csvs-to-sqlite/blob/a8a37a016790dc93270c74e32d0a5051bc5a0f4d/csvs_to_sqlite/utils.py#L43

    opened by frafra 0
  • error_bad_lines argument has been deprecated

    I get a deprecation warning on the latest stable.

    csvs_to_sqlite/utils.py:38: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version. Use on_bad_lines in the future.
    
    opened by frafra 0
  • option to lower-case column names, replace spaces with underscores, and remove reserved characters from column names

    It's generally easier to work with a SQLite table if the column names are lower case, have no spaces, and contain no reserved characters. It would be nice if there were an option in this tool to transform the column names accordingly.

    opened by fgregg 0
  • Optionally maintain checksums of CSV files for faster updates

    Wanted to see if there is interest in a patch that helps speed up our workflows significantly, or if there are any further ideas for improving on such a feature. If this is out of scope for this project, I'm happy to continue maintaining my fork of this project.

    Use Case

    We currently maintain a folder of >200 CSV files with a total of a few hundred megabytes, and have a CI step that builds these CSVs into a sqlite database. These CSV files get updated 2-3 times a day, but only small changes are made to them. Currently, running csvs-to-sqlite with the --replace-tables flag takes roughly 6-7 minutes, which is too long for our use case.

    Solution

    Add a --update-tables flag that maintains a checksum hash of each CSV file in a table called .csvs-meta (happy to change this or make it configurable), and only reads the csv and loads the dataframe if the checksum has changed.

    Forked Version Here

    opened by dkaoster 0
Releases (1.3)
  • 1.3(Nov 18, 2021)

    • New options for populating a fixed column - so every inserted row will have the same column with the same value. Options are --fixed-column column-name string-value, --fixed-column-int column-name integer-value and --fixed-column-float column-name float-value. Thanks, William Rowell. #81
    Source code(tar.gz)
    Source code(zip)
  • 1.2(Nov 3, 2020)

  • 1.1(Aug 9, 2020)

  • 1.0(Aug 3, 2019)

  • 0.9.2(Jul 3, 2019)

  • 0.9.1(Jun 24, 2019)

  • 0.9(Jan 17, 2019)

    • Support for loading CSVs directly from URLs, thanks @betatim - #38
    • New -pk/--primary-key options, closes #22
    • Create FTS index for extracted column values
    • Added --no-fulltext-fks option, closes #32
    • Now using black for code formatting
    • Bumped versions of dependencies
    Source code(tar.gz)
    Source code(zip)
  • 0.8(Apr 24, 2018)

    • -d and -df options for specifying date/datetime columns, closes #33

    • Maintain lookup tables in SQLite, refs #17

    • --index option to specify which columns to index, closes #24

    • Test confirming --shape and --filename-column and -c work together #25

    • Use usecols when loading CSV if shape specified

    • --filename-column is now compatible with --shape, closes #10

    • --no-index-fks option

      By default, csvs-to-sqlite creates an index for every foreign key column that is added using the --extract-column option.

      For large tables, this can dramatically increase the size of the resulting database file on disk. The new --no-index-fks option allows you to disable this feature to save on file size.

      Refs #24 which will allow you to explicitly list which columns SHOULD have an index created.

    • Added --filename-column option, refs #10

    • Fixes for Python 2, refs #25

    • Implemented new --shape option - refs #25

    • --table option for specifying table to write to, refs #10

    • Updated README to cover --skip-errors, refs #20

    • Add --skip-errors option (#20) [Jani Monoses]

    • Less verbosity (#19) [Jani Monoses]

      Only log extract_columns info when that option is passed.

    • Add option for field quoting behaviour (#15) [Jani Monoses]

    Source code(tar.gz)
    Source code(zip)
  • 0.7(Nov 26, 2017)

  • 0.6.1(Nov 25, 2017)

  • 0.6(Nov 24, 2017)

    SQLite full-text search support

    • Added --fts option for setting up SQLite full-text search.

      The --fts option will create a corresponding SQLite FTS virtual table, using the best available version of the FTS module.

      https://sqlite.org/fts5.html https://www.sqlite.org/fts3.html

      Usage:

      csvs-to-sqlite my-csv.csv output.db -f column1 -f column2
      

      Example generated with this option: https://sf-trees-search.now.sh/

      Example search: https://sf-trees-search.now.sh/sf-trees-search-a899b92?sql=select+*+from+Street_Tree_List+where+rowid+in+%28select+rowid+from+Street_Tree_List_fts+where+Street_Tree_List_fts+match+%27grove+london+dpw%27%29%0D%0A

      Will be used in https://github.com/simonw/datasette/issues/131

    • --fts and --extract-column now cooperate.

      If you extract a column and then specify that same column in the --fts list, csvs-to-sqlite now uses the original value of that column in the index.

      Example using CSV from https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq

      csvs-to-sqlite Street_Tree_List.csv trees-fts.db \
          -c qLegalStatus -c qSpecies -c qSiteInfo \
          -c PlantType -c qCaretaker -c qCareAssistant \
          -f qLegalStatus -f qSpecies -f qAddress \
          -f qSiteInfo -f PlantType -f qCaretaker \
          -f qCareAssistant -f PermitNotes
      

      Closes #9

    • Handle column names with spaces in them.

    • Added csvs-to-sqlite --version option.

      Using http://click.pocoo.org/5/api/#click.version_option
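    The FTS behaviour described above (an FTS virtual table whose rowids line up with the source table, using the best available FTS module) can be sketched with Python's built-in sqlite3 module. This is an illustration of the resulting tables, not the tool's own code, and the sample data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Street_Tree_List" ("qSpecies" TEXT, "qAddress" TEXT)')
conn.executemany(
    'INSERT INTO "Street_Tree_List" VALUES (?, ?)',
    [("London Plane", "100 Grove St"), ("Coast Live Oak", "500 Market St")],
)

# Try FTS5 first, fall back to FTS4 if this SQLite build lacks it
for fts in ("fts5", "fts4"):
    try:
        conn.execute(
            f'CREATE VIRTUAL TABLE "Street_Tree_List_fts" USING {fts} ("qSpecies", "qAddress")'
        )
        break
    except sqlite3.OperationalError:
        continue

# Populate the index from the source table; rowids stay aligned
conn.execute(
    'INSERT INTO "Street_Tree_List_fts" ("qSpecies", "qAddress") '
    'SELECT "qSpecies", "qAddress" FROM "Street_Tree_List"'
)

# Multiple terms are ANDed together across the indexed columns
hits = conn.execute(
    'SELECT rowid FROM "Street_Tree_List_fts" WHERE "Street_Tree_List_fts" MATCH ?',
    ("grove london",),
).fetchall()
print(hits)  # [(1,)]
```

    The matching rowids can then be joined back to the source table, which is the shape of the example search query linked above.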

    Source code(tar.gz)
    Source code(zip)
  • 0.5(Nov 19, 2017)

    Now handles columns with integers and nulls in them correctly

    Pandas does a good job of figuring out which SQLite column types should be used for a DataFrame - with one exception: due to a limitation of NumPy it treats columns containing a mixture of integers and NaN (blank values) as being of type float64, which means they end up as REAL columns in SQLite.

    http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

    To fix this, we now check whether a float64 column actually consists solely of NaN and integer-valued floats (checked using v.is_integer() in Python). If that is the case, we override the column type to be INTEGER instead.

    See #5 - also a8ab524 and 0997b7b
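    As a rough sketch of that check (a hypothetical helper for illustration, not the actual source, which operates on pandas float64 columns):

```python
import math

def should_be_integer(values):
    """True if every value is either NaN or an integer-valued float,
    meaning the column can safely be stored as INTEGER in SQLite."""
    return all(math.isnan(v) or float(v).is_integer() for v in values)

print(should_be_integer([1.0, 2.0, float("nan")]))  # True
print(should_be_integer([1.0, 2.5]))                # False
```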

    Source code(tar.gz)
    Source code(zip)
  • 0.3(Nov 17, 2017)

    • Mechanism for converting columns into separate tables

      Let's say you have a CSV file that looks like this:

      county,precinct,office,district,party,candidate,votes
      Clark,1,President,,REP,John R. Kasich,5
      Clark,2,President,,REP,John R. Kasich,0
      Clark,3,President,,REP,John R. Kasich,7
      

      (Real example from https://github.com/openelections/openelections-data-sd/blob/master/2016/20160607__sd__primary__clark__precinct.csv )

      You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

      csvs-to-sqlite openelections-data-*/*.csv \
          -c county:County:name \
          -c precinct:Precinct:name \
          -c office -c district -c party -c candidate \
          openelections.db
      

      The format is as follows:

      column_name:optional_table_name:optional_table_value_column_name
      

      If you just specify the column name e.g. -c office, the following table will be created:

      CREATE TABLE "office" (
          "id" INTEGER PRIMARY KEY,
          "value" TEXT
      );
      

      If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

      CREATE TABLE "Precinct" (
          "id" INTEGER PRIMARY KEY,
          "name" TEXT
      );
      

      The original tables will be created like this:

      CREATE TABLE "ca__primary__san_francisco__precinct" (
          "county" INTEGER,
          "precinct" INTEGER,
          "office" INTEGER,
          "district" INTEGER,
          "party" INTEGER,
          "candidate" INTEGER,
          "votes" INTEGER,
          FOREIGN KEY (county) REFERENCES County(id),
          FOREIGN KEY (party) REFERENCES party(id),
          FOREIGN KEY (precinct) REFERENCES Precinct(id),
          FOREIGN KEY (office) REFERENCES office(id),
          FOREIGN KEY (candidate) REFERENCES candidate(id)
      );
      

      They will be populated with IDs that reference the new derived tables.

      Closes #2

    Source code(tar.gz)
    Source code(zip)
Owner
Simon Willison