wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information

Overview

Python based Wikidata framework for easy dataframe extraction

wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information. The goal is to create an intuitive interface so that Wikidata can function as a common read-write repository for public statistics.

Installation

wikirepo can be downloaded from PyPI via pip or sourced directly from this repository:

pip install wikirepo

git clone https://github.com/andrewtavis/wikirepo.git
cd wikirepo
python setup.py install

import wikirepo

Data

wikirepo's data structure is built around Wikidata.org. Human-readable access to Wikidata statistics is achieved by converting requests into Wikidata's item identifiers (QIDs) and property identifiers (PIDs), with the Python package wikidata serving as the basis for data loading and indexing. See the documentation for a structured overview of the currently available properties.
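
As a rough illustration of that conversion, the sketch below maps human-readable labels to identifiers with plain dictionaries. The QIDs are the ones that appear in the query examples of this README, and P1082 (population) and P2046 (area) are Wikidata property identifiers. The helper `to_ids` and both lookup tables are hypothetical and are not how wikirepo resolves labels internally.

```python
# Hypothetical lookup tables, used only for illustration.
LOCATION_QIDS = {
    "Germany": "Q183",
    "United States of America": "Q30",
    "People's Republic of China": "Q148",
}
PROPERTY_PIDS = {
    "population": "P1082",
    "area": "P2046",
}


def to_ids(location, prop):
    """Translate a (location label, property name) pair into (QID, PID)."""
    # Strings that already look like QIDs are passed through unchanged.
    if location.startswith("Q") and location[1:].isdigit():
        qid = location
    else:
        qid = LOCATION_QIDS[location]
    return qid, PROPERTY_PIDS[prop]


print(to_ids("Germany", "population"))
print(to_ids("Q30", "area"))
```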

Query Data

wikirepo's main access function, wikirepo.data.query, returns a pandas.DataFrame of locations and property data across time.

Each query needs the following inputs:

  • locations: the locations that data should be queried for
    • Strings are accepted for Earth, continents, and countries
    • Get all country names with wikirepo.data.incl_lctn_lbls(lctn_lvls='country')
    • The user can also pass Wikidata QIDs directly
  • depth: the geographic level of the given locations to query
    • A depth of 0 is the locations themselves
    • Greater depths correspond to lower geographic levels (states of countries, etc.)
    • A dictionary of locations is generated for lower depths (see second example below)
  • timespan: start and end datetime.date objects defining when data should come from
    • If not provided, then the most recent data will be retrieved and annotated with the date it is from
  • interval: yearly, monthly, weekly, or daily as strings
  • Further arguments: the names of modules in wikirepo/data directories
    • These are passed to arguments corresponding to their directories
    • Data will be queried for these properties for the given locations, depth, timespan and interval, with results being merged as dataframe columns
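
To make the inputs above more concrete, here is a minimal sketch of how a property argument (None, a single string, or a list of strings) and a yearly timespan might be normalized before querying. The helper names `as_prop_list` and `yearly_steps` are hypothetical and are not part of wikirepo.

```python
from datetime import date


def as_prop_list(props):
    """Normalize None, a single string, or an iterable of strings to a list."""
    if props is None:
        return []
    if isinstance(props, str):
        return [props]
    return list(props)


def yearly_steps(timespan):
    """Return the years covered by a (start, end) tuple, most recent first."""
    start, end = sorted(timespan)
    return list(range(end.year, start.year - 1, -1))


timespan = (date(2009, 1, 1), date(2010, 1, 1))
print(yearly_steps(timespan))
print(as_prop_list("median_income"))
print(as_prop_list(["population", "life_expectancy"]))
```

Each location would then get one row per returned year, with one column per normalized property name.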

Queries are also able to access information in Wikidata sub-pages for locations. For example: if inflation rate is not found on the location's main page, then wikirepo checks the location's economic topic page as inflation_rate.py is found in wikirepo/data/economic (see Germany and economy of Germany).
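
The fallback described above can be sketched with plain dictionaries. Everything here is a stand-in: `PAGES`, `TOPIC_PAGE` and `lookup` are hypothetical names, and the stored values are placeholders rather than real Wikidata statistics.

```python
# Hypothetical in-memory stand-ins for a location's main page and topic page.
PAGES = {
    "Germany": {"population": "<value on main page>"},
    "economy of Germany": {"inflation_rate": "<value on topic page>"},
}
# Maps (location, property directory) to the matching topic page.
TOPIC_PAGE = {("Germany", "economic"): "economy of Germany"}


def lookup(location, prop, category):
    """Check the main page first, then the topic page for the category."""
    main_page = PAGES.get(location, {})
    if prop in main_page:
        return main_page[prop]
    topic = TOPIC_PAGE.get((location, category))
    if topic is not None:
        return PAGES.get(topic, {}).get(prop)
    return None


print(lookup("Germany", "population", "demographic"))
print(lookup("Germany", "inflation_rate", "economic"))
```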

wikirepo further provides a unique dictionary class, EntitiesDict, that stores all loaded Wikidata entities during a query. This speeds up data retrieval, as entities are loaded once and then accessed in the EntitiesDict object for any other needed properties.
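
The caching idea can be illustrated with a small dictionary subclass. This is a sketch under assumptions, not the actual EntitiesDict: the class name `CachingEntitiesDict`, the `loads` counter and `fake_loader` are all hypothetical, and the loader stands in for a real network request.

```python
class CachingEntitiesDict(dict):
    """Hypothetical cache keyed by QID; each entity is loaded at most once."""

    def __init__(self, loader):
        super().__init__()
        self._loader = loader
        self.loads = 0  # counts how often the loader was actually called

    def __missing__(self, qid):
        # Called by dict.__getitem__ only when the key is not stored yet.
        self.loads += 1
        entity = self._loader(qid)
        self[qid] = entity
        return entity


def fake_loader(qid):
    # Stand-in for fetching an entity from Wikidata.
    return {"id": qid}


ents = CachingEntitiesDict(fake_loader)
ents["Q183"]
ents["Q183"]  # served from the cache, no second load
ents["Q30"]
print(ents.loads)  # 2
```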

Examples of wikirepo.data.query follow:

Querying Information for Given Countries

import wikirepo
from wikirepo.data import wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
# Strings must match their Wikidata English page names
countries = ["Germany", "United States of America", "People's Republic of China"]
# countries = ["Q183", "Q30", "Q148"] # we could also pass QIDs
# countries = wikirepo.data.incl_lctn_lbls(lctn_lvls="country")  # or all countries
depth = 0
timespan = (date(2009, 1, 1), date(2010, 1, 1))
interval = "yearly"

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=countries,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props=["population", "life_expectancy"],
    economic_props="median_income",
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props=None,
    institutional_props="human_dev_idx",
    political_props="executive",
    misc_props=None,
    verbose=True,
)

col_order = [
    "location",
    "qid",
    "year",
    "executive",
    "population",
    "life_exp",
    "human_dev_idx",
    "median_income",
]
df = df[col_order]

df.head(6)
location                    qid   year  executive       population   life_exp  human_dev_idx  median_income
Germany                     Q183  2010  Angela Merkel   8.1752e+07   79.9878   0.921          33333
Germany                     Q183  2009  Angela Merkel   nan          79.8366   0.917          nan
United States of America    Q30   2010  Barack Obama    3.08746e+08  78.5415   0.914          43585
United States of America    Q30   2009  George W. Bush  nan          78.3902   0.91           nan
People's Republic of China  Q148  2010  Wen Jiabao      1.35976e+09  75.236    0.706          nan
People's Republic of China  Q148  2009  Wen Jiabao      nan          75.032    0.694          nan

Querying Information for all US Counties

# Note: >3000 regions, expect a 45 minute runtime
import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
country = "United States of America"
# country = "Q30" # we could also pass its QID
depth = 2  # 2 for counties, 1 for states and territories
sub_lctns = True  # for all
# Only valid sub-locations given the timespan will be queried
timespan = (date(2016, 1, 1), date(2018, 1, 1))
interval = "yearly"

us_counties_dict = lctn_utils.gen_lctns_dict(
    ents_dict=ents_dict,
    locations=country,
    depth=depth,
    sub_lctns=sub_lctns,
    timespan=timespan,
    interval=interval,
    verbose=True,
)

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=us_counties_dict,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props="population",
    economic_props=None,
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props="area",
    institutional_props="capital",
    political_props=None,
    misc_props=None,
    verbose=True,
)

df[df["population"].notnull()].head(6)
location                  sub_lctn    sub_sub_lctn         qid      year  population   area_km2  capital
United States of America  California  Alameda County       Q107146  2018  1.6602e+06   2127      Oakland
United States of America  California  Contra Costa County  Q108058  2018  1.14936e+06  2078      Martinez
United States of America  California  Marin County         Q108117  2018  263886       2145      San Rafael
United States of America  California  Napa County          Q108137  2018  141294       2042      Napa
United States of America  California  San Mateo County     Q108101  2018  774155       1919      Redwood City
United States of America  California  Santa Clara County   Q110739  2018  1.9566e+06   3377      San Jose

Upload Data (WIP)

wikirepo.data.upload will be the core of the eventual wikirepo upload feature. The goal is to record edits that a user makes to a previously queried or baseline dataframe such that these changes can then be pushed back to Wikidata. With the addition of Wikidata login credentials as a wikirepo feature (WIP), the unique information in the edited dataframe could then be uploaded to Wikidata for all to use.

The same process used to query information from Wikidata could be reversed for the upload process. Dataframe columns could be linked to their corresponding Wikidata properties, whether the time qualifiers are a point in time or spans using start time and end time could be derived through the defined variables in the module header, and other necessary qualifiers for proper data indexing could also be included. Source information could also be added in corresponding columns to the given property edits.

Pseudocode for how this process could function follows:

In the first example, changes are made to a df.copy() of a queried dataframe. pandas is then used to compare the new and original dataframes after the user has added information that they have access to.

import pandas as pd
import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

# wd_utils.login is a planned feature and does not exist yet
credentials = wd_utils.login()

ents_dict = wd_utils.EntitiesDict()
country = "Country Name"
depth = 2
sub_lctns = True
timespan = (date(2000,1,1), date(2018,1,1))
interval = 'yearly'

lctns_dict = lctn_utils.gen_lctns_dict()  # arguments as in the US counties example

df = wikirepo.data.query()  # arguments as in the query examples
df_copy = df.copy()

# The user checks for NaNs and adds data

df_edits = pd.concat([df, df_copy]).drop_duplicates(keep=False)

wikirepo.data.upload(df_edits, credentials)
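
The comparison step in the pseudocode can be tried on its own with a toy dataframe. The sketch below uses made-up values, not Wikidata data, and shows that `pd.concat(...).drop_duplicates(keep=False)` keeps both the original and the edited version of every changed row, so an extra filter is needed to isolate the new values. The filter on non-null `population` is an assumption about how edits might be selected.

```python
import pandas as pd

# Toy data standing in for a queried dataframe (values are made up).
df = pd.DataFrame(
    {
        "location": ["A", "A", "B"],
        "year": [2010, 2009, 2010],
        "population": [100.0, None, 300.0],
    }
)
df_copy = df.copy()

# The user fills in a missing value.
df_copy.loc[1, "population"] = 95.0

# Rows present in both frames are dropped entirely; changed rows remain twice,
# once in their original form and once in their edited form.
df_edits = pd.concat([df, df_copy]).drop_duplicates(keep=False)

# Keep only the edited versions, i.e. the rows that now carry a value.
new_rows = df_edits[df_edits["population"].notnull()]
print(new_rows)
```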

In the next example data.data_utils.gen_base_df is used to create a dataframe with dimensions that match a time series that the user has access to. The data is then added to the column that corresponds to the property to which it should be added. Source information could further be added via a structured dictionary generated for the user.

import wikirepo
from wikirepo.data import data_utils, wd_utils
from datetime import date

credentials = wd_utils.login()

locations = "Country Name"
depth = 0
# The user defines the time parameters based on their data
timespan = (date(1995,1,2), date(2010,1,3)) # (first Monday, last Sunday)
interval = 'weekly'

base_df = data_utils.gen_base_df()
base_df['data'] = data_for_matching_time_series

source_data = wd_utils.gen_source_dict('Source Information')
base_df['data_source'] = [source_data] * len(base_df)

wikirepo.data.upload(base_df, credentials)

Put simply: a full-featured wikirepo.data.upload function would realize the potential of a single read-write repository for all public information.

Maps (WIP)

wikirepo/maps is a further goal of the project, as it combines wikirepo's focus on easy-to-access open source data with quick high-level analytics.

Query Maps

As in wikirepo.data.query, passing the locations, depth, timespan and interval arguments could access GeoJSON files stored on Wikidata, thus providing mapping files in parallel to the user's data. These files could then be leveraged using existing Python plotting libraries to provide detailed presentations of geographic analysis.
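
Since this feature is not implemented, the following is only a sketch of what selecting a mapping file by location and time could look like. The GeoJSON structure (FeatureCollection, Feature, Polygon) follows the GeoJSON format, but the property keys `qid`, `start_year` and `end_year`, the helper `features_for`, and the square geometry are all hypothetical placeholders.

```python
import json

# Placeholder collection: the square below is NOT a real boundary.
collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
            },
            "properties": {"qid": "Q183", "start_year": 1990, "end_year": 2020},
        }
    ],
}


def features_for(collection, qid, year):
    """Return the features for a QID whose validity span contains the year."""
    return [
        feature
        for feature in collection["features"]
        if feature["properties"]["qid"] == qid
        and feature["properties"]["start_year"] <= year <= feature["properties"]["end_year"]
    ]


matches = features_for(collection, "Q183", 2010)
print(json.dumps(matches[0]["geometry"]))
```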

Upload Maps

Similar to the potential of adding statistics through wikirepo.data.upload, GeoJSON map files could also be uploaded to Wikidata using appropriate arguments. The potential exists for a myriad of variable maps given locations, depth, timespan and interval information that would allow all wikirepo users to get the exact mapping file that they need for their given task.

Examples

wikirepo can be used as a foundation for countless projects, with its usefulness and practicality only improving as more properties are added and more data is uploaded to Wikidata.

Current usage examples include:

  • Sample notebooks for the Python package poli-sci-kit show how to use wikirepo as a basis for political election and parliamentary appointment analysis, with those notebooks being found in the examples for poli-sci-kit or on Google Colab
  • Pull requests with other examples will gladly be accepted

To-Do

Please see the contribution guidelines if you are interested in contributing to this project. Work that is in progress or could be implemented includes:

Expanding wikirepo

  • Creating an outline of the package's structure for the readme (see issue)

  • Integrating current Python tools with wikirepo structures for uploads to Wikidata

  • Adding a query of property descriptions to data.data_utils.incl_dir_idxs (see issue)

  • Adding multiprocessing support to the wikirepo.data.query process and data.lctn_utils.gen_lctns_dict

  • Potentially converting wikirepo.data.query and data.lctn_utils.gen_lctns_dict over to generated Wikidata SPARQL queries

  • Optimizing wikirepo.data.query:

    • Potentially converting EntitiesDict and LocationsDict to slotted object classes for memory savings
    • Deriving and optimizing other slow parts of the query process
  • Adding access to GeoJSON files for mapping via wikirepo.maps.query

  • Designing and adding GeoJSON files indexed by time properties to Wikidata

  • Creating, improving and sharing examples

  • Improving tests for greater code coverage

  • Improving code quality by refactoring large functions and checking conventions
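
For the slotted-class idea listed above, a minimal sketch of what `__slots__` changes is shown here. The class `SlottedEntity` is hypothetical and unrelated to the current EntitiesDict and LocationsDict implementations; it only demonstrates that slotted instances carry no per-instance `__dict__`, which is where the memory savings come from.

```python
class SlottedEntity:
    """Hypothetical entity record with a fixed set of attributes."""

    __slots__ = ("qid", "label")

    def __init__(self, qid, label):
        self.qid = qid
        self.label = label


entity = SlottedEntity("Q183", "Germany")
print(hasattr(entity, "__dict__"))  # False

# Attributes outside __slots__ cannot be added.
try:
    entity.extra = 1
except AttributeError:
    print("no new attributes allowed")
```

The trade-off is flexibility: attributes must be declared up front, so this would only fit classes whose fields are known in advance.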

Expanding Wikidata

The growth of wikirepo's database relies on that of Wikidata. Through data.wd_utils.dir_to_topic_page wikirepo can access properties on location sub-pages, thus allowing for statistics on any topic to be linked to. Beyond including entries for already existing properties (see this issue), the following are examples of property types that could be added:

  • Climate statistics could be added to data/climate

    • This would allow for easy modeling of global warming and its effects
    • Planning would be needed to decide whether intervals shorter than a day are necessary or whether daily averages suffice
  • Properties for electoral polling and results for locations

    • This would allow direct access to all needed election information in a single function call
  • A property that links political parties and their regions in data/political

    • For easy professional presentation of electoral results (ex: loading in party hex colors, abbreviations, and alignments)
  • data/demographic properties such as:

    • age, education, religious, and linguistic diversities across time
  • data/economic properties such as:

    • female workforce participation, workforce industry diversity, wealth diversity, and total working age population across time
  • Distinct properties for Freedom House and Press Freedom indexes, as well as other descriptive metrics

Powered By


Wikimedia, Wikibase, Wikidata

Comments
  • Create concise requirement and env files

    This issue is for creating concise versions of requirements.txt and environment.yml for wikirepo. It would be great if these files were created by hand with specific version numbers or generated in a way so that sub-dependencies don't always need to be updated.

    As of now both files are being created with the following commands in the package's conda virtual environment:

    pip list --format=freeze > requirements.txt  
    conda env export --no-builds | grep -v "^prefix: " > environment.yml
    

    wikirepo and other obviously unneeded packages are then removed from these files before being uploaded.

    Any insights or help would be much appreciated!

    help wanted good first issue question 
    opened by andrewtavis 7
  • Remove unused packages in requirements

    Hello, this is a follow-up to issue https://github.com/andrewtavis/wikirepo/issues/17.

    Please review~

    About setup.py: is there a reason to include plotting packages such as matplotlib and seaborn?

    opened by kination 2
  • Bump aiohttp from 3.7.3 to 3.7.4

    Bumps aiohttp from 3.7.3 to 3.7.4.

    Changelog

    Sourced from aiohttp's changelog.

    3.7.4 (2021-02-25)

    Bugfixes

    • (SECURITY BUG) Started preventing open redirects in the aiohttp.web.normalize_path_middleware middleware. For more details, see https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg.

      Thanks to Beast Glatisant (https://github.com/g147) for finding the first instance of this issue and Jelmer Vernooij (https://jelmer.uk/) for reporting and tracking it down in aiohttp. #5497 (https://github.com/aio-libs/aiohttp/issues/5497)

    • Fix interpretation difference of the pure-Python and the Cython-based HTTP parsers construct a yarl.URL object for HTTP request-target.

      Before this fix, the Python parser would turn the URI's absolute-path for //some-path into / while the Cython code preserved it as //some-path. Now, both do the latter. #5498 (https://github.com/aio-libs/aiohttp/issues/5498)


    Commits
    • 0a26acc Bump aiohttp to v3.7.4 for a security release
    • 021c416 Merge branch 'ghsa-v6wp-4m6f-gcjg' into master
    • 4ed7c25 Bump chardet from 3.0.4 to 4.0.0 (#5333)
    • b61f0fd Fix how pure-Python HTTP parser interprets //
    • 5c1efbc Bump pre-commit from 2.9.2 to 2.9.3 (#5322)
    • 0075075 Bump pygments from 2.7.2 to 2.7.3 (#5318)
    • 5085173 Bump multidict from 5.0.2 to 5.1.0 (#5308)
    • 5d1a75e Bump pre-commit from 2.9.0 to 2.9.2 (#5290)
    • 6724d0e Bump pre-commit from 2.8.2 to 2.9.0 (#5273)
    • c688451 Removed duplicate timeout parameter in ClientSession reference docs. (#5262) ...

    dependencies 
    opened by dependabot[bot] 1
  • Bump lxml from 4.6.2 to 4.6.3

    Bumps lxml from 4.6.2 to 4.6.3.

    Changelog

    Sourced from lxml's changelog.

    4.6.3 (2021-03-21)

    Bugs fixed

    • A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung, which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.

    dependencies 
    opened by dependabot[bot] 1
  • [ImgBot] Optimize images

    Beep boop. Your images are optimized!

    Your image file size has been reduced by 45% 🎉

    Details

    | File | Before | After | Percent reduction |
    |:--|:--|:--|:--|
    | /resources/wikirepo_logo_transparent.png | 171.28kb | 76.11kb | 55.56% |
    | /resources/gh_images/wikidata_logo.png | 26.59kb | 16.87kb | 36.56% |
    | /resources/wikirepo_logo.png | 150.90kb | 96.30kb | 36.18% |
    | /resources/gh_images/wikibase_logo.png | 20.41kb | 14.64kb | 28.30% |
    | Total : | 369.18kb | 203.92kb | 44.76% |


    opened by imgbot[bot] 1
  • Create package structure outline

    wikirepo as a project has many modules that interconnect and are funneled to two functions - wikirepo.data.query and lctn_utils.gen_lctns_dict. It would be helpful for users and potential contributors to have a visual representation of the package that details the overarching structure and the purpose of various components. This outline could then be added to the readme in the To-Do section, potentially in a drop down.

    An initial test of this could be as simple as a directory outline that has a bit more detail about the given components - say by using *, **, †, ‡ and other symbols to indicate where a description could be found.

    A discussion of how to best present the package structure is more than welcome, and contributions would further be very appreciated!

    documentation good first issue question 
    opened by andrewtavis 0
  • Suggest properties for wikirepo

    Please use this issue to suggest Wikidata properties that could be added to wikirepo. With the suggestion it would be great to get the following:

    • The link to the property page on Wikidata
    • A suggestion of which category (demographic, economic, etc) the property should go into
    • [Optional] how the query script should be written (see examples/add_property to make suggestions for how the module header should be structured)

    Accepted property suggestions would then be converted to good first issues for wikirepo. Pull requests with new properties following the process of examples/add_property would also gladly be accepted! Documentation could also be done for such issues or PRs, or could be a separate issue.

    Thanks for your interest in supporting this project :)

    good first issue question 
    opened by andrewtavis 2
  • Add descriptions to data.data_utils.incl_dir_idxs

    The function data.data_utils.incl_dir_idxs is how a user can find what indexes are available for a given type of data - demographic, economic, etc. It would be great if data.data_utils.incl_dir_idxs would have an option to also provide a description for the index. This could be directly queried from Wikidata.

    enhancement good first issue 
    opened by andrewtavis 0
Releases(v1.0.0)
  • v1.0.0(Dec 28, 2021)

  • v0.1.1.5(Mar 28, 2021)

    Changes include:

    • An src structure has been adopted for easier testing and to fix wheel distribution issues
    • Code quality is now checked with Codacy
    • Extensive code formatting to improve quality and style
    • Fixes to vulnerabilities through exception use
  • v0.1.0(Feb 23, 2021)

    First stable release of wikirepo

    Changes include:

    • Full documentation of the package

    • Virtual environment files

    • Bug fixes

    • Extensive testing of all modules with GH Actions and Codecov

    • Code of conduct and contribution guidelines

  • v0.0.2(Dec 8, 2020)

    The minimum viable product of wikirepo:

    • Users are able to query data from Wikidata given locations, depth, time_lvl, and timespan arguments

    • String arguments are accepted for Earth, continents, countries and disputed territories

    • Data for greater depths can be retrieved by creating a dictionary given initial starting locations and going to greater depths using the contains administrative territorial entity property

    • Data is formatted and loaded into a pandas dataframe for further manipulation

    • All available social science properties on Wikidata have had modules created for them

    • Estimated load times and progress are given

    • The project's scope and general roadmap have been defined and detailed in the README

Owner
Andrew Tavis McAllister
Data scientist focussing on NLP, causal inference and recommendation engines. Humboldt University of Berlin (MS); University of Oregon (BA).