Corset is a web-based data selection portal that helps you getting relevant data from massive amounts of parallel data.

Overview

Corset

Logo

License

Corset is a web-based data selection portal that helps you getting relevant data from massive amounts of parallel data. So, if you don't need the whole corpus, but just a suitable subset (indeed, a cor(pus sub)set, this is what Corset will do for you--and the reason of the name of the tool.

Here are some highlights of what you will find in Corset:

Millions of parallel sentences to explore

  • Dive into parallel corpora performing searches at the speed of light.
  • Search in either source or target sides of corpora.
  • Keep track of your preferred searches and their details.

Tailored corpora (corsets) from big corpora

  • Get smaller and custom corpora that fit your sample text.
  • Set up the details of your corset (name, topic, languages, size) and launch your search over millions of parallel sentences.

Monitor and download corsets

  • See the status of your corsets, preview them, download them, remove or share them!
  • Take a look to shared corsets to see if they are already tailored to your needs.

To know more, see:


Connecting Europe Facility

All documents and software contained in this repository reflect only the authors' view. The Innovation and Networks Executive Agency of the European Union is not responsible for any use that may be made of the information it contains.

Python library for serializing any arbitrary object graph into JSON. It can take almost any Python object and turn the object into JSON. Additionally, it can reconstitute the object back into Python.

jsonpickle jsonpickle is a library for the two-way conversion of complex Python objects and JSON. jsonpickle builds upon the existing JSON encoders, s

1.1k Jan 02, 2023
Ultra fast JSON decoder and encoder written in C with Python bindings

UltraJSON UltraJSON is an ultra fast JSON encoder and decoder written in pure C with bindings for Python 3.6+. Install with pip: $ python -m pip insta

3.9k Jan 02, 2023
Crappy tool to convert .scw files to .json and and vice versa.

SCW-JSON-TOOL Crappy tool to convert .scw files to .json and vice versa. How to use Run main.py file with two arguments: python main.py scw2json or j

Fred31 5 May 14, 2021
Extended pickling support for Python objects

cloudpickle cloudpickle makes it possible to serialize Python constructs not supported by the default pickle module from the Python standard library.

1.3k Jan 05, 2023
serialize all of python

dill serialize all of python About Dill dill extends python's pickle module for serializing and de-serializing python objects to the majority of the b

The UQ Foundation 1.8k Jan 07, 2023
MessagePack serializer implementation for Python msgpack.org[Python]

MessagePack for Python What's this MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JS

MessagePack 1.7k Dec 29, 2022
FlatBuffers: Memory Efficient Serialization Library

FlatBuffers FlatBuffers is a cross platform serialization library architected for maximum memory efficiency. It allows you to directly access serializ

Google 19.6k Jan 01, 2023
Python bindings for the simdjson project.

pysimdjson Python bindings for the simdjson project, a SIMD-accelerated JSON parser. If SIMD instructions are unavailable a fallback parser is used, m

Tyler Kennedy 562 Jan 08, 2023
🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle)

srsly: Modern high-performance serialization utilities for Python This package bundles some of the best Python serialization libraries into one standa

Explosion 329 Dec 28, 2022
A lightweight library for converting complex objects to and from simple Python datatypes.

marshmallow: simplified object serialization marshmallow is an ORM/ODM/framework-agnostic library for converting complex datatypes, such as objects, t

marshmallow-code 6.4k Jan 02, 2023
Corset is a web-based data selection portal that helps you getting relevant data from massive amounts of parallel data.

Corset is a web-based data selection portal that helps you getting relevant data from massive amounts of parallel data. So, if you don't need the whole corpus, but just a suitable subset (indeed, a c

13 Nov 10, 2022
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy

orjson orjson is a fast, correct JSON library for Python. It benchmarks as the fastest Python library for JSON and is more correct than the standard j

4.1k Dec 30, 2022
Protocol Buffers - Google's data interchange format

Protocol Buffers - Google's data interchange format Copyright 2008 Google Inc. https://developers.google.com/protocol-buffers/ Overview Protocol Buffe

Protocol Buffers 57.6k Jan 03, 2023
simplejson is a simple, fast, extensible JSON encoder/decoder for Python

simplejson simplejson is a simple, fast, complete, correct and extensible JSON http://json.org encoder and decoder for Python 3.3+ with legacy suppo

1.5k Dec 31, 2022
Generic ASN.1 library for Python

ASN.1 library for Python This is a free and open source implementation of ASN.1 types and codecs as a Python package. It has been first written to sup

Ilya Etingof 223 Dec 11, 2022
Python wrapper around rapidjson

python-rapidjson Python wrapper around RapidJSON Authors: Ken Robbins [email prot

469 Jan 04, 2023