pq

This feature requires the jaqy-avro plugin. It exports data to Apache Parquet files.

Unfortunately, because the Parquet library has a hard-coded dependency on the Hadoop libraries, this plugin is about 20MB in size.

Options

-b,--blocksize <arg>     sets the row group / block size
-c,--compression <arg>   sets the compression codec
-d,--padding <arg>       sets the maximum padding size
-p,--pagesize <arg>      sets the page size
-r,--rowcount <arg>      sets the row count limit

Note

  • The --pagesize and --blocksize values accept the mb and gb suffixes. For instance, 1mb is 1 * 1024 * 1024 bytes.
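
The suffix arithmetic can be sketched in Python as follows. This is an illustration only, not Jaqy's own parser: the function name parse_size is hypothetical, case-insensitive matching is an assumption, and the gb multiplier is assumed to follow the same binary convention as the documented mb example.

```python
# Hypothetical helper illustrating the size-suffix arithmetic; not Jaqy's code.
SUFFIXES = {
    "mb": 1024 * 1024,         # documented: 1mb = 1 * 1024 * 1024 bytes
    "gb": 1024 * 1024 * 1024,  # assumption: same binary convention as mb
}


def parse_size(text: str) -> int:
    """Convert a size such as '128mb' or '4096' into a byte count."""
    value = text.strip().lower()  # assumption: suffixes are case-insensitive
    for suffix, multiplier in SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * multiplier
    return int(value)  # no suffix: treat the value as plain bytes


print(parse_size("1mb"))    # 1048576
print(parse_size("128mb"))  # 134217728
```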

Supported Compression Codecs

Compression   Extension
brotli        .br
gzip          .gz
lz4           .lz4
lzo           .lzo
snappy        .snappy
zstd          .zstd

  • It is possible to specify the compression codec implicitly by using the corresponding file extension in the file name.
  • LZ4 compression requires a native Hadoop installation. This requirement is hard-coded in the Apache Parquet library.
  • LZO compression requires a separate library because of its GPL license. See https://github.com/twitter/hadoop-lzo for the build instructions.
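
The implicit codec selection can be pictured as a lookup on the file extension. The Python sketch below only mirrors the table above and is not Jaqy's implementation: the function name infer_codec is hypothetical, and both the case-insensitive matching and the None result for unrecognized extensions are assumptions.

```python
# Hypothetical sketch of codec inference from a file name; not Jaqy's code.
EXTENSION_TO_CODEC = {
    ".br": "brotli",
    ".gz": "gzip",
    ".lz4": "lz4",
    ".lzo": "lzo",
    ".snappy": "snappy",
    ".zstd": "zstd",
}


def infer_codec(file_name: str):
    """Return the codec implied by the file extension, or None if there is none."""
    lower = file_name.lower()  # assumption: extension matching ignores case
    for extension, codec in EXTENSION_TO_CODEC.items():
        if lower.endswith(extension):
            return codec
    return None  # assumption: no recognized extension means no implicit codec


print(infer_codec("myfile.parquet.snappy"))  # snappy
print(infer_codec("myfile.parquet"))         # None
```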

Database Type to AVRO Type Mapping

Database Type                                 AVRO Type
BOOLEAN                                       BOOLEAN
TINYINT, SMALLINT, INTEGER                    INTEGER
BIGINT                                        LONG
FLOAT                                         FLOAT
DOUBLE                                        DOUBLE
ARRAY                                         ARRAY
BINARY, VARBINARY, LONGVARBINARY, BLOB        BYTES
DECIMAL, NUMERIC, REAL, CHAR, VARCHAR, CLOB   STRING

Note

  • DECIMAL is converted to string to preserve the precision.

  • An array is generally treated as an array of strings, primarily because there is no way to get the array element type in JDBC.

    • For PostgreSQL, the element type can be easily guessed, so it is supported for some well-known types.
  • A Struct is exported as an array of strings.

    • For Teradata, PERIOD data types, which are transmitted as Struct types, are converted into formats that match their BTEQ output formats.
    • For PostgreSQL, the driver reports the Struct type even though the data is actually a string. Jaqy has a specific workaround for this inconsistency.
  • Types not listed in the table above are stored as STRING. The AVRO exporter relies on the toString() function of the object retrieved by the JDBC driver to obtain the output. There is no guarantee that such String representations can be used for import.

  • For an explanation of page size, row group / block size, etc., see Apache Parquet.
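
The type mapping table and its STRING fallback can be read as a simple lookup. The Python sketch below restates the table rows and the fallback note; it is not Jaqy's code, the names TYPE_GROUPS and avro_type are hypothetical, and the special handling of ARRAY elements, Struct, and PERIOD values described in the notes is deliberately left out.

```python
# Hypothetical lookup mirroring the mapping table; not Jaqy's code.
TYPE_GROUPS = {
    "BOOLEAN": ["BOOLEAN"],
    "INTEGER": ["TINYINT", "SMALLINT", "INTEGER"],
    "LONG": ["BIGINT"],
    "FLOAT": ["FLOAT"],
    "DOUBLE": ["DOUBLE"],
    "ARRAY": ["ARRAY"],
    "BYTES": ["BINARY", "VARBINARY", "LONGVARBINARY", "BLOB"],
    "STRING": ["DECIMAL", "NUMERIC", "REAL", "CHAR", "VARCHAR", "CLOB"],
}

# Invert the groups into a flat database-type -> AVRO-type dictionary.
TYPE_MAP = {
    db_type: avro for avro, db_types in TYPE_GROUPS.items() for db_type in db_types
}


def avro_type(db_type: str) -> str:
    """Return the AVRO type for a database type; unlisted types become STRING."""
    return TYPE_MAP.get(db_type.upper(), "STRING")


print(avro_type("BIGINT"))   # LONG
print(avro_type("DECIMAL"))  # STRING (kept as text to preserve precision)
print(avro_type("DATE"))     # STRING (not in the table, so the fallback applies)
```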

Example

-- use snappy compression implicitly
.export pq myfile.parquet.snappy
SELECT * FROM MyTable ORDER BY a;

See Also