Introduction

This user guide is intended to provide a comprehensive guide to the use of the PyQvd library. The PyQvd library is a Python library that provides a simple interface to the QVD file format. The QVD file format is a simple file format that is used to store data in a columnar format. The QVD file format is used by QlikView and later Qlik Sense, a popular business intelligence tool, to store data.

The PyQvd library provides a simple interface to read and write QVD files. The library is designed to be easy to use and to provide a simple and intuitive interface to the QVD file format.

Reading

The most common use case for the PyQvd library is to read data from a QVD file on disk. To read data from a QVD file, you can use the static pyqvd.qvd.QvdTable.from_qvd() method of the pyqvd.qvd.QvdTable class. This method takes the path to the QVD file as an argument and returns a pyqvd.qvd.QvdTable object that represents the data in the QVD file.

For example, to read data from a QVD file called sample.qvd, you can use the following code:

from pyqvd import QvdTable

tbl = QvdTable.from_qvd("data.qvd")

Next to reading a QVD file from disk, you can also read a QVD file from any file-like object such as a binary stream. This can be done by using the pyqvd.qvd.QvdTable.from_stream() method of the pyqvd.qvd.QvdTable class. This method takes the file-like object (BinaryIO) as an argument and returns a pyqvd.qvd.QvdTable object that represents the data in the QVD file.

For example, to read data from a QVD file stored in a S3 Object Storage, you could use the following code:

import boto3
from pyqvd import QvdTable

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="sample.qvd")
tbl = QvdTable.from_stream(obj["Body"])

Sometimes your QVD files may be very large and you may not want to load the entire file into memory at once or at least you want to track the progress of the reading process. In this case, you can use the pyqvd.qvd.QvdTable.from_qvd() or pyqvd.qvd.QvdTable.from_stream() method with the optional argument chunk_size. This argument specifies the number of rows that are read from the QVD file at once. The default value is None which means that the entire file is read at once.

When you specify a chunk size, the result of the pyqvd.qvd.QvdTable.from_qvd() or pyqvd.qvd.QvdTable.from_stream() method is a iterator that yields pyqvd.qvd.QvdTable objects. Each pyqvd.qvd.QvdTable object represents a chunk of the data in the QVD file.

import tqdm
from pyqvd import QvdTable

itr = QvdTable.from_qvd("data.qvd", chunk_size=1000)

with tqdm.tqdm(total=len(itr)) as pbar:
    for tbl in itr:
        # Process the chunk
        pbar.update(1)

Important

Especially for arbitrary BinaryIO objects, it is important to note that not every stream implementation is seekable. The iteration process requires the ability to seek to any position in the stream. If the stream is not seekable, the iteration process will fail.

Data Manipulation

After reading/constructing a QVD table, you can perform various operations on the data. The pyqvd.qvd.QvdTable class provides a number of methods to help you analyze and manipulate the data in the QVD table. The following examples can only give you a brief overview of the possibilities. For a complete overview of the available methods, please refer to the API Reference documentation.

Retrevieve and Edit

First of all, you can retrieve and edit the existing data in the QVD table. For example, you can retrieve single values or slices of the data using the pyqvd.qvd.QvdTable.get() method. Instead of using the pyqvd.qvd.QvdTable.get() method, you can also use the shorthand notation, the square brackets, to retrieve single values or slices of the data.

# Retrieve the value at row 0 and column 'A'
value = tbl.get((0, "A"))

# Retrieve the value at row 0 and column 'A' using the shorthand notation
value = tbl[0, "A"]

# Retrieve the second row
row = tbl.get(1)

# Retrieve the second row using the shorthand notation
row = tbl[1]

For editing the data, you can use the pyqvd.qvd.QvdTable.set() method. This method allows you to modify individual cells, rows, or columns in the QVD table. The pyqvd.qvd.QvdTable.set() has also a shorthand notation available like the pyqvd.qvd.QvdTable.get() method.

Note

You can use symbols (QvdValue objects) as well as native Python values to update the data. Native values will be converted to QvdValue objects automatically. Unsupported native types are automatically converted to strings (StringValue).

# Update the value at row 0 and column 'A'
tbl.set((0, "A"), 42)

# Update the value at row 0 and column 'A' using a symbol
tbl.set((0, "A"), IntegerValue(42))

# Update the value at row 0 and column 'A' using the shorthand notation
tbl[0, "A"] = 42

# Replace the second row
tbl.set(1, [1, 2, 3, 4, 5])

# Replace the second row using the shorthand notation
tbl[1] = [1, 2, 3, 4, 5]

Add and Remove

In addition you can also add new rows or columns to the QVD table or remove existing rows or columns if needed. For example, to add a new row to the QVD table, you can use the pyqvd.qvd.QvdTable.append() or pyqvd.qvd.QvdTable.insert() method. The pyqvd.qvd.QvdTable.drop() method can be used to remove rows or columns from the QVD table again.

Note

You can use symbols (QvdValue objects) as well as native Python values to add new data. Native values will be converted to QvdValue objects automatically. Unsupported native types are automatically converted to strings (StringValue).

# Add a new row to the QVD table
tbl.append([1, 2, 3, 4, 5])

# Add a new row to the QVD table using symbols
tbl.append([IntegerValue(1), IntegerValue(2), IntegerValue(3), IntegerValue(4), IntegerValue(5)])

# Insert a new row at index 0
tbl.insert(0, [1, 2, 3, 4, 5])

# Remove the second row from the QVD table
tbl.drop(1)

# Remove the column 'A' from the QVD table
tbl.drop("A", axis="columns")

Filtering and Sorting

The pyqvd.qvd.QvdTable class also supports basic filtering and sorting operations. Therefor, the class provides the pyqvd.qvd.QvdTable.filter_by() and pyqvd.qvd.QvdTable.sort_by() methods to filter and sort the data based on the values of individual columns.

# Filter the data based on the values in the column 'A'
new_tbl = tbl.filter_by("A", lambda value: value.calculation_value > 0)

# Sort the data based on the values in the column 'A'
new_tbl = tbl.sort_by("A", ascending=False)

# Sorting is also possible by using a custom comparator
new_tbl = tbl.sort_by("A", comparator=lambda x, y: 1 if x.calculation_value > y.calculation_value else -1 if x.calculation_value < y.calculation_value else 0)

Concatenate and Join

The pyqvd.qvd.QvdTable class also supports concatenation and joining of multiple QVD tables. The pyqvd.qvd.QvdTable.concat() method can be used to concatenate multiple QVD tables along the rows. The pyqvd.qvd.QvdTable.join() method can be used to join multiple QVD tables along the columns.

The join method supports four different types of joins: inner, left, right, and outer. The default join type is the outer join. The join method also supports the specification of the join columns and the suffixes for the columns that are present in both tables.

# Concatenate two QVD tables
new_tbl = tbl.concat(tbl2)

# Join two QVD tables
new_tbl = tbl.join(tbl2, on="A", how="inner", lsuffix="_left", rsuffix="_right")

# Join two QVD tables using multiple columns
new_tbl = tbl.join(tbl2, on=["A", "B"], how="left", lsuffix="_left", rsuffix="_right")

Import and Export

Instead of reading the data from a QVD file, you can also import data from other in-memory sources such as a dictionary or a pandas DataFrame. This can be done by using the respective methods like pyqvd.qvd.QvdTable.from_dict() or pyqvd.qvd.QvdTable.from_pandas(). For more information, please refer to the API Reference documentation.

Note

Native Python types are automatically converted to symbols (QvdValue objects) and vice versa. Unsupported native types are automatically converted to strings (StringValue).

For example, to import a pandas DataFrame into a QVD table, you can use the following code:

import pandas as pd
from pyqvd import QvdTable

df = pd.read_csv("data.csv")
tbl = QvdTable.from_pandas(df)

Of course, you can also export the data in the QVD table to other in-memory data structures such. This can be done by using the respective methods like pyqvd.qvd.QvdTable.to_dict() or pyqvd.qvd.QvdTable.to_pandas().

For example, to export the data in the QVD table to a pandas DataFrame, you can use the following code:

df = tbl.to_pandas()

Writing

After analyzing and manipulating the data in the QVD table, you can write the data back to a QVD file on disk. To write the data to a QVD file, you can use the pyqvd.qvd.QvdTable.to_qvd() method of the pyqvd.qvd.QvdTable class. This method takes the path to the QVD file as an argument and writes the data in the QVD table to the specified file.

For example, to write the data in the QVD table to a QVD file called output.qvd, you can use the following code:

tbl.to_qvd("output.qvd")

As with reading, the QVD table or the resulting QVD file can also be written to any binary stream instead of to the hard drive. This can be done by using the pyqvd.qvd.QvdTable.to_stream() method of the pyqvd.qvd.QvdTable class. This method takes the file-like object (BinaryIO) as an argument and writes the data in the QVD table to the specified stream.

For example, to write the data in the QVD table to a binary buffer and then upload the buffer to a S3 Object Storage, you could use the following code:

import boto3
from pyqvd import QvdTable

...

buffer = io.BytesIO()
tbl.to_stream(buffer)

s3 = boto3.client("s3")
obj = s3.put_object(Bucket="my-bucket", Key="output.qvd", Body=buffer.getvalue())

By default, opinionated settings are used when writing the QVD file. However, you can also specify custom settings when writing the QVD file. For example, you can specify the table’s name or the format that is used to write the data to the QVD file. Therefore you can use the writer’s options class pyqvd.io.QvdFileWriterOptions to specify the custom settings and so called formatters to format the data.

from pyqvd.io import QvdFileWriterOptions, DateValueFormatter

date_formatter = DateValueFormatter("DD.MM.YYYY")
options = QvdFileWriterOptions(table_name="my-table", date_formatter=date_formatter)

tbl.to_qvd("output.qvd", options)

For a complete list of available options and formatters, as well as information about the supported format strings, please refer to the API Reference documentation.