API documentation for the vaex library

Quick list for opening/reading your data.

vaex.open(path, *args, **kwargs) Open a dataset from file given by path
vaex.from_arrays(**arrays) Create an in memory dataset from numpy arrays
vaex.from_csv(filename_or_buffer, **kwargs) Shortcut to read a csv file using pandas and convert to a dataset directly
vaex.from_ascii(path[, seperator, names, …]) Create an in memory dataset from an ascii file (whitespace-separated by default).
vaex.from_pandas(df[, name, copy_index, …]) Create an in memory dataset from a pandas dataframe
vaex.from_astropy_table(table)

Quick list for visualization.

vaex.dataset.Dataset.plot(*args, **kwargs)
vaex.dataset.Dataset.plot1d(*args, **kwargs)
vaex.dataset.Dataset.scatter(*args, **kwargs)
vaex.dataset.Dataset.plot_widget(x, y[, z, …])
vaex.dataset.Dataset.healpix_plot([…])

Quick list for statistics.

vaex.dataset.Dataset.count([expression, …]) Count the number of non-NaN values (or all, if expression is None or “*”)
vaex.dataset.Dataset.mean(expression[, …]) Calculate the mean for expression, possibly on a grid defined by binby.
vaex.dataset.Dataset.std(expression[, …]) Calculate the standard deviation for the given expression, possibly on a grid defined by binby
vaex.dataset.Dataset.var(expression[, …]) Calculate the sample variance for the given expression, possibly on a grid defined by binby
vaex.dataset.Dataset.cov(x[, y, binby, …]) Calculate the covariance matrix for x and y, or for more expressions, possibly on a grid defined by binby
vaex.dataset.Dataset.correlation(x[, y, …]) Calculate the correlation coefficient cov[x,y]/(std[x]*std[y]) between x and y, possibly on a grid defined by binby
vaex.dataset.Dataset.median_approx(expression) Calculate an approximate median, possibly on a grid defined by binby
vaex.dataset.Dataset.mode(expression[, …])
vaex.dataset.Dataset.min(expression[, …]) Calculate the minimum for the given expressions, possibly on a grid defined by binby
vaex.dataset.Dataset.max(expression[, …]) Calculate the maximum for the given expressions, possibly on a grid defined by binby
vaex.dataset.Dataset.minmax(expression[, …]) Calculate the minimum and maximum for the given expressions, possibly on a grid defined by binby
vaex.dataset.Dataset.mutual_information(x[, …]) Estimate the mutual information between x and y on a grid with shape mi_shape and mi_limits, possibly on a grid defined by binby
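
The correlation coefficient listed above is defined as cov[x,y]/(std[x]*std[y]). As a minimal sketch of that formula on plain NumPy arrays (this is not vaex code, and the helper name `correlation` is hypothetical), it can be written as:

```python
import numpy as np


def correlation(x, y):
    """Hypothetical helper: cov[x,y] / (std[x] * std[y]) for 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population (ddof=0) covariance; it matches the ddof=0 std() calls below,
    # so the normalisation cancels in the ratio.
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return cov_xy / (x.std() * y.std())


x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x  # perfectly linearly related, so the coefficient should be 1
print(correlation(x, y))
```

The result can be cross-checked against `np.corrcoef(x, y)[0, 1]`, which computes the same Pearson coefficient.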

Vaex is a library for dealing with big tabular data.

The most important class (data structure) in vaex is the Dataset. A dataset can be obtained by opening the example dataset:

>>> import vaex
>>> t = vaex.example()

Or by using open() or from_csv() to open a file:

>>> t1 = vaex.open("somedata.hdf5")
>>> t2 = vaex.open("somedata.fits")
>>> t3 = vaex.from_csv("somedata.csv")

Or by connecting to a remote server:

>>> tbig = vaex.open("http://bla.com/bigtable")

The main purpose of vaex is to provide statistics, such as mean, count, sum, and standard deviation, per column, optionally with a selection and on a regular grid.

To count the number of rows:

>>> t = vaex.example()
>>> t.count()
330000.0

Or the number of valid values, which for this dataset is the same:

>>> t.count("x")
330000.0
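
count("x") counts only the non-NaN values of a column, whereas count() or count("*") counts every row. The distinction can be sketched on a plain NumPy array; this only mirrors the documented semantics and says nothing about how vaex computes counts internally:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, 4.0, np.nan])

n_rows = len(x)                           # like count() or count("*"): every row
n_valid = np.count_nonzero(~np.isnan(x))  # like count("x"): non-NaN values only
print(n_rows, n_valid)  # 5 3
```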

Count them on a regular grid:

>>> t.count("x", binby=["x", "y"], shape=(4,4))
array([[   902.,   5893.,   5780.,   1193.],
       [  4097.,  71445.,  75916.,   4560.],
       [  4743.,  71131.,  65560.,   4108.],
       [  1115.,   6578.,   4382.,    821.]])
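
With binby and shape, the count is evaluated on a regular grid: the (x, y) plane is split into shape[0] by shape[1] equal-sized bins and the rows falling in each bin are counted. A rough NumPy analogue uses `numpy.histogram2d`. The sketch assumes that limits=None means the minimum and maximum of each binby expression, and the treatment of values lying exactly on a bin edge may differ from vaex:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Assumed limits: min/max of each binby coordinate, giving a regular 4x4 grid.
limits = [[x.min(), x.max()], [y.min(), y.max()]]
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(4, 4), range=limits)

print(counts.shape)  # (4, 4)
print(counts.sum())  # 1000.0: with min/max limits every row lands in some bin
```

Here the first index of `counts` runs over the x bins and the second over the y bins.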

Visualise it using matplotlib:

>>> t.plot("x", "y", show=True)
<matplotlib.image.AxesImage at 0x1165a5090>

vaex.open(path, *args, **kwargs)

Open a dataset from file given by path

Parameters:
  • path (str) – local or absolute path to file
  • args – extra arguments for file readers that need it
  • kwargs – extra keyword arguments
Returns:

the dataset if the file is supported, otherwise None

Return type:

Dataset

Example:
>>> import vaex as vx
>>> vx.open('myfile.hdf5')
<vaex.dataset.Hdf5MemoryMapped at 0x1136ee3d0>
>>> vx.open('gadget_file.hdf5', 3) # this will read only particle type 3
<vaex.dataset.Hdf5MemoryMappedGadget at 0x1136ef3d0>

vaex.from_arrays(**arrays)

Create an in memory dataset from numpy arrays

Parameters:arrays – keyword arguments, each mapping a column name to a numpy array
Example:
>>> import numpy as np
>>> import vaex as vx
>>> x = np.arange(10)
>>> y = x ** 2
>>> dataset = vx.from_arrays(x=x, y=y)

vaex.from_csv(filename_or_buffer, **kwargs)

Shortcut to read a csv file using pandas and convert to a dataset directly

vaex.from_ascii(path, seperator=None, names=True, skip_lines=0, skip_after=0, **kwargs)

Create an in memory dataset from an ascii file (whitespace-separated by default).

>>> ds = vx.from_ascii("table.asc")
>>> ds = vx.from_ascii("table.csv", seperator=",", names=["x", "y", "z"])

Parameters:
  • path – file path
  • seperator – value separator; whitespace by default, use “,” for comma-separated values.
  • names – If True, the first line is used for the column names, otherwise provide a list of strings with names
  • skip_lines – skip lines at the start of the file
  • skip_after – skip lines at the end of the file
  • kwargs
Returns:

vaex.from_pandas(df, name='pandas', copy_index=True, index_name='index')

Create an in memory dataset from a pandas dataframe

Parameters:
  • df (pandas.DataFrame) – Pandas dataframe
  • name (str) – unique name for the dataset
Example:
>>> import pandas as pd
>>> import vaex as vx
>>> df = pd.read_csv("test.csv")
>>> ds = vx.from_pandas(df, name="test")

vaex.from_astropy_table(table)

vaex.from_samp(username=None, password=None)

Connect to a SAMP Hub, wait for a single table load event, disconnect, download the table, and return the dataset.

Useful if you want to send a single table from, say, TOPCAT to vaex in a Python console or notebook.

vaex.open_many(filenames)

Open a list of files and return a single dataset with all the datasets concatenated

Parameters:filenames (list[str]) – list of filenames/paths
Return type:Dataset
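
The concatenation is row-wise: the rows of each file follow the rows of the previous one. A minimal sketch of that idea, with each dataset represented as a dictionary of NumPy column arrays, is shown below; `concat_columns` is a hypothetical helper for illustration and not how vaex implements open_many:

```python
import numpy as np


def concat_columns(datasets):
    """Hypothetical helper: row-wise concatenation of {column: array} dicts."""
    names = list(datasets[0])
    for ds in datasets[1:]:
        if set(ds) != set(names):
            raise ValueError("all datasets must have the same columns")
    # Append the rows of each dataset, column by column, in the given order.
    return {name: np.concatenate([ds[name] for ds in datasets]) for name in names}


part1 = {"x": np.array([1, 2]), "y": np.array([10, 20])}
part2 = {"x": np.array([3]), "y": np.array([30])}
combined = concat_columns([part1, part2])
print(combined["x"])  # [1 2 3]
```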

vaex.server(url, **kwargs)

Connect to a host supporting the vaex web API

Parameters:url (str) – hostname or IP address of the server
Returns:a server object; note that it does not connect to the server yet, so this call always succeeds
Return type:vaex.dataset.ServerRest

vaex.example(download=True)

Returns an example dataset which comes with vaex for testing/learning purposes

Return type:vaex.dataset.Dataset

vaex.app(*args, **kwargs)

Create a vaex app; the QApplication main loop must be started.

In IPython notebook/Jupyter, do the following:

>>> import vaex.ui.main  # this causes the Qt API level to be set properly
>>> import vaex as vx

Next cell:

%gui qt

Next cell:

>>> app = vx.app()

From then on, you can run the app alongside Jupyter.

vaex.zeldovich(dim=2, N=256, n=-2.5, t=None, scale=1, seed=None)

Creates a Zeldovich dataset

vaex.set_log_level_debug()

Set the log level to debug

vaex.set_log_level_info()

Set the log level to info

vaex.set_log_level_warning()

Set the log level to warning

vaex.set_log_level_exception()

Set the log level to exception

vaex.set_log_level_off()

Disable logging
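
These helpers control how much vaex reports through Python's logging system. A rough stdlib equivalent is sketched below; it assumes vaex emits its messages through a logger named "vaex" and its child loggers, which is an assumption and not stated in this reference:

```python
import logging

# Assumption: vaex logs through a logger named "vaex" and its children.
vaex_logger = logging.getLogger("vaex")

# Roughly what set_log_level_debug() is expected to do.
vaex_logger.setLevel(logging.DEBUG)
print(vaex_logger.getEffectiveLevel() == logging.DEBUG)  # True

# Roughly what set_log_level_warning() is expected to do.
vaex_logger.setLevel(logging.WARNING)

# Child loggers without a level of their own inherit it from "vaex".
child = logging.getLogger("vaex.dataset")
print(child.getEffectiveLevel() == logging.WARNING)  # True
```

Setting the level on the parent logger is what lets a single call cover every module of a package.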

vaex.delayed(f)

class vaex.stat.Expression

Describes an expression for a statistic

calculate(ds, binby=[], shape=256, limits=None, selection=None)

Calculate the statistic for a Dataset

vaex.stat.correlation(x, y)

Creates a correlation statistic

vaex.stat.count(expression='*')

Creates a count statistic

vaex.stat.covar(x, y)

Creates a covariance statistic

vaex.stat.mean(expression)

Creates a mean statistic

vaex.stat.std(expression)

Creates a standard deviation statistic

vaex.stat.sum(expression)

Creates a sum statistic
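
The vaex.stat functions return objects that only describe a statistic; the computation happens when calculate(ds, binby=..., shape=..., limits=..., selection=...) is called with a dataset. The pattern can be sketched as follows. Everything here is hypothetical and not vaex's implementation: `Statistic`, `Count` and `Mean` are illustrative classes, the dataset is a dict of equal-length NumPy arrays, expressions are plain column names, a selection is a boolean mask, and limits=None is assumed to mean the min/max of each binby column:

```python
import numpy as np


class Statistic:
    """Hypothetical: describes a statistic of one column; lazy until calculate()."""

    def __init__(self, expression):
        self.expression = expression

    def values(self, ds, n_rows):
        """Column values as floats; NaN marks a missing value."""
        return np.asarray(ds[self.expression], dtype=float)

    def calculate(self, ds, binby=(), shape=4, limits=None, selection=None):
        n_rows = len(next(iter(ds.values())))
        if selection is None:
            mask = np.ones(n_rows, dtype=bool)
        else:
            mask = np.asarray(selection, dtype=bool)
        values = self.values(ds, n_rows)[mask]
        if not binby:
            return self.scalar(values)
        # One column of coordinates per binby expression, restricted to the selection.
        coords = np.column_stack([np.asarray(ds[name], dtype=float)[mask] for name in binby])
        bins = [shape] * len(binby) if np.isscalar(shape) else list(shape)
        if limits is None:  # assumed default: min/max of each binby column
            limits = [(col.min(), col.max()) for col in coords.T]
        return self.grid(coords, values, bins, limits)


class Count(Statistic):
    def values(self, ds, n_rows):
        if self.expression == "*":
            return np.zeros(n_rows)  # placeholder non-NaN values: every row counts
        return super().values(ds, n_rows)

    def scalar(self, values):
        return float(np.count_nonzero(~np.isnan(values)))

    def grid(self, coords, values, bins, limits):
        valid = ~np.isnan(values)
        counts, _ = np.histogramdd(coords[valid], bins=bins, range=limits)
        return counts


class Mean(Statistic):
    def scalar(self, values):
        return float(np.nanmean(values))

    def grid(self, coords, values, bins, limits):
        valid = ~np.isnan(values)
        sums, _ = np.histogramdd(coords[valid], bins=bins, range=limits, weights=values[valid])
        counts, _ = np.histogramdd(coords[valid], bins=bins, range=limits)
        with np.errstate(invalid="ignore", divide="ignore"):
            return sums / counts  # empty bins become NaN


ds = {"x": np.array([0.0, 1.0, 2.0, 3.0]), "y": np.array([10.0, 20.0, 30.0, 40.0])}
print(Count("*").calculate(ds))                        # 4.0
print(Mean("y").calculate(ds, binby=["x"], shape=2))   # [15. 35.]
print(Mean("y").calculate(ds, selection=ds["x"] > 1))  # 35.0
```

Keeping the description separate from the evaluation means one statistic object can be reused on different datasets, grids, or selections.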