Getting Started

To use the LArray library, the first thing to do is to import it:

In [1]: from larray import *

Create an array

Working with the LArray library mainly consists of manipulating LArray data structures. They represent N-dimensional labelled arrays and are composed of data (numpy ndarray), axes and optionally a title. An axis contains a list of labels and may have a name (if not given, the axis is anonymous).

You can create an array from scratch by supplying data, axes and optionally a title:

# define data
In [2]: data = [[20, 22],
   ...:         [33, 31],
   ...:         [79, 81],
   ...:         [28, 34]]
   ...: 

# define axes
In [3]: age_category = Axis(["0-9", "10-17", "18-66", "67+"], "age_category")

In [4]: sex = Axis(["M", "F"], "sex")

# create LArray object
In [5]: arr = LArray(data, [age_category, sex], "population by age category and sex")

In [6]: arr
Out[6]: 
age_category\sex   M   F
             0-9  20  22
           10-17  33  31
           18-66  79  81
             67+  28  34
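To make the three components concrete, the following minimal sketch models them with plain Python dataclasses. The names ToyAxis and ToyArray are hypothetical and are not part of LArray. The sketch only checks that each axis has one label per element along its dimension, which is the consistency a labelled array relies on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyAxis:
    """Hypothetical axis: a list of labels and an optional name."""
    labels: List[str]
    name: Optional[str] = None  # None means the axis is anonymous


@dataclass
class ToyArray:
    """Hypothetical 2-D labelled array: nested-list data, two axes, a title."""
    data: List[list]
    axes: List[ToyAxis]
    title: str = ""

    def __post_init__(self):
        rows, cols = len(self.data), len(self.data[0])
        if any(len(row) != cols for row in self.data):
            raise ValueError("data rows must all have the same length")
        # each axis must have one label per element along its dimension
        if [len(ax.labels) for ax in self.axes] != [rows, cols]:
            raise ValueError("axis lengths do not match the data shape")

    @property
    def shape(self):
        return (len(self.data), len(self.data[0]))


age_category = ToyAxis(["0-9", "10-17", "18-66", "67+"], "age_category")
sex = ToyAxis(["M", "F"], "sex")
arr = ToyArray([[20, 22], [33, 31], [79, 81], [28, 34]],
               [age_category, sex], "population by age category and sex")
print(arr.shape)  # (4, 2)
```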

Note

LArray offers two syntaxes to build axes and to make selections and aggregations. The first one is more Pythonic (it uses Python structures) and allows any character to be used in labels. The second one consists of strings that are parsed; it is shorter to type. For example, you could create the age_category axis above as follows:

age_category = Axis("age_category=0-9,10-17,18-66,67+")

The drawback of the string syntax is that some characters such as , ; = : .. [ ] >> have a special meaning and cannot be used in labels. Moreover, labels made only of digits are interpreted as integers.
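The following sketch illustrates the idea behind the string syntax and the integer rule. parse_axis_string is a hypothetical helper written for this guide, not LArray's parser: it only handles the simple 'name=label,label' form and ignores the other special characters listed above.

```python
def parse_axis_string(definition):
    """Split a 'name=label1,label2,...' definition (hypothetical, simplified).

    Ranges (..), renaming (>>) and the other special characters of
    LArray's real string syntax are NOT handled here.
    """
    name, _, labels_part = definition.partition("=")
    labels = []
    for raw in labels_part.split(","):
        label = raw.strip()
        # labels made only of decimal digits become integers
        labels.append(int(label) if label.isdecimal() else label)
    return name.strip(), labels


print(parse_axis_string("age_category=0-9,10-17,18-66,67+"))
print(parse_axis_string("year=2015,2016"))  # ('year', [2015, 2016])
```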

In this getting started section, we will use the first, more verbose syntax, which works in all cases, and give the equivalent shorter syntax in comments. More examples can be found in the tutorial and in the API reference section.

Here are the key properties of an array:

# array summary : dimensions + description of axes
In [7]: arr.info
Out[7]: 
population by age category and sex
4 x 2
 age_category [4]: '0-9' '10-17' '18-66' '67+'
 sex [2]: 'M' 'F'
dtype: int64

# number of dimensions
In [8]: arr.ndim
Out[8]: 2

# array dimensions
In [9]: arr.shape
Out[9]: (4, 2)

# number of elements
In [10]: arr.size
Out[10]: 8

# size in memory
In [11]: arr.memory_used
Out[11]: '64 bytes'

# type of the data of the array
In [12]: arr.dtype
Out[12]: dtype('int64')
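These values are related by simple arithmetic: ndim is the number of axes, size is the product of the axis lengths, and for this small int64 array memory_used is size times 8 bytes per element. The sketch below recomputes them in plain Python; array_stats is a hypothetical helper, and its 'N bytes' string only mimics the output shown above (the formatting LArray uses for larger sizes is not reproduced).

```python
from math import prod


def array_stats(shape, itemsize):
    """Recompute ndim, size and memory use from a shape and a per-element size."""
    ndim = len(shape)          # one dimension per axis
    size = prod(shape)         # total number of elements
    nbytes = size * itemsize   # raw data size only, labels and metadata ignored
    return ndim, size, f"{nbytes} bytes"


# 4 age categories x 2 sexes, int64 = 8 bytes per element
print(array_stats((4, 2), 8))  # (2, 8, '64 bytes')
```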

Arrays can be generated through dedicated functions:

  • zeros() : fills an array with 0
  • ones() : fills an array with 1
  • full() : fills an array with a given value
  • eye() : identity matrix
  • ndrange() : fills an array with increasing numbers (mostly for testing)
  • ndtest() : same as ndrange but with axes generated automatically (for testing)
  • sequence() : creates an array by sequentially applying modifications to the array along an axis.

In [13]: zeros([age_category, sex])
Out[13]: 
age_category\sex    M    F
             0-9  0.0  0.0
           10-17  0.0  0.0
           18-66  0.0  0.0
             67+  0.0  0.0

In [14]: ndtest((3, 3))
Out[14]: 
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
 a2   6   7   8
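The outputs of ndtest follow a regular pattern: axes are named a, b, c (the sketch assumes this continues alphabetically), each label is the axis name followed by a position, and values count up from 0 in row-major order. The hypothetical toy_ndtest below reproduces that pattern with nested lists as a reading aid only; it is not how ndtest is implemented and covers nothing beyond the shape argument.

```python
from itertools import count
from string import ascii_lowercase


def toy_ndtest(shape):
    """Return (axes, nested-list data) following the ndtest pattern (hypothetical)."""
    names = ascii_lowercase[:len(shape)]
    axes = [(name, [f"{name}{i}" for i in range(n)]) for name, n in zip(names, shape)]
    counter = count()  # yields 0, 1, 2, ... in row-major order

    def build(dims):
        if not dims:
            return next(counter)
        return [build(dims[1:]) for _ in range(dims[0])]

    return axes, build(list(shape))


axes, data = toy_ndtest((3, 3))
print(axes)  # [('a', ['a0', 'a1', 'a2']), ('b', ['b0', 'b1', 'b2'])]
print(data)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```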

Save/Load an array

The LArray library offers many I/O functions to read and write arrays in various formats (CSV, Excel, HDF5, pickle). For example, to save an array in a CSV file, call the method to_csv():

In [15]: arr_3D = ndtest((2, 2, 2))

In [16]: arr_3D
Out[16]: 
 a  b\c  c0  c1
a0   b0   0   1
a0   b1   2   3
a1   b0   4   5
a1   b1   6   7

In [17]: arr_3D.to_csv('arr_3D.csv')

The content of the ‘arr_3D.csv’ file is:

a,b\c,c0,c1
a0,b0,0,1
a0,b1,2,3
a1,b0,4,5
a1,b1,6,7

Note

In CSV or Excel files, the last dimension is laid out horizontally and the names of the last two dimensions are separated by a backslash (\).
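To illustrate that layout, here is a sketch that writes a nested-list array to CSV text with Python's standard csv module. The to_wide_csv helper is hypothetical, not LArray's writer; it assumes at least two axes and reproduces the arr_3D.csv content shown above.

```python
import csv
import io
from itertools import product


def to_wide_csv(axes, data):
    """Write nested-list data with the last axis laid out horizontally (hypothetical)."""
    *row_axes, (last_name, last_labels) = axes
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # header: leading axis names, then the last two names joined by a backslash
    header = [name for name, _ in row_axes[:-1]]
    header.append(f"{row_axes[-1][0]}\\{last_name}")
    header.extend(last_labels)
    writer.writerow(header)
    # one CSV row per combination of labels of the non-horizontal axes
    for idx in product(*(range(len(labels)) for _, labels in row_axes)):
        row = data
        for i in idx:
            row = row[i]
        labels = [row_axes[k][1][i] for k, i in enumerate(idx)]
        writer.writerow(labels + list(row))
    return buf.getvalue()


axes = [("a", ["a0", "a1"]), ("b", ["b0", "b1"]), ("c", ["c0", "c1"])]
data = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
print(to_wide_csv(axes, data), end="")
```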

To load a saved array, call the function read_csv():

In [18]: arr_3D = read_csv('arr_3D.csv')

In [19]: arr_3D
Out[19]: 
 a  b\c  c0  c1
a0   b0   0   1
a0   b1   2   3
a1   b0   4   5
a1   b1   6   7

Other input/output functions are described in the corresponding section of the API documentation.

Indexing

To select an element or a subset of an array, use brackets [ ]. Let’s start by selecting a single element:

In [20]: arr = ndtest((4, 4))

In [21]: arr
Out[21]: 
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
 a2   8   9  10  11
 a3  12  13  14  15

In [22]: arr['a0', 'b1']
Out[22]: 1

# labels can be given in arbitrary order
In [23]: arr['b1', 'a0']
Out[23]: 1

Let’s continue with subsets:

# select subset along one axis
In [24]: arr['a0']
Out[24]: 
b  b0  b1  b2  b3
    0   1   2   3

In [25]: arr['b0']
Out[25]: 
a  a0  a1  a2  a3
    0   4   8  12

# labels associated with the same axis must be given as a list
# equivalent to: arr['a0', 'b1,b3']
In [26]: arr['a0', ['b1', 'b3']]
Out[26]: 
b  b1  b3
    1   3

Warning

Selecting by labels as above only works as long as there is no ambiguity. When several axes have common labels and you do not explicitly specify which axis to work on, it fails with an error (ValueError: … is ambiguous (valid in a, b)). You can specify the axis using the special notation x.axis_name. For this notation to work, the axis name must not contain whitespace or special characters.

# equivalent to: arr2 = ndrange("a=label0,label1;b=label1,label2")
In [27]: arr2 = ndrange([Axis(["label0", "label1"], "a"), Axis(["label1", "label2"], "b")])

In [28]: arr2
Out[28]: 
   a\b  label1  label2
label0       0       1
label1       2       3

# equivalent to: arr2["label0", "b[label1]"]
In [29]: arr2["label0", x.b["label1"]]
Out[29]: 0
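The rule described in the warning can be sketched as follows: a bare label is looked up on every axis and accepted only if exactly one axis contains it, while an explicit (axis_name, label) pair, standing in for x.axis_name[label], skips the search. resolve_labels is a hypothetical helper for illustration, not LArray's implementation; only the wording of the error message is modelled on the one quoted above.

```python
def resolve_labels(axes, keys):
    """Map each key to the axis it belongs to (hypothetical helper).

    axes: dict axis_name -> list of labels
    keys: bare labels, or (axis_name, label) pairs to force the axis
    """
    selection = {}
    for key in keys:
        if isinstance(key, tuple):
            axis_name, label = key
            if label not in axes[axis_name]:
                raise ValueError(f"{label!r} is not a valid label for {axis_name}")
        else:
            label = key
            matches = [name for name, labels in axes.items() if label in labels]
            if not matches:
                raise ValueError(f"{label!r} is not a valid label for any axis")
            if len(matches) > 1:
                raise ValueError(f"{label!r} is ambiguous (valid in {', '.join(matches)})")
            axis_name = matches[0]
        selection[axis_name] = label
    return selection


axes2 = {"a": ["label0", "label1"], "b": ["label1", "label2"]}
print(resolve_labels(axes2, ["label0", ("b", "label1")]))  # {'a': 'label0', 'b': 'label1'}
```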

You can also define slices (using ‘start:stop’ or ‘start:stop:step’). A slice selects all labels between start and stop, stop included (unlike ordinary Python slices, which exclude stop). All arguments of a slice are optional: when not given, start defaults to the first label of the axis and stop to the last one.

# "b1":"b3" is a shortcut for ["b1", "b2", "b3"]
# equivalent to: arr["a0,a2", "b1:b3"]
In [30]: arr[["a0", "a2"], "b1":"b3"]
Out[30]: 
a\b  b1  b2  b3
 a0   1   2   3
 a2   9  10  11

# :"a2" will select all labels between the first one and "a2"
# "b1": will select all labels between "b1" and the last one
# equivalent to: arr[":a2", "b1:"]
In [31]: arr[:"a2", "b1":]
Out[31]: 
a\b  b1  b2  b3
 a0   1   2   3
 a1   5   6   7
 a2   9  10  11
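The key difference from ordinary Python slicing is that stop is included. Here is a minimal sketch of that rule on a plain list of labels, using a hypothetical label_slice helper (not part of LArray) that converts labels to positions with list.index:

```python
def label_slice(labels, start=None, stop=None, step=1):
    """Select labels from start to stop, both included (hypothetical helper)."""
    i_start = 0 if start is None else labels.index(start)
    i_stop = len(labels) - 1 if stop is None else labels.index(stop)
    # Python slices exclude their end position, so add 1 to keep stop
    return labels[i_start:i_stop + 1:step]


b = ["b0", "b1", "b2", "b3"]
print(label_slice(b, "b1", "b3"))  # ['b1', 'b2', 'b3']
print(b[1:3])                      # ['b1', 'b2'], a positional slice drops the end
```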

Aggregation

The LArray library includes many aggregation methods. For example, to calculate the sum along an axis, write:

In [32]: arr_3D
Out[32]: 
 a  b\c  c0  c1
a0   b0   0   1
a0   b1   2   3
a1   b0   4   5
a1   b1   6   7

# equivalent to: arr_3D.sum("a")
In [33]: arr_3D.sum(x.a)
Out[33]: 
b\c  c0  c1
 b0   4   6
 b1   8  10

To aggregate along all axes except one, simply append _by to the name of the aggregation method you want to use:

# equivalent to: arr_3D.sum_by("a")
In [34]: arr_3D.sum_by(x.a)
Out[34]: 
a  a0  a1
    6  22

See the API reference for the list of all available aggregation methods.
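To see what sum(x.a) and sum_by(x.a) compute, the sketch below repeats them on the nested-list data of arr_3D in plain Python. sum_along_first and sum_by_first are hypothetical helpers that only handle the first axis: the first adds the sub-arrays element-wise, so axis a disappears, and the second adds everything inside each sub-array, so only axis a remains.

```python
def flatten(x):
    """Yield every number of a nested list."""
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x


def add(x, y):
    """Element-wise addition of two nested lists with the same shape."""
    if isinstance(x, list):
        return [add(a, b) for a, b in zip(x, y)]
    return x + y


def sum_along_first(data):
    # like arr_3D.sum(x.a): combine data[0], data[1], ... element by element
    result = data[0]
    for sub in data[1:]:
        result = add(result, sub)
    return result


def sum_by_first(data):
    # like arr_3D.sum_by(x.a): one total per label of the first axis
    return [sum(flatten(sub)) for sub in data]


data = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]  # arr_3D
print(sum_along_first(data))  # [[4, 6], [8, 10]]
print(sum_by_first(data))     # [6, 22]
```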

Groups

A Group represents a subset of labels or positions of an axis:

In [35]: arr
Out[35]: 
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
 a2   8   9  10  11
 a3  12  13  14  15

In [36]: even = x.a["a0", "a2"]

In [37]: even
Out[37]: X.a['a0', 'a2']

In [38]: odd = x.a["a1", "a3"]

In [39]: odd
Out[39]: X.a['a1', 'a3']

They can be used in selections:

In [40]: arr[even]
Out[40]: 
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a2   8   9  10  11

In [41]: arr[odd]
Out[41]: 
a\b  b0  b1  b2  b3
 a1   4   5   6   7
 a3  12  13  14  15

or aggregations:

In [42]: arr.sum((even, odd))
Out[42]: 
  a\b  b0  b1  b2  b3
a0,a2   8  10  12  14
a1,a3  16  18  20  22

In the case of aggregations, it is often useful to give the groups a name using the >> operator:

# equivalent to: arr.sum("a0,a2 >> even; a1,a3 >> odd")
In [43]: arr.sum((even >> "even", odd >> "odd"))
Out[43]: 
 a\b  b0  b1  b2  b3
even   8  10  12  14
 odd  16  18  20  22
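Aggregating over groups amounts to summing, for each group, the rows whose labels belong to it. The hypothetical sum_groups helper below sketches that on the 4 x 4 arr above, stored as a dict from a-labels to rows. A group is either a list of labels, named by joining them with commas as in Out[42], or a (name, labels) pair playing the role of a group renamed with >>.

```python
def sum_groups(rows, groups):
    """Sum the rows of each group (hypothetical helper).

    rows:   dict label -> list of values along the remaining axis
    groups: label lists, or (name, labels) pairs for named groups
    """
    result = {}
    for group in groups:
        if isinstance(group, tuple):
            name, labels = group
        else:
            labels = group
            name = ",".join(labels)  # unnamed group: use its labels
        # add the selected rows column by column
        result[name] = [sum(col) for col in zip(*(rows[label] for label in labels))]
    return result


rows = {"a0": [0, 1, 2, 3], "a1": [4, 5, 6, 7],
        "a2": [8, 9, 10, 11], "a3": [12, 13, 14, 15]}
print(sum_groups(rows, [("even", ["a0", "a2"]), ("odd", ["a1", "a3"])]))
# {'even': [8, 10, 12, 14], 'odd': [16, 18, 20, 22]}
```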

Group arrays in Session

Arrays may be grouped in Session objects. A session is an ordered, dict-like container of LArray objects with special I/O methods. To create a session, pass a list of (array_name, array) pairs:

In [44]: arr0 = ndtest((3, 3))

In [45]: arr1 = ndtest((2, 4))

In [46]: arr2 = ndtest((4, 2))

In [47]: arrays = [("arr0", arr0), ("arr1", arr1), ("arr2", arr2)]

In [48]: ses = Session(arrays)

# displays names of arrays contained in the session
In [49]: ses.names
Out[49]: ['arr0', 'arr1', 'arr2']

# get an array
In [50]: ses["arr0"]
Out[50]: 
a\b  b0  b1  b2
 a0   0   1   2
 a1   3   4   5
 a2   6   7   8

# add/modify an array
In [51]: ses["arr3"] = ndtest((2, 2, 2))

Warning

You can also pass a dictionary to the Session constructor, but with Python versions older than 3.7, dict objects are not guaranteed to keep insertion order, so you may lose the order of the arrays. If you are using Python 3.6 or later, keyword arguments are a nice alternative which keep the ordering. For example, the session above can be defined using: ses = Session(arr0=arr0, arr1=arr1, arr2=arr2).

One of the main benefits of using sessions is the ability to save and load many arrays at once:

In [52]: ses.save("my_session.h5")

In [53]: ses = Session("my_session.h5")
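Conceptually, a session is an ordered mapping from names to arrays that can be written to and read from a single file. The hypothetical ToySession below sketches that idea; it swaps LArray's HDF5 file for the standard-library pickle module and stores plain nested lists instead of LArray objects, so it is not a substitute for Session.

```python
import os
import pickle
import tempfile


class ToySession:
    """Hypothetical stand-in for a session: an insertion-ordered mapping
    of names to arrays, saved and loaded as a whole with pickle."""

    def __init__(self, pairs=(), **kwargs):
        # dict keeps insertion order (guaranteed since Python 3.7)
        self._objects = dict(pairs)
        self._objects.update(kwargs)

    @property
    def names(self):
        return list(self._objects)

    def __getitem__(self, name):
        return self._objects[name]

    def __setitem__(self, name, value):
        self._objects[name] = value

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self._objects, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            return cls(pickle.load(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_session.pkl")
    ses = ToySession([("arr0", [[0, 1], [2, 3]]), ("arr1", [4, 5])])
    ses["arr2"] = [6]
    ses.save(path)                   # all arrays written at once
    loaded = ToySession.load(path)   # ...and read back at once

print(loaded.names)  # ['arr0', 'arr1', 'arr2']
```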

Graphical User Interface

The LArray project provides an optional package called larray-editor allowing users to explore and edit arrays using a graphical interface. This package is automatically installed with larrayenv.

To explore the content of arrays in read-only mode, import larray_editor and call view():

In [54]: from larray_editor import *

# shows the arrays of a given session in a graphical user interface
In [55]: view(ses)

# the session may be directly loaded from a file
In [56]: view("my_session.h5")

# creates a session with all existing arrays from the current namespace
# and shows its content
In [57]: view()

To open the user interface in edit mode, call edit() instead.

[Figure: the larray-editor graphical interface]

Once open, you can save and load any session using the File menu.

Finally, you can also visually compare two arrays or sessions using the compare() function.

In [58]: arr0 = ndtest((3, 3))

In [59]: arr1 = ndtest((3, 3))

In [60]: arr1[["a1", "a2"]] = -arr1[["a1", "a2"]]

In [61]: compare(arr0, arr1)

[Figure: the window opened by compare(arr0, arr1)]

When comparing two arrays, they must have compatible axes.

For Windows Users

Installing the larray-editor package on Windows will create a LArray menu in the Windows Start Menu. This menu contains:

  • a shortcut to open the documentation of the last stable version of the library,
  • a shortcut to open the graphical interface in edit mode,
  • a shortcut to update larrayenv.

[Figures: the LArray menu in the Windows Start Menu; the graphical interface opened from it]

Once the graphical interface is open, all LArray objects and functions are directly accessible. There is no need to start with from larray import *.

Compatibility with pandas

To convert a LArray object into a pandas DataFrame, the method to_frame() can be used:

In [62]: df = arr.to_frame()

In [63]: df
Out[63]: 
b   b0  b1  b2  b3
a                 
a0   0   1   2   3
a1   4   5   6   7
a2   8   9  10  11
a3  12  13  14  15

Inversely, to convert a DataFrame into a LArray object, use the function aslarray():

In [64]: arr = aslarray(df)

In [65]: arr
Out[65]: 
a\b  b0  b1  b2  b3
 a0   0   1   2   3
 a1   4   5   6   7
 a2   8   9  10  11
 a3  12  13  14  15