Tutorial¶
This is an introduction to LArray. It is not intended to be a fully comprehensive manual. It is mainly meant to help new users become familiar with the library and to remind other users of the essentials.
The first step to use the LArray library is to import it:
In [1]: from larray import *
Axis creation¶
An axis represents a dimension of an LArray object.
It consists of a name and a list of labels. There are several ways to create an axis:
# create a wildcard axis
In [2]: age = Axis(3, 'age')
# labels given as a list
In [3]: time = Axis([2007, 2008, 2009], 'time')
# create an axis using one string
In [4]: sex = Axis('sex=M,F')
# labels generated using a special syntax
In [5]: other = Axis('other=A01..C03')
In [6]: age, sex, time, other
Out[6]:
(Axis(3, 'age'),
Axis(['M', 'F'], 'sex'),
Axis([2007, 2008, 2009], 'time'),
Axis(['A01', 'A02', 'A03', 'B01', 'B02', 'B03', 'C01', 'C02', 'C03'], 'other'))
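To make the string syntax above more concrete, here is a minimal pure-Python sketch of how a definition such as 'sex=M,F' or 'time=2007..2009' could be expanded into a name and a list of labels. The helper parse_axis_string is hypothetical and is not LArray's own parser: it only handles comma-separated labels and inclusive integer ranges, not patterns such as 'A01..C03'.

```python
def parse_axis_string(definition):
    """Split 'name=labels' into (name, labels). Illustration only."""
    name, sep, labels_part = definition.partition('=')
    if not sep:
        # no '=' in the string: the axis is anonymous
        name, labels_part = None, definition
    if '..' in labels_part:
        start, stop = labels_part.split('..')
        # both bounds are included, e.g. '0..2' gives 0, 1, 2
        labels = list(range(int(start), int(stop) + 1))
    else:
        labels = [label.strip() for label in labels_part.split(',')]
    return name, labels


print(parse_axis_string('sex=M,F'))          # ('sex', ['M', 'F'])
print(parse_axis_string('time=2007..2009'))  # ('time', [2007, 2008, 2009])
```

The real Axis class does more than this (wildcard axes, generated string labels), so treat the sketch as a reading aid for the syntax rather than a replacement.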
Array creation¶
An LArray object represents a multidimensional array with labeled axes.
From scratch¶
To create an array from scratch, you need to provide the data and a list of axes. Optionally, a title can be defined.
In [7]: import numpy as np
# list of the axes
In [8]: axes = [age, sex, time, other]
# data (the shape of data array must match axes lengths)
In [9]: data = np.random.randint(100, size=[len(axis) for axis in axes])
# title (optional)
In [10]: title = 'random data'
In [11]: arr = LArray(data, axes, title)
In [12]: arr
Out[12]:
age* sex time\other A01 A02 A03 B01 B02 B03 C01 C02 C03
0 M 2007 11 26 93 57 75 82 45 2 13
0 M 2008 55 9 50 76 33 91 38 25 69
0 M 2009 53 28 72 40 30 69 35 24 7
0 F 2007 38 60 34 41 23 51 55 83 78
0 F 2008 23 81 60 31 88 78 85 16 43
0 F 2009 69 63 92 48 34 69 51 61 18
1 M 2007 17 37 86 9 3 58 88 9 41
1 M 2008 15 4 74 47 23 81 8 25 15
1 M 2009 36 30 63 27 35 34 93 31 86
1 F 2007 50 18 86 95 79 46 76 11 39
1 F 2008 2 99 11 97 24 18 1 27 23
1 F 2009 70 59 13 82 80 6 39 6 88
2 M 2007 61 74 81 57 34 66 9 42 79
2 M 2008 23 67 29 58 77 43 29 74 49
2 M 2009 85 46 44 4 74 17 64 34 95
2 F 2007 18 98 29 62 86 43 19 12 65
2 F 2008 28 87 32 91 22 51 40 4 38
2 F 2009 41 76 97 97 5 3 45 58 40
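The requirement that the shape of the data matches the axes lengths can be checked by hand. The sketch below uses plain Python lists as stand-ins for the four axes (it does not use LArray itself) and computes the expected shape and number of cells.

```python
import math

# plain lists standing in for the labels of the four axes used above
age = [0, 1, 2]                      # wildcard axis of length 3
sex = ['M', 'F']
time = [2007, 2008, 2009]
other = ['A01', 'A02', 'A03', 'B01', 'B02', 'B03', 'C01', 'C02', 'C03']

# one length per axis, in the same order as the list of axes
shape = tuple(len(labels) for labels in (age, sex, time, other))
n_cells = math.prod(shape)

print(shape)    # (3, 2, 3, 9)
print(n_cells)  # 162
```

This agrees with the table above: 18 rows of 9 values each.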
Array creation functions¶
Arrays can also be generated in an easier way through creation functions:
- ndrange(): fills an array with increasing numbers
- ndtest(): same as ndrange but with axes generated automatically (for testing)
- empty(): creates an array but leaves its allocated memory unchanged (i.e., it contains “garbage”. Be careful!)
- zeros(): fills an array with 0
- ones(): fills an array with 1
- full(): fills an array with a given value
Except for ndtest, a list of axes must be provided. Axes can be passed in different ways:
- as Axis objects
- as integers defining the lengths of auto-generated wildcard axes
- as a string : ‘sex=M,F;time=2007,2008,2009’ (name is optional)
- as pairs (name, labels)
Optionally, the type of data stored by the array can be specified using argument dtype.
# start defines the starting value of data
In [13]: ndrange(['age=0..2', 'sex=M,F', 'time=2007..2009'], start=-1)
Out[13]:
age sex\time 2007 2008 2009
0 M -1 0 1
0 F 2 3 4
1 M 5 6 7
1 F 8 9 10
2 M 11 12 13
2 F 14 15 16
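The output above shows the values increasing with the last axis varying fastest. As a rough illustration (pure Python, not LArray code), the same numbering can be reproduced by enumerating all label combinations in that order and adding the start value.

```python
from itertools import product

age = [0, 1, 2]
sex = ['M', 'F']
time = [2007, 2008, 2009]
start = -1

# product() varies the last sequence fastest, i.e. row-major order
values = {key: start + i for i, key in enumerate(product(age, sex, time))}

print(values[0, 'M', 2007])  # -1
print(values[1, 'F', 2008])  # 9
print(values[2, 'F', 2009])  # 16
```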
# start defines the starting value of data
# label_start defines the starting index of labels
In [14]: ndtest((3, 3), start=-1, label_start=2)
Out[14]:
a\b b2 b3 b4
a2 -1 0 1
a3 2 3 4
a4 5 6 7
# empty generates uninitialised array with correct axes (much faster but use with care!).
# This is not really random either: it just reuses a portion of memory that is available, with whatever content is there.
# Use it only if performance matters and make sure all data will be overwritten.
In [15]: empty(['age=0..2', 'sex=M,F', 'time=2007..2009'])
Out[15]:
age sex\time 2007 2008 2009
0 M 6.9180901376223e-310 2.84317546e-316 6.91809100536483e-310
0 F 6.91806053025905e-310 6.91806051988683e-310 0.0
1 M 0.0 6.91809100536483e-310 0.0
1 F 0.0 6.9180660075894e-310 6.91809100536483e-310
2 M 0.0 0.0 6.91809100536483e-310
2 F 6.9180909566974e-310 2.865238098441256e+161 3.250037186622356e+178
# example with anonymous axes
In [16]: zeros(['0..2', 'M,F', '2007..2009'])
Out[16]:
{0} {1}\{2} 2007 2008 2009
0 M 0.0 0.0 0.0
0 F 0.0 0.0 0.0
1 M 0.0 0.0 0.0
1 F 0.0 0.0 0.0
2 M 0.0 0.0 0.0
2 F 0.0 0.0 0.0
# dtype=int stores integer data instead of the default float
In [17]: ones(['age=0..2', 'sex=M,F', 'time=2007..2009'], dtype=int)
Out[17]:
age sex\time 2007 2008 2009
0 M 1 1 1
0 F 1 1 1
1 M 1 1 1
1 F 1 1 1
2 M 1 1 1
2 F 1 1 1
In [18]: full(['age=0..2', 'sex=M,F', 'time=2007..2009'], 1.23)
Out[18]:
age sex\time 2007 2008 2009
0 M 1.23 1.23 1.23
0 F 1.23 1.23 1.23
1 M 1.23 1.23 1.23
1 F 1.23 1.23 1.23
2 M 1.23 1.23 1.23
2 F 1.23 1.23 1.23
All of the above functions also exist as {func}_like variants, which take their axes from another array:
In [19]: ones_like(arr)
Out[19]:
age* sex time\other A01 A02 A03 B01 B02 B03 C01 C02 C03
0 M 2007 1 1 1 1 1 1 1 1 1
0 M 2008 1 1 1 1 1 1 1 1 1
0 M 2009 1 1 1 1 1 1 1 1 1
0 F 2007 1 1 1 1 1 1 1 1 1
0 F 2008 1 1 1 1 1 1 1 1 1
0 F 2009 1 1 1 1 1 1 1 1 1
1 M 2007 1 1 1 1 1 1 1 1 1
1 M 2008 1 1 1 1 1 1 1 1 1
1 M 2009 1 1 1 1 1 1 1 1 1
1 F 2007 1 1 1 1 1 1 1 1 1
1 F 2008 1 1 1 1 1 1 1 1 1
1 F 2009 1 1 1 1 1 1 1 1 1
2 M 2007 1 1 1 1 1 1 1 1 1
2 M 2008 1 1 1 1 1 1 1 1 1
2 M 2009 1 1 1 1 1 1 1 1 1
2 F 2007 1 1 1 1 1 1 1 1 1
2 F 2008 1 1 1 1 1 1 1 1 1
2 F 2009 1 1 1 1 1 1 1 1 1
Sequence¶
The special sequence() function allows you to create an array from an
axis by iteratively applying a function to a given initial value. You
can choose between inc and mult functions or define your own.
# With initial=1.0 and inc=0.5, we generate the sequence 1.0, 1.5, 2.0, 2.5, 3.0, ...
In [20]: sequence('sex=M,F', initial=1.0, inc=0.5)
Out[20]:
sex M F
1.0 1.5
# With initial=1.0 and mult=2.0, we generate the sequence 1.0, 2.0, 4.0, 8.0, ...
In [21]: sequence('age=0..2', initial=1.0, mult=2.0)
Out[21]:
age 0 1 2
1.0 2.0 4.0
# Using your own function
In [22]: sequence('time=2007..2009', initial=2.0, func=lambda value: value**2)
Out[22]:
time 2007 2008 2009
2.0 4.0 16.0
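The three calls above can be read as "start from initial, then repeatedly apply a function to the previous value". The following pure-Python sketch reproduces the same numbers. The helper sequence_values is hypothetical and only mimics the one-dimensional behaviour shown here; it makes no claim about how sequence() is implemented.

```python
def sequence_values(n, initial, inc=None, mult=None, func=None):
    """Return n values obtained by repeatedly updating `initial`."""
    if func is None:
        if inc is not None:
            func = lambda value: value + inc
        elif mult is not None:
            func = lambda value: value * mult
        else:
            raise ValueError("one of inc, mult or func is required")
    values = [initial]
    for _ in range(n - 1):
        # each new value is derived from the previous one
        values.append(func(values[-1]))
    return values


print(sequence_values(2, 1.0, inc=0.5))                # [1.0, 1.5]
print(sequence_values(3, 1.0, mult=2.0))               # [1.0, 2.0, 4.0]
print(sequence_values(3, 2.0, func=lambda v: v ** 2))  # [2.0, 4.0, 16.0]
```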
You can also create an N-dimensional array by passing an (N-1)-dimensional array as the initial, inc or mult argument:
In [23]: birth = LArray([1.05, 1.15], 'sex=M,F')
In [24]: cumulate_newborns = sequence('time=2007..2009', initial=0.0, inc=birth)
In [25]: cumulate_newborns
Out[25]:
sex\time 2007 2008 2009
M 0.0 1.05 2.1
F 0.0 1.15 2.3
In [26]: initial = LArray([90, 100], 'sex=M,F')
In [27]: survival = LArray([0.96, 0.98], 'sex=M,F')
In [28]: pop = sequence('age=80..83', initial=initial, mult=survival)
In [29]: pop
Out[29]:
sex\age 80 81 82 83
M 90.0 86.39999999999999 82.944 79.62624
F 100.0 98.0 96.03999999999999 94.11919999999999
Load/Dump from files¶
Load from files¶
In [30]: example_dir = EXAMPLE_FILES_DIR
Arrays can be loaded from CSV files (see documentation of read_csv()
for more details)
# read_tsv is a shortcut when data are separated by tabs instead of commas (default separator of read_csv)
# read_eurostat is a shortcut to read EUROSTAT TSV files
In [31]: household = read_csv(example_dir + 'hh.csv')
In [32]: household.info
Out[32]:
26 x 3 x 7
time [26]: 1991 1992 1993 ... 2014 2015 2016
geo [3]: 'BruCap' 'Fla' 'Wal'
hh_type [7]: 'SING' "'MAR0" 'MAR+' ... 'UNM+' 'H1P' 'OTHR'
dtype: int64
or Excel sheets (see documentation of read_excel() for more details)
# loads array from the first sheet if no sheetname is given
In [33]: pop = read_excel(example_dir + 'demography.xlsx', 'pop')
In [34]: pop.info
Out[34]:
26 x 3 x 121 x 2 x 2
time [26]: 1991 1992 1993 ... 2014 2015 2016
geo [3]: 'BruCap' 'Fla' 'Wal'
age [121]: 0 1 2 ... 118 119 120
sex [2]: 'M' 'F'
nat [2]: 'BE' 'FO'
dtype: int64
or HDF5 files (HDF5 is a file format designed to store and organize large
amounts of data. An HDF5 file can contain multiple arrays. See
documentation of read_hdf() for more details)
In [35]: mortality = read_hdf(example_dir + 'demography.h5','qx')
In [36]: mortality.info
Out[36]:
26 x 3 x 121 x 2 x 2
time [26]: 1991 1992 1993 ... 2014 2015 2016
geo [3]: 'BruCap' 'Fla' 'Wal'
age [121]: 0 1 2 ... 118 119 120
sex [2]: 'M' 'F'
nat [2]: 'BE' 'FO'
dtype: float64
Dump in files¶
Arrays can be dumped in CSV files (see documentation of to_csv() for
more details)
In [37]: household.to_csv('hh2.csv')
or in Excel files (see documentation of to_excel() for more details)
# if the file does not already exist, it is created with a single sheet,
# otherwise a new sheet is added to it
In [38]: household.to_excel('demography_2.xlsx', overwrite_file=True)
# it is usually better to specify the sheet explicitly (by name or position) though
In [39]: household.to_excel('demography_2.xlsx', 'hh')
or in HDF5 files (see documentation of to_hdf() for more details)
In [40]: household.to_hdf('demography_2.h5', 'hh')
More Excel IO¶
# create a 3 x 2 x 3 array
In [41]: age, sex, time = Axis('age=0..2'), Axis('sex=M,F'), Axis('time=2007..2009')
In [42]: arr = ndrange([age, sex, time])
In [43]: arr
Out[43]:
age sex\time 2007 2008 2009
0 M 0 1 2
0 F 3 4 5
1 M 6 7 8
1 F 9 10 11
2 M 12 13 14
2 F 15 16 17
Write Arrays¶
Open an Excel file
In [44]: wb = open_excel('test.xlsx', overwrite_file=True)
Put an array in an Excel Sheet, excluding headers (labels)
# put arr at A1 in Sheet1, excluding headers (labels)
In [45]: wb['Sheet1'] = arr
# same but starting at A9
# note that Sheet1 must exist
In [46]: wb['Sheet1']['A9'] = arr
Put an array in an Excel Sheet, including headers (labels)
# dump arr at A1 in Sheet2, including headers (labels)
In [47]: wb['Sheet2'] = arr.dump()
# same but starting at A10
In [48]: wb['Sheet2']['A10'] = arr.dump()
Save file to disk
In [49]: wb.save()
Close file
In [50]: wb.close()
Read Arrays¶
Open an Excel file
In [51]: wb = open_excel('test.xlsx')
Load an array from a sheet (assuming the presence of (correctly formatted) headers and only one array in sheet)
# save one array in Sheet3 (including headers)
In [52]: wb['Sheet3'] = arr.dump()
# load array from the data starting at A1 in Sheet3
In [53]: arr = wb['Sheet3'].load()
In [54]: arr
Out[54]:
age sex\time 2007 2008 2009
0 M 0 1 2
0 F 3 4 5
1 M 6 7 8
1 F 9 10 11
2 M 12 13 14
2 F 15 16 17
Load an array with its axes information from a range
# if you need to use the same sheet several times,
# you can create a sheet variable
In [55]: sheet2 = wb['Sheet2']
# load array contained in the 5 x 4 table (5 rows, 4 columns) defined by cells A10 and D14
In [56]: arr2 = sheet2['A10:D14'].load()
In [57]: arr2
Out[57]:
age sex\time 2007 2008
0 M 0 1
0 F 3 4
1 M 6 7
1 F 9 10
Read Ranges (experimental)¶
Load an array (raw data) with no axis information from a range
In [58]: arr3 = wb['Sheet1']['A1:B4']
In [59]: arr3
Out[59]:
{0}*\{1}* 0 1
0 0 1
1 3 4
2 6 7
3 9 10
in fact, this is not really an LArray …
In [60]: type(arr3)
Out[60]: larray.io.excel.Range
… but it can be used as such
In [61]: arr3.sum(axis=0)
Out[61]:
{0}* 0 1
18 22
… and it can be used for other stuff, like setting the formula instead of the value:
In [62]: arr3.formula = '=D10+1'
In the future, we should also be able to set font name, size, style, etc.
In [63]: wb.close()
Inspecting¶
# load population array
In [64]: pop = load_example_data('demography').pop
Get array summary : dimensions + description of axes
In [65]: pop.info
Out[65]:
26 x 3 x 121 x 2 x 2
time [26]: 1991 1992 1993 ... 2014 2015 2016
geo [3]: 'BruCap' 'Fla' 'Wal'
age [121]: 0 1 2 ... 118 119 120
sex [2]: 'M' 'F'
nat [2]: 'BE' 'FO'
dtype: int64
Get axes
In [66]: time, geo, age, sex, nat = pop.axes
Get array dimensions
In [67]: pop.shape
Out[67]: (26, 3, 121, 2, 2)
Get number of elements
In [68]: pop.size
Out[68]: 37752
Get size in memory
In [69]: pop.nbytes
Out[69]: 302016
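The three numbers above are related: the size is the product of the dimensions, and the size in memory is the size multiplied by the number of bytes per element. A short check in plain Python, assuming 8 bytes per element for the int64 dtype reported by pop.info:

```python
import math

shape = (26, 3, 121, 2, 2)
itemsize = 8  # bytes per element for a 64-bit integer (dtype int64)

size = math.prod(shape)     # number of elements
nbytes = size * itemsize    # memory used by the data, in bytes

print(size)    # 37752
print(nbytes)  # 302016
```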
Start the viewer (graphical user interface) in read-only mode. This opens a new window and blocks execution of the rest of the code until the window is closed. It requires PyQt to be installed.
In [70]: view(pop)
Load array in an Excel sheet
In [71]: pop.to_excel()
Selection (Subsets)¶
LArray lets you select a subset of an array either by labels or by positions.
Selection by Labels¶
To take a subset of an array using labels, use brackets [ ]. Let’s start by selecting a single element:
# here we select the value associated with Belgian women of age 50 from Brussels region for the year 2015
In [72]: pop[2015, 'BruCap', 50, 'F', 'BE']
Out[72]: 4813
Continue with selecting a subset using slices and lists of labels
# here we select the subset associated with Belgian women of age 50, 51 and 52
# from Brussels region for the years 2010 to 2016
In [73]: pop[2010:2016, 'BruCap', 50:52, 'F', 'BE']
Out[73]:
time\age 50 51 52
2010 4869 4811 4699
2011 5015 4860 4792
2012 4722 5014 4818
2013 4711 4727 5007
2014 4788 4702 4730
2015 4813 4767 4676
2016 4814 4792 4740
# slice bounds are optional:
# if not given, start is assumed to be the first label and stop the last one.
# Here we select all years starting from 2010
In [74]: pop[2010:, 'BruCap', 50:52, 'F', 'BE']
Out[74]:
time\age 50 51 52
2010 4869 4811 4699
2011 5015 4860 4792
2012 4722 5014 4818
2013 4711 4727 5007
2014 4788 4702 4730
2015 4813 4767 4676
2016 4814 4792 4740
# Slices can also have a step (defaults to 1), to take every Nth label
# Here we select all even years starting from 2010
In [75]: pop[2010::2, 'BruCap', 50:52, 'F', 'BE']
Out[75]:
time\age 50 51 52
2010 4869 4811 4699
2012 4722 5014 4818
2014 4788 4702 4730
2016 4814 4792 4740
# one can also use a list of labels to take non-contiguous labels.
# Here we select years 2008, 2010, 2013 and 2015
In [76]: pop[[2008, 2010, 2013, 2015], 'BruCap', 50:52, 'F', 'BE']
Out[76]:
time\age 50 51 52
2008 4731 4735 4724
2010 4869 4811 4699
2013 4711 4727 5007
2015 4813 4767 4676
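The label slices used above (optional bounds, optional step, end label included) can be mimicked on a plain list of labels. The helper label_slice below is a hypothetical illustration in pure Python, not part of LArray.

```python
def label_slice(labels, start=None, stop=None, step=1):
    """Select labels from start to stop, with the stop label INCLUDED."""
    first = 0 if start is None else labels.index(start)
    last = len(labels) - 1 if stop is None else labels.index(stop)
    # +1 because Python position slices exclude their end
    return labels[first:last + 1:step]


years = list(range(1991, 2017))  # same labels as the time axis of pop

print(label_slice(years, 2010, 2016))    # [2010, 2011, 2012, 2013, 2014, 2015, 2016]
print(label_slice(years, 2010))          # same result: stop defaults to the last label
print(label_slice(years, 2010, step=2))  # [2010, 2012, 2014, 2016]
```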
The order of indexing does not matter either, so you usually do not need to remember axis positions during computation. Axis order only matters for output.
# order of index doesn't matter
In [77]: pop['F', 'BE', 'BruCap', [2008, 2010, 2013, 2015], 50:52]
Out[77]:
time\age 50 51 52
2008 4731 4735 4724
2010 4869 4811 4699
2013 4711 4727 5007
2015 4813 4767 4676
Warning
Selecting by labels as above works well as long as there is no ambiguity. When two or more axes have common labels, the selection may fail with an error. The solution is then to specify which axis the labels belong to.
# let us now create an array with the same labels on several axes
In [78]: age, weight, size = Axis('age=0..80'), Axis('weight=0..120'), Axis('size=0..200')
In [79]: arr_ws = ndrange([age, weight, size])
# let's try to select teenagers with size between 1 m 60 and 1 m 65 and weight up to 80 kg.
# In this case the subset is ambiguous and this results in an error:
In [80]: arr_ws[10:18, :80, 160:165]
<class 'ValueError'> slice(10, 18, None) is ambiguous (valid in age, weight, size)
# the solution is simple. You need to specify the axes on which you make a selection
In [81]: arr_ws[age[10:18], weight[:80], size[160:165]]
Out[81]:
age weight\size 160 161 162 163 164 165
10 0 243370 243371 243372 243373 243374 243375
10 1 243571 243572 243573 243574 243575 243576
10 2 243772 243773 243774 243775 243776 243777
10 3 243973 243974 243975 243976 243977 243978
10 4 244174 244175 244176 244177 244178 244179
... ... ... ... ... ... ... ...
18 76 453214 453215 453216 453217 453218 453219
18 77 453415 453416 453417 453418 453419 453420
18 78 453616 453617 453618 453619 453620 453621
18 79 453817 453818 453819 453820 453821 453822
18 80 454018 454019 454020 454021 454022 454023
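Why label-only selection works on pop but fails on arr_ws can be illustrated by a simple lookup: a bare label is usable only if exactly one axis contains it. The function find_axis below is a hypothetical pure-Python sketch of that idea and does not reproduce LArray's actual lookup code.

```python
def find_axis(label, axes):
    """Return the name of the only axis containing `label`."""
    matches = [name for name, labels in axes.items() if label in labels]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"{label!r} is not a valid label for any axis")
    raise ValueError(f"{label!r} is ambiguous (valid in {', '.join(matches)})")


# labels are unique across axes: the axis can be guessed from the label
pop_axes = {'sex': ['M', 'F'], 'nat': ['BE', 'FO'], 'geo': ['BruCap', 'Fla', 'Wal']}
print(find_axis('F', pop_axes))  # sex

# the same integer labels appear on several axes: the label alone is not enough
ws_axes = {'age': range(0, 81), 'weight': range(0, 121), 'size': range(0, 201)}
try:
    find_axis(10, ws_axes)
except ValueError as error:
    print(error)  # 10 is ambiguous (valid in age, weight, size)
```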
Special variable x¶
When selecting, assigning or using aggregate functions, an axis can be
referred to via the special variable x:
- pop[x.age[:20]]
- pop.sum(x.age)
This gives you access to the axes of the array you are manipulating. The main drawback of using x is that you lose the autocompletion available in many editors. It only works with non-wildcard axes.
# the previous example could have been also written as
In [82]: arr_ws[x.age[10:18], x.weight[:80], x.size[160:165]]
Out[82]:
age weight\size 160 161 162 163 164 165
10 0 243370 243371 243372 243373 243374 243375
10 1 243571 243572 243573 243574 243575 243576
10 2 243772 243773 243774 243775 243776 243777
10 3 243973 243974 243975 243976 243977 243978
10 4 244174 244175 244176 244177 244178 244179
... ... ... ... ... ... ... ...
18 76 453214 453215 453216 453217 453218 453219
18 77 453415 453416 453417 453418 453419 453420
18 78 453616 453617 453618 453619 453620 453621
18 79 453817 453818 453819 453820 453821 453822
18 80 454018 454019 454020 454021 454022 454023
Selection by Positions¶
Sometimes it is more practical to use positions along the axis instead
of labels. You need to add .i before the brackets:
.i[positions]. As for selection with labels, you can use a single
position, a slice or a list of positions. Positions can also be negative
(-1 represents the last element of an axis).
Note
Remember that positions (indices) are always 0-based in Python. So the first element is at position 0, the second is at position 1, etc.
# here we select the subset associated with Belgian women of age 50, 51 and 52
# from Brussels region for the first 3 years
In [83]: pop[x.time.i[:3], 'BruCap', 50:52, 'F', 'BE']
Out[83]:
time\age 50 51 52
1991 3739 4138 4101
1992 3373 3665 4088
1993 3648 3335 3615
# same but for the last 3 years
In [84]: pop[x.time.i[-3:], 'BruCap', 50:52, 'F', 'BE']
Out[84]:
time\age 50 51 52
2014 4788 4702 4730
2015 4813 4767 4676
2016 4814 4792 4740
# using list of positions
In [85]: pop[x.time.i[-9,-7,-4,-2], 'BruCap', 50:52, 'F', 'BE']
Out[85]:
time\age 50 51 52
2008 4731 4735 4724
2010 4869 4811 4699
2013 4711 4727 5007
2015 4813 4767 4676
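The positions used in the three selections above follow the usual Python indexing rules. A quick check on a plain list holding the 26 labels of the time axis (ordinary Python, no LArray involved) shows which years each set of positions refers to.

```python
years = list(range(1991, 2017))  # 26 labels, as on the time axis of pop

first_three = years[:3]
last_three = years[-3:]
picked = [years[i] for i in (-9, -7, -4, -2)]

print(first_three)  # [1991, 1992, 1993]
print(last_three)   # [2014, 2015, 2016]
print(picked)       # [2008, 2010, 2013, 2015]
```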
Warning
The end index (position) is EXCLUSIVE while the end label is INCLUSIVE.
# with labels (3 is included)
In [86]: pop[2015, 'BruCap', x.age[:3], 'F', 'BE']
Out[86]:
age 0 1 2 3
6020 5882 6023 5861
# with position (3 is out)
In [87]: pop[2015, 'BruCap', x.age.i[:3], 'F', 'BE']
Out[87]:
age 0 1 2
6020 5882 6023
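The difference between the two selections above can be reproduced on a plain list of age labels. This is only a sketch of the rule in ordinary Python: a label-based stop is translated to its position plus one, so the stop label itself is kept.

```python
ages = list(range(0, 121))  # labels of the age axis

# position-based: the stop position 3 is excluded
by_position = ages[:3]

# label-based: the stop label 3 is included, so slice up to its position + 1
by_label = ages[:ages.index(3) + 1]

print(by_position)  # [0, 1, 2]
print(by_label)     # [0, 1, 2, 3]
```

Here labels and positions happen to coincide, which is exactly the situation where the two conventions are easiest to confuse.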
You can use .i[] selection directly on an array instead of on its axes. In this
context, if you want to select a subset of the first and third axes for
example, you must use a full slice : for the second one.
# here we select the last year and first 3 ages
# equivalent to: pop.i[-1, :, :3, :, :]
In [88]: pop.i[-1, :, :3]
Out[88]:
geo age sex\nat BE FO
BruCap 0 M 6155 3104
BruCap 0 F 5900 2817
BruCap 1 M 6165 3068
BruCap 1 F 5916 2946
BruCap 2 M 6053 2918
BruCap 2 F 5736 2776
Fla 0 M 29993 3717
Fla 0 F 28483 3587
Fla 1 M 31292 3716
Fla 1 F 29721 3575
Fla 2 M 31718 3597
Fla 2 F 30353 3387
Wal 0 M 17869 1472
Wal 0 F 17242 1454
Wal 1 M 18820 1432
Wal 1 F 17604 1443
Wal 2 M 19076 1444
Wal 2 F 18189 1358
Assigning subsets¶
Assigning value¶
Assign a value to a subset
# let's take a smaller array
In [89]: pop = load_example_data('demography').pop[2016, 'BruCap', 100:105]
In [90]: pop2 = pop
In [91]: pop2
Out[91]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 8 0
102 F 26 1
103 M 2 1
103 F 17 2
104 M 2 1
104 F 14 0
105 M 0 0
105 F 2 2
# set all data corresponding to age >= 102 to 0
In [92]: pop2[102:] = 0
In [93]: pop2
Out[93]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 0 0
102 F 0 0
103 M 0 0
103 F 0 0
104 M 0 0
104 F 0 0
105 M 0 0
105 F 0 0
One very important gotcha though…
Warning
Modifying a slice of an array in-place like we did above should be done with
care otherwise you could have unexpected effects.
The reason is that taking a slice subset of an array does not return a copy
of that array, but rather a view on that array.
To avoid such behavior, use the .copy() method.
Remember:
- taking a slice subset of an array is extremely fast (no data is copied)
- if one modifies that subset in-place, one also modifies the original array
- .copy() returns a copy of the subset (this costs time and memory) but allows you to change the subset without modifying the original array at the same time
# indeed, data from the original array have also changed
In [94]: pop
Out[94]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 0 0
102 F 0 0
103 M 0 0
103 F 0 0
104 M 0 0
104 F 0 0
105 M 0 0
105 F 0 0
# the right way
In [95]: pop = load_example_data('demography').pop[2016, 'BruCap', 100:105]
In [96]: pop2 = pop.copy()
In [97]: pop2[102:] = 0
In [98]: pop2
Out[98]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 0 0
102 F 0 0
103 M 0 0
103 F 0 0
104 M 0 0
104 F 0 0
105 M 0 0
105 F 0 0
# this time, data from the original array have not changed
In [99]: pop
Out[99]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 8 0
102 F 26 1
103 M 2 1
103 F 17 2
104 M 2 1
104 F 14 0
105 M 0 0
105 F 2 2
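The same view-versus-copy distinction exists for plain NumPy arrays, which makes it easy to experiment with outside LArray. The sketch below uses only NumPy and assumes nothing about LArray's internals.

```python
import numpy as np

original = np.arange(6)

view = original[3:]   # basic slicing returns a view: no data is copied
view[:] = 0           # the in-place change is visible through `original`
print(original.tolist())  # [0, 1, 2, 0, 0, 0]

original = np.arange(6)
independent = original[3:].copy()  # an explicit copy owns its own data
independent[:] = 0
print(original.tolist())  # [0, 1, 2, 3, 4, 5]
```

np.shares_memory(a, b) can be used to check whether two arrays overlap in memory when in doubt.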
Assigning Arrays & Broadcasting¶
Instead of a value, we can also assign an array to a subset. In that case, that array can have fewer axes than the target, but those which are present must be compatible with the subset being targeted.
In [100]: sex, nat = Axis('sex=M,F'), Axis('nat=BE,FO')
In [101]: new_value = LArray([[1, -1], [2, -2]],[sex, nat])
In [102]: new_value
Out[102]:
sex\nat BE FO
M 1 -1
F 2 -2
# this assigns 1, -1 to Belgian, Foreigner men
# and 2, -2 to Belgian, Foreigner women for all
# people aged 102 and over
In [103]: pop[102:] = new_value
In [104]: pop
Out[104]:
age sex\nat BE FO
100 M 12 0
100 F 60 3
101 M 12 2
101 F 66 5
102 M 1 -1
102 F 2 -2
103 M 1 -1
103 F 2 -2
104 M 1 -1
104 F 2 -2
105 M 1 -1
105 F 2 -2
Warning
The array being assigned must have compatible axes with the target subset.
# assume we define the following array with shape 3 x 2 x 2
In [105]: new_value = zeros(['age=0..2', sex, nat])
In [106]: new_value
Out[106]:
age sex\nat BE FO
0 M 0.0 0.0
0 F 0.0 0.0
1 M 0.0 0.0
1 F 0.0 0.0
2 M 0.0 0.0
2 F 0.0 0.0
# now let's try to assign the previous array in a subset with shape 4 x 2 x 2
In [107]: pop[102:] = new_value
<class 'ValueError'> could not broadcast input array from shape (3,2,2) into shape (4,2,2)
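# restricting the target to 3 ages makes the shapes match (3 x 2 x 2), but in the run shown here
# the assignment still fails because the age labels differ (102..104 vs 0..2)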
In [108]: pop[102:104] = new_value
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-108-182e038e029c> in <module>()
----> 1 pop[102:104] = new_value
~/checkouts/readthedocs.org/user_builds/larray-test/conda/stable/lib/python3.5/site-packages/larray-0.27-py3.5.egg/larray/core/array.py in __setitem__(self, key, value, collapse_slices)
2168 axes = self._get_axes_from_translated_key(translated_key)
2169 value = value.broadcast_with(axes)
-> 2170 value.axes.check_compatible(axes)
2171
2172 # replace incomprehensible error message "could not broadcast input array from shape XX into shape YY"
~/checkouts/readthedocs.org/user_builds/larray-test/conda/stable/lib/python3.5/site-packages/larray-0.27-py3.5.egg/larray/core/axis.py in check_compatible(self, axes)
1713 local_axis = self.get_by_pos(axis, i)
1714 if not local_axis.iscompatible(axis):
-> 1715 raise ValueError("incompatible axes:\n{!r}\nvs\n{!r}".format(axis, local_axis))
1716
1717 def extend(self, axes, validate=True, replace_wildcards=False):
ValueError: incompatible axes:
Axis([102, 103, 104], 'age')
vs
Axis([0, 1, 2], 'age')
In [109]: pop