import dascore as dc
from dascore import print
pa1 = dc.get_example_patch("random_das")
pa2 = dc.get_example_patch("example_event_1")
Patch
A Patch manages an array and its associated coordinate labels and metadata.
The Patch design was inspired by Xarray’s DataArray.
Patch creation
Patches can be created in several different ways.
Load an example patch
DASCore includes several example datasets. They are mostly used for simple demonstrations and testing.
See get_example_patch for supported patches.
Load a file
A single file can be loaded like this:
# This codeblock is just to get a usable path for the next cell.
import dascore as dc
from dascore.utils.downloader import fetch
path = fetch("terra15_das_1_trimmed.hdf5")
import dascore as dc
# path should be a path to your file, e.g.
# path = "mydata.hdf5"
pa = dc.spool(path)[0]
Spools are covered in more detail in the next section.
Manually create a patch
Patches can be created from:
- A data array
- Coordinates for labeling each axis
- Attributes (optional)
import numpy as np
import dascore as dc
from dascore.utils.time import to_timedelta64
# Create the patch data
array = np.random.random(size=(300, 2_000))
# Create attributes, or metadata
attrs = dict(
    category="DAS",
    id="test_data1",
    data_units="um/(m * s)"
)
# Create coordinates, labels for each axis in the array.
time_start = dc.to_datetime64("2017-09-18")
time_step = to_timedelta64(1 / 250)
time = time_start + np.arange(array.shape[1]) * time_step
distance_start = 0
distance_step = 1
distance = distance_start + np.arange(array.shape[0]) * distance_step
coords = dict(time=time, distance=distance)
# Define dimensions (first label corresponds to data axis 0)
dims = ('distance', 'time')
pa = dc.Patch(data=array, coords=coords, attrs=attrs, dims=dims)
Patch anatomy
Data
The data is simply an n-dimensional array which is accessed with the data attribute.
import dascore as dc
patch = dc.get_example_patch()
print(f"Data shape is {patch.data.shape}")
print(f"Data contents are\n{patch.data}")
Data shape is (300, 2000)
Data contents are
[[0.77770241 0.23754122 0.82427853 ... 0.36950848 0.07650396 0.23197621]
 [0.49689594 0.44224037 0.70329426 ... 0.12617754 0.11760625 0.78003741]
 [0.20681917 0.19516906 0.17434521 ... 0.84933595 0.36479426 0.80740811]
 ...
 [0.61877586 0.1053084 0.66896335 ... 0.621027 0.43559346 0.49975826]
 [0.75717115 0.25935121 0.09051709 ... 0.36099578 0.9365496 0.10351814]
 [0.15780837 0.29487104 0.58475197 ... 0.22898748 0.23950251 0.49439913]]
The data arrays should be treated as read-only. To change values, first make a copy and modify the copy instead.
import numpy as np
patch.data[:10] = 12  # won't work
array = np.array(patch.data)  # this makes a copy
array[:10] = 12  # then this works
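The copy-then-modify pattern can be tried without DASCore. The sketch below uses a small NumPy array as a stand-in for patch.data and marks it read-only with NumPy's own writeable flag; whether DASCore protects its arrays in exactly this way is an assumption.

```python
import numpy as np

# Stand-in for patch.data: a small array flagged as read-only.
data = np.arange(6, dtype=float).reshape(2, 3)
data.setflags(write=False)

try:
    data[:1] = 12  # in-place assignment is rejected
    write_failed = False
except ValueError:
    write_failed = True

# np.array copies by default, so the copy is writable and the
# original values are left untouched.
copy = np.array(data)
copy[:1] = 12
```

NumPy raises a ValueError for the first assignment, while the copy accepts the same assignment.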
Coords
DASCore implements a class called CoordManager which manages dimension names, coordinate labels, selecting, sorting, etc. CoordManager has several convenience methods for accessing contained information:
import dascore as dc
patch = dc.get_example_patch()
coords = patch.coords
# Get an array of time values
time_array = coords.get_array("time")
# Get the maximum distance value
distance_max = coords.max("distance")
# Get the time step (NaN if time isn't evenly sampled)
time_step = coords.step("time")
For convenience, coordinates and their corresponding arrays can be accessed from the patch level as well.
import dascore as dc
patch = dc.get_example_patch()
# Get the coordinate object for distance
distance_coord = patch.get_coord("distance")
# Get the array of values corresponding to time
time_array = patch.get_array("time")
Coords also have an expressive string representation:
print(coords)
➤ Coordinates (distance: 300, time: 2000)
    *distance: CoordRange( min: 0 max: 299 step: 1 shape: (300,) dtype: int64 units: m )
    *time: CoordRange( min: 2017-09-18 max: 2017-09-18T00:00:07.996 step: 0.004s shape: (2000,) dtype: datetime64[ns] units: s )
Patch dimensions may have an associated coordinate with the same name but this is not required.
Coordinates are often (but not always) associated with one or more dimensions. For example, coordinates “latitude” and “longitude” are often associated with dimension “distance”.
Most of the other CoordManager features are primarily used internally by DASCore, but you can read more about them in the Coordinate Tutorial.
Attrs
The metadata stored in Patch.attrs is a pydantic model which enforces a schema and provides validation. PatchAttrs.get_summary_df generates a table of the attribute descriptions:
attribute | description |
---|---|
data_type | Describes the quantity being measured. |
data_category | Describes the type of data. |
data_units | The units of the data measurements. |
instrument_id | A unique id for the instrument which generated the data. |
acquisition_id | A unique identifier linking this data to an experiment. |
tag | A custom string field. |
station | A station code. |
network | A network code. |
history | A list of processing performed on the patch. |
dims | A tuple of comma-separated dimension names. |
Specific data formats may also add attributes (e.g. “gauge_length”, “pulse_width”), but this depends on the parser.
String representation
DASCore Patches have a useful string representation:
import dascore as dc
patch = dc.get_example_patch()
print(patch)
DASCore Patch ⚡
---------------
➤ Coordinates (distance: 300, time: 2000)
    *distance: CoordRange( min: 0 max: 299 step: 1 shape: (300,) dtype: int64 units: m )
    *time: CoordRange( min: 2017-09-18 max: 2017-09-18T00:00:07.996 step: 0.004s shape: (2000,) dtype: datetime64[ns] units: s )
➤ Data (float64)
   [[0.778 0.238 0.824 ... 0.37 0.077 0.232]
    [0.497 0.442 0.703 ... 0.126 0.118 0.78 ]
    [0.207 0.195 0.174 ... 0.849 0.365 0.807]
    ...
    [0.619 0.105 0.669 ... 0.621 0.436 0.5 ]
    [0.757 0.259 0.091 ... 0.361 0.937 0.104]
    [0.158 0.295 0.585 ... 0.229 0.24 0.494]]
➤ Attributes
    tag: random
    category: DAS
Shortcuts
DASCore Patches offer a few shortcuts for quickly accessing commonly used information:
import dascore as dc
patch = dc.get_example_patch()
print(patch.seconds)  # to get the number of seconds in the patch.
print(patch.channel_count) # to get the number of channels in the patch.
8.0
300
These only work for patches with dimensions “time” and “distance” but can help new users who may be unfamiliar with datetimes and coordinates.
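The reported 8.0 seconds is slightly longer than the 7.996 s between the first and last time labels shown in the coordinate summary. One reading consistent with both numbers is that the duration also counts the interval covered by the last sample. The NumPy sketch below rebuilds the example time axis (2000 samples, 0.004 s step) to show the arithmetic; it makes no claim about how DASCore computes the value internally.

```python
import numpy as np

start = np.datetime64("2017-09-18", "ns")
step = np.timedelta64(4, "ms")  # 0.004 s, i.e. 250 samples per second
time = start + np.arange(2000) * step

one_second = np.timedelta64(1, "s")
# Distance between the first and last labels.
span = (time[-1] - time[0]) / one_second
# Same span plus one sample interval, i.e. 2000 * 0.004 s.
seconds = (time[-1] - time[0] + step) / one_second
```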
Trim and Reshape
The following methods help trim, reshape, and manipulate coordinates.
Select
Patches are trimmed using the Patch.select method. Unlike Patch.order, select will not change the order of the affected dimensions; it will only remove elements. Most commonly, select takes the coordinate name and a tuple of (lower_limit, upper_limit) as the values. Either limit can be ... (Ellipsis), indicating an open interval.
import numpy as np
import dascore as dc
patch = dc.get_example_patch()
attrs = patch.attrs
# Select 1 sec after current start time to 1 sec before end time.
time = patch.get_coord("time")
one_sec = dc.to_timedelta64(1)
select_tuple = (time.min() + one_sec, time.max() - one_sec)
new = patch.select(time=select_tuple)
# Select only the first half of the distance channels.
distance_max = np.mean(patch.coords.get_array('distance'))
new = patch.select(distance=(..., distance_max))
The relative keyword is used to trim coordinates using offsets from the start (positive values) or the end (negative values) of the coordinate.
import dascore as dc
from dascore.units import ft
patch = dc.get_example_patch()
# We can make the example above simpler with relative selection
new = patch.select(time=(1, -1), relative=True)
# select 2 seconds from end to 1 second from end
new = patch.select(time=(-2, -1), relative=True)
# select last 100 ft of distance channels
new = patch.select(distance=(-100 * ft, ...), relative=True)
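To make the sign convention concrete, the following sketch converts relative limits into absolute ones, mirroring the (time.min() + one_sec, time.max() - one_sec) tuple built by hand earlier. The helper relative_to_absolute is hypothetical and not part of DASCore, and it ignores units.

```python
def relative_to_absolute(limits, coord_min, coord_max):
    """Hypothetical helper: turn relative (lower, upper) limits into absolute ones."""

    def convert(value):
        if value is ... or value is None:
            return value  # open ends stay open
        # Non-negative offsets count from the start, negative ones from the end.
        return coord_min + value if value >= 0 else coord_max + value

    lower, upper = limits
    return convert(lower), convert(upper)


# A coordinate running from 0 to 299, like the example patch's distance axis.
trimmed = relative_to_absolute((10, -10), 0, 299)
tail = relative_to_absolute((-100, ...), 0, 299)
print(trimmed, tail)
```

With these inputs the first call yields (10, 289) and the second yields (199, Ellipsis), so the open upper limit is passed through unchanged.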
The samples keyword tells select that the query is expressed in samples rather than in the units of the selected dimension. Unlike absolute selections, sample selections are always relative to the data contained in the patch. For example, 0 refers to the first sample along the dimension and -1 refers to the last.
import dascore as dc
patch = dc.get_example_patch()
# Trim patch to only include first 10 time rows (or columns).
new = patch.select(time=(..., 10), samples=True)
# Only include the last distance column or row.
new = patch.select(distance=-1, samples=True)
Arrays can also be passed as values, in which case they are treated like sets: only coordinate elements contained in the array are selected.
import numpy as np
import dascore as dc
patch = dc.get_example_patch()
# Create an array of desired distances.
dist_to_select = np.array([10., 18., 12.])
sub_patch = patch.select(distance=dist_to_select)
# Test that select worked.
assert set(sub_patch.get_array('distance')) == set(dist_to_select)
# Samples also work
sub_patch = patch.select(distance=np.array([0, 12, 10, 9]), samples=True)
assert len(sub_patch.get_array('distance')) == 4
Order
Order is similar to Patch.select, but will re-arrange data to the order specified by a value array. This may also cause parts of the patch to be duplicated.
import numpy as np
import dascore as dc
patch = dc.get_example_patch()
# Get a patch with a new distance ordering
dist_order1 = np.array([20., 10., 15.])
patch_dist1 = patch.order(distance=dist_order1)
assert np.all(dist_order1 == patch_dist1.get_array("distance"))
# Get a patch with duplicate entries for distance
dist_order2 = np.array([20., 20., 20.])
patch_dist2 = patch.order(distance=dist_order2)
assert np.all(dist_order2 == patch_dist2.get_array("distance"))
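The difference between set-like selection and ordering can be illustrated with plain NumPy indexing. This is only an analogy on stand-in arrays (distance and data are made up here), not a description of how DASCore implements either method.

```python
import numpy as np

distance = np.arange(300.0)  # stand-in for the distance coordinate
data = np.arange(1200.0).reshape(300, 4)  # stand-in data, one row per channel

wanted = np.array([20.0, 10.0, 15.0])

# select-like: a membership mask keeps the coordinate's original order.
mask = np.isin(distance, wanted)
selected = distance[mask]

# order-like: look up each requested value in turn, so the result follows
# `wanted`, and a repeated value would repeat the matching row.
index = np.array([np.flatnonzero(distance == v)[0] for v in wanted])
ordered = distance[index]
ordered_data = data[index]
```

Here selected comes back sorted as in the original coordinate (10, 15, 20), while ordered follows the requested sequence (20, 10, 15).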
New dimensions
Sometimes it can be useful to add new (empty) dimensions to a Patch. Patch.append_dims does this.
import dascore as dc
patch = dc.get_example_patch()
# Create a new patch with a single empty dimension called "nothing"
patch_dims = patch.append_dims("nothing")
# Create a new patch with a length two dimension called "money".
# The contents of the patch are repeated along the new dimension to fill
# the required length.
patch_extended_dims = patch.append_dims(money=2)
# Transpose can then be used to re-arrange the dims
patch_extended = patch_extended_dims.transpose("time", "money", "distance")
# And update coords to add coordinate values to the new dim
patch_extended_coord = patch_extended.update_coords(money=[10, 30])
Although these examples are quite contrived, these functions are very useful for transforms which create high dimensional patches.
Processing
The patch has several methods which are intended to be chained together via a fluent interface, meaning each method returns a new Patch instance.
import dascore as dc
pa = dc.get_example_patch()
out = (
    pa.decimate(time=8)  # Decimate to reduce data volume by 8 along time dimension
    .detrend(dim='distance')  # Detrend along distance dimension
    .pass_filter(time=(..., 10))  # Apply a low-pass 10 Hz butterworth filter along time dimension
)
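For readers new to the fluent style, here is a minimal, self-contained sketch of the idea using a frozen dataclass. MiniPatch and its scale and demean methods are hypothetical and exist only for illustration; they are not DASCore classes or processing functions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniPatch:
    """Hypothetical stand-in for a patch: immutable data plus a history."""

    data: tuple
    history: tuple = ()

    def scale(self, factor):
        # Build a new instance rather than mutating self.
        new_data = tuple(x * factor for x in self.data)
        return replace(self, data=new_data, history=self.history + (f"scale({factor})",))

    def demean(self):
        mean = sum(self.data) / len(self.data)
        new_data = tuple(x - mean for x in self.data)
        return replace(self, data=new_data, history=self.history + ("demean()",))


original = MiniPatch(data=(1.0, 2.0, 3.0))
# Each call returns a new object, so calls can be chained.
out = original.scale(2).demean()
```

Because every method returns a fresh instance, original is unchanged after the chain, and out records both steps in its history.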
The processing methods are located in the dascore.proc module. The patch processing tutorial provides more information about processing routines.
Visualization
DASCore provides some visualization functions in the dascore.viz module, which are also available through the Patch.viz namespace. DASCore generally implements only simple, matplotlib-based visualizations, but other DASDAE packages will likely do more interesting visualizations.
import dascore as dc
patch = (
    dc.get_example_patch('example_event_1')
    .taper(time=0.05)
    .pass_filter(time=(None, 300))
)
patch.viz.waterfall(show=True, scale=0.2);
Modifying patches
Because patches should be treated as immutable objects, they can’t be modified with normal attribute assignment. However, DASCore provides several methods that return new patches with modifications.
Update
Patch.update uses the Patch instance as a template and returns a new Patch instance with one or more aspects modified.
import dascore as dc
pa = dc.get_example_patch()
# Create a copy of patch with new data but coords and attrs stay the same.
new_data_patch = pa.update(data=pa.data * 10)
# Completely replace the attributes.
new_data_patch = pa.update(attrs=dict(station="TMU"))
Update attrs
Patch.update_attrs is for making changes to the attrs (metadata) while keeping the unaffected metadata (Patch.update would completely replace the old attrs).
import dascore as dc
pa = dc.get_example_patch()
# Update existing attribute 'network' and create new attr 'new_attr'
pa1 = pa.update_attrs(network='exp1', new_attr=42)
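The replace-versus-merge distinction between Patch.update(attrs=...) and Patch.update_attrs can be pictured with ordinary dictionaries. This is a simplified analogy: the real attrs are a validated pydantic model, and any defaults its schema might fill in are not modeled here.

```python
old_attrs = {"tag": "random", "category": "DAS", "network": ""}

# update(attrs=...)-like: the new mapping replaces the old one wholesale.
replaced = dict(station="TMU")

# update_attrs(...)-like: new values are merged over the existing ones.
merged = {**old_attrs, "network": "exp1", "new_attr": 42}

print(replaced)
print(merged)
```

In the merged result the untouched keys (tag, category) survive, whereas the replaced mapping contains only what was passed in.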
Update coords
Patch.update_coords
returns a new patch with the coordinates changed in some way. These changes can include: - Modifying (updating) existing coordinates - Adding new coordinates - Changing coordinate dimensional association
Modifying coordinates
Coordinates can be updated by specifying a new array which should take the place of the old one:
import dascore as dc
pa = dc.get_example_patch()
# Add one second to all values in the time array.
one_second = dc.to_timedelta64(1)
old_time = pa.coords.get_array('time')
new = pa.update_coords(time=old_time + one_second)
Or by specifying new min, max, or step values for a coordinate.
import dascore as dc
pa = dc.get_example_patch()
# Change the starting time of the array.
one_second = dc.to_timedelta64(1)
new_time = pa.coords.min('time') + one_second
new = pa.update_coords(time_min=new_time)
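For an evenly sampled coordinate, the two approaches above should describe the same result. The NumPy sketch below assumes that setting a new minimum keeps the length and step and only shifts the start; that assumption is not spelled out in the text, so treat this as an illustration rather than a statement of DASCore's behavior.

```python
import numpy as np

step = np.timedelta64(4, "ms")
old_time = np.datetime64("2017-09-18", "ns") + np.arange(2000) * step

# Assumed effect of a new minimum: same length and step, shifted start.
new_min = old_time.min() + np.timedelta64(1, "s")
new_time = new_min + np.arange(len(old_time)) * step

# Every label moved by exactly one second, matching old_time + one_second.
shift = new_time - old_time
```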
Adding coordinates
Commonly, additional coordinates, such as latitude/longitude, are attached to a particular dimension such as distance. It is also possible to include coordinates that are not associated with any dimensions.
import numpy as np
import dascore as dc
pa = dc.get_example_patch()
coords = pa.coords
dist = coords.get_array('distance')
time = coords.get_array('time')
# Add a single coordinate associated with distance dimension.
lat = np.arange(0, len(dist)) * .001 - 109.857952
# Note the tuple form: (associated_dimension, value)
out_1 = pa.update_coords(latitude=('distance', lat))
# Add multiple coordinates associated with distance dimension.
lon = np.arange(0, len(dist)) * .001 + 41.544654
out_2 = pa.update_coords(
    latitude=('distance', lat),
    longitude=('distance', lon),
)
# Add coordinate associated with multiple dimensions.
quality = np.ones_like(pa.data)
out_3 = pa.update_coords(
    quality=(pa.dims, quality)
)
# Add coordinate which isn't associated with a dimension.
no_dim_coord = pa.update_coords(non_dim=(None, np.arange(10)))
Changing coordinate dimensional association
The dimensions each coordinate is associated with can be changed. For example, to remove a coordinate’s dimension association:
import dascore as dc
# Load a patch which has latitude and longitude coordinates.
patch = dc.get_example_patch("random_patch_with_lat_lon")
# Disassociate latitude from distance.
lat = patch.coords.get_array('latitude')
patch_detached_lat = patch.update_coords(latitude=(None, lat))
Dropping coordinates
Non-dimensional coordinates can be dropped using Patch.drop_coords. Dimensional coordinates, however, cannot be dropped because doing so would force the patch data to become degenerate.
import dascore as dc
# This patch has latitude and longitude coordinates
patch = dc.get_example_patch("random_patch_with_lat_lon")
# Drop latitude; this won't affect the data or other coordinates
patch_dropped_lat = patch.drop_coords("latitude")
print(patch_dropped_lat.coords)
➤ Coordinates (distance: 300, time: 2000)
    *distance: CoordRange( min: 0 max: 299 step: 1 shape: (300,) dtype: int64 units: m )
    *time: CoordRange( min: 2017-09-18 max: 2017-09-18T00:00:07.996 step: 0.004s shape: (2000,) dtype: datetime64[ns] units: s )
     longitude ('distance',): CoordRange( min: 41.5 max: 41.8 step: 0.001 shape: (300,) dtype: float64 )
Coords in patch initialization
Any number of coordinates can also be assigned when the patch is initialized. For coordinates other than those of the patch dimensions, the associated dimensions must be specified. For example:
import dascore as dc
import numpy as np
# Create data for patch
rand = np.random.RandomState(13)
array = rand.random(size=(20, 100))
time1 = np.datetime64("2020-01-01")
# Create patch attrs
attrs = dict(dx=1, d_time=1 / 250.0, category="DAS", id="test_data1")
time_deltas = dc.to_timedelta64(np.arange(array.shape[1]) * attrs["d_time"])
# Create coordinate data
distance = np.arange(array.shape[0]) * attrs["dx"]
time = time1 + time_deltas
quality = np.ones_like(array)
latitude = np.arange(array.shape[0]) * .001 - 111.00
# Create coord dict
coords = dict(
    distance=distance,
    time=time,
    latitude=("distance", latitude),  # Note distance is attached dimension
    quality=(("distance", "time"), quality),  # Two attached dimensions here
)
# Define dimensions of array and init Patch
dims = ("distance", "time")
out = dc.Patch(data=array, coords=coords, attrs=attrs, dims=dims)
Units
As mentioned in the units section of the concept page, DASCore provides first-class support for units.
Patch units
There are two methods for configuring the units associated with a Patch:
- Patch.set_units sets the units on a patch or its coordinates. Old units are simply overwritten without performing any conversions. The first argument sets the data units and the keywords set the coordinate units.
- Patch.convert_units converts data or coordinate units by appropriately transforming the data or coordinate arrays. If no units exist, they will simply be set.
import dascore as dc
patch = dc.get_example_patch()
# Set data units and distance units; don't do any conversions
patch_set_units = patch.set_units("m/s", distance="ft")
# Convert data units and distance units; will modify data/coords
# to correctly do the conversion.
patch_conv_units = patch_set_units.convert_units("ft/s", distance='m')
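The numerical difference between relabeling and converting can be worked through by hand. The sketch below uses plain Python lists with made-up values and the exact definition 1 ft = 0.3048 m; it mirrors the m/s to ft/s and ft to m conversions in the example but does not call DASCore or Pint.

```python
M_PER_FT = 0.3048  # exact, by definition of the international foot

data = [1.0, 2.0, 3.0]  # pretend these values are in m/s
distance = [0.0, 10.0, 20.0]  # pretend these labels are in ft

# set_units-like: only the label changes, the numbers stay put.
relabeled = {"values": data, "units": "ft/s"}

# convert_units-like: numbers are rescaled so the physical quantity is unchanged.
data_ft_s = [v / M_PER_FT for v in data]  # m/s -> ft/s
distance_m = [d * M_PER_FT for d in distance]  # ft -> m
```

After conversion, 1 m/s becomes roughly 3.28 ft/s and 10 ft becomes 3.048 m, while the relabeled values are numerically identical to the input.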
The data and coordinate units attributes are Pint Quantity objects, but they can be converted to strings with get_quantity_str.
import dascore as dc
from dascore.units import get_quantity_str
patch = dc.get_example_patch().set_units("m/s")
print(type(patch.attrs.data_units))
print(get_quantity_str(patch.attrs.data_units))
<class 'pint.Quantity'>
m / s
Units in processing functions
import dascore as dc
from dascore.units import m, ft
pa = dc.get_example_patch()
# Sub-select a patch to only include distance from 10ft to 10m.
sub_selected = pa.select(distance=(10*ft, 10*m))
# Filter patch for spatial wavelengths from 10m to 100m.
dist_filtered = pa.pass_filter(distance=(10*m, 100*m))
See the documentation on Patch.select and Patch.pass_filter for more details.
Patch operations
Patches implement many ufunc type operations which are applied directly to a patch using built-in python operators.
In the case of scalars and numpy arrays, the operations are broadcast over the patch data. In the case of two patches, compatibility between the patches is first checked, the intersection of the coords and attrs is calculated, then the operator is applied to both patches’ data.
See merge_compatible_coords_attrs for more details on how attributes and coordinates are handled when performing operations on two patches. Here are a few examples:
Patch operations with scalars
import numpy as np
import dascore as dc
patch = dc.get_example_patch()
out1 = patch / 10
assert np.allclose(patch.data / 10, out1.data)
out2 = patch ** 2.3
assert np.allclose(patch.data ** 2.3, out2.data)
out3 = patch - 3
assert np.allclose(patch.data - 3, out3.data)
Units are also fully supported.
import dascore as dc
from dascore.units import m, s
patch = dc.get_example_patch().set_units("m/s")
# Multiplying patches by a quantity with units updates the data_units.
new = patch * 10 * m/s
print(f"units before operation {patch.attrs.data_units}")
print(f"units after operation {new.attrs.data_units}")
units before operation 1.0 m / s
units after operation 1.0 m ** 2 / s ** 2
Patch operations with numpy arrays
import numpy as np
import dascore as dc
patch = dc.get_example_patch()
ones = np.ones(patch.shape)
out1 = patch + ones
assert np.allclose(patch.data + ones, out1.data)
Units also work with numpy arrays.
import numpy as np
import dascore as dc
from dascore.units import furlongs
patch = dc.get_example_patch()
ones = np.ones(patch.shape) * furlongs
out1 = patch * ones
print(f"units before operation {patch.attrs.data_units}")
print(f"units after operation {out1.attrs.data_units}")
units before operation None
units after operation 1 fur
Patch operations with other patches
Identically shaped patches
import numpy as np
import dascore as dc
from dascore.units import furlongs
patch = dc.get_example_patch()
# Adding two patches together simply adds their data
# and checks/merges coords and attrs.
out = patch + patch
assert np.allclose(patch.data * 2, out.data)