ChunkManager
ChunkManager(
overlap = None,
group_columns = None,
keep_partial = False,
snap_coords = True,
tolerance = 1.5,
conflict = 'raise',
**kwargs,
)
A class for managing the chunking of data defined in a dataframe.
The chunk manager handles both splitting and joining of contiguous, or near-contiguous, blocks of data.
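As a rough illustration of what joining near-contiguous blocks means, the sketch below merges intervals whose gaps fall within a tolerance expressed in sampling steps. The function `merge_near_contiguous` and its arguments are hypothetical names invented for this example; this is not DASCore's implementation, only a simplified model of the idea.

```python
def merge_near_contiguous(intervals, step, tolerance=1.5):
    """Merge (start, stop) intervals whose gaps are <= tolerance * step.

    Hypothetical helper for illustration only; not part of dascore.
    """
    merged = []
    for start, stop in sorted(intervals):
        if merged and start - merged[-1][1] <= tolerance * step:
            # The gap is small enough, so extend the previous block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            # The gap is too large (or this is the first block): start a new one.
            merged.append((start, stop))
    return merged


blocks = [(0.0, 10.0), (11.0, 20.0), (25.0, 30.0)]
print(merge_near_contiguous(blocks, step=1.0))
# → [(0.0, 20.0), (25.0, 30.0)]
```

With a step of 1.0, the gap of 1.0 between the first two blocks is within 1.5 steps, so they are joined; the gap of 5.0 before the last block is not.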
Parameters
Parameter | Description |
---|---|
overlap | The amount of overlap between each segment, starting with the end of the first row. Negative values can be used to induce gaps. |
group_columns | A sequence of column names which should be used for sorting groups. |
keep_partial | If True, keep segments which are shorter than the chunk size (at the end of contiguous blocks). |
tolerance | The upper limit of a gap to tolerate, in terms of the sampling along the desired dimension. For example, the default value means entities with gaps <= 1.5 * {name}_step will be merged. |
conflict | Indicates how to handle conflicts in attributes other than those indicated by dim (e.g., tag, history, station, etc.). If “drop”, simply drop conflicting attributes or attributes not shared by all models. If “raise”, raise an AttributeMergeError (dascore.exceptions.AttributeMergeError) when issues are encountered. If “keep_first”, just keep the first value for each attribute. |
**kwargs | Keyword arguments specify the column along which to chunk: the key names the column, typically time or distance, and the value specifies the chunk size. A value of None means to chunk on all available data (e.g., merge all data). |
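To make the interplay of chunk size, overlap, and keep_partial concrete, here is a minimal sketch that computes chunk boundaries over a single contiguous range. The function `chunk_bounds` and its parameter names are assumptions made for this example; it ignores sampling steps and endpoint conventions, so it should be read as a simplified model rather than DASCore's actual algorithm.

```python
def chunk_bounds(start, stop, size, overlap=0, keep_partial=False):
    """Return (chunk_start, chunk_stop) pairs covering [start, stop].

    Hypothetical helper for illustration only; not part of dascore.
    """
    # Each chunk starts `stride` after the previous one; a negative
    # overlap makes the stride larger than the size, leaving gaps.
    stride = size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk size")
    bounds = []
    current = start
    while current + size <= stop:
        bounds.append((current, current + size))
        current += stride
    # Optionally keep a trailing segment shorter than the chunk size,
    # but only if it covers data no full chunk has reached.
    covered = bounds[-1][1] if bounds else start
    if keep_partial and covered < stop and current < stop:
        bounds.append((current, stop))
    return bounds


print(chunk_bounds(0, 25, size=10, overlap=2))
# → [(0, 10), (8, 18)]
print(chunk_bounds(0, 25, size=10, overlap=2, keep_partial=True))
# → [(0, 10), (8, 18), (16, 25)]
print(chunk_bounds(0, 25, size=10, overlap=-2))
# → [(0, 10), (12, 22)]
```

In this model, each chunk shares 2 units with its neighbor when overlap is 2, and a 2-unit gap separates chunks when overlap is -2.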
Note
This class is used internally by dc.BaseSpool.chunk.
Methods
Name | Description |
---|---|
chunk | Chunk a dataframe into new contiguous segments. |
get_instruction_df | Get a dataframe connecting the chunked dataframe to its origin. |