pds4_tools.reader.data_types module

Functions

data_type_convert_array(data_type, byte_string) Cast binary data in the form of a byte_string to a flat array having proper dtype for data_type.
data_type_convert_table_ascii(data_type, data) Cast data originating from a PDS4 Table_Character or Table_Delimited data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type.
data_type_convert_table_binary(data_type, data) Cast data originating from a PDS4 Table_Binary data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type.
pds_to_numpy_type([data_type, data, ...]) Obtain a NumPy dtype for PDS4 data.
pds_to_builtin_type([data_type, data, ...]) Obtain a Python __builtin__ data type for PDS4 data.
pds_to_numpy_name(name) Create a NumPy field name from a PDS4 field name.
apply_scaling_and_value_offset(data[, ...]) Applies scaling factor and value offset to data.
adjust_array_data_type(array[, ...]) Converts the input array into a new large enough data type if adjusting said array as-is by scaling_factor or value_offset would result in an overflow.
get_scaled_numpy_type([data_type, data, ...]) Obtain the NumPy dtype that would be necessary to store PDS4 data once that data has been scaled.
decode_bytes_to_unicode(array) Decodes each byte string in the array into unicode.
mask_special_constants(data, special_constants) Mask out special constants in an array.
get_min_integer_numpy_type(data) Obtain smallest integer NumPy dtype that can store every value in the input array.
is_pds_integer_data([data, pds_data_type]) Determine, from a data array or from a PDS4 data type, whether such data is an integer.

Details

data_type_convert_array(data_type, byte_string)

Cast binary data in the form of a byte_string to a flat array having proper dtype for data_type.

Parameters:

data_type : str or unicode

The PDS4 data type that the data should be cast to.

byte_string : str, bytes or buffer

PDS4 byte string data for an array data structure or a table binary field.

Returns:

np.ndarray

Array-like view of the data cast from a byte string into values having the indicated data type. Will be read-only if underlying byte_string is immutable.
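The read-only behaviour described above can be illustrated with plain NumPy. This is only a sketch of the general technique, not the module's implementation; the dtype string ">i2" is an assumption standing in for a big-endian 16-bit PDS4 type such as SignedMSB2.

```python
import numpy as np

# Two big-endian signed 16-bit integers, as they might appear in a PDS4 array.
raw = b"\x00\x01\xff\xfe"

# Assumption: ">i2" (big-endian int16) stands in for the PDS4 data type.
values = np.frombuffer(raw, dtype=">i2")
print(values.tolist())          # [1, -2]
print(values.flags.writeable)   # False, because bytes objects are immutable

# A mutable buffer yields a writable view instead.
mutable_values = np.frombuffer(bytearray(raw), dtype=">i2")
print(mutable_values.flags.writeable)  # True
```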

data_type_convert_table_ascii(data_type, data, mask_numeric_nulls=False, decode_strings=False)

Cast data originating from a PDS4 Table_Character or Table_Delimited data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type. Most likely this data is a single Field, or a single repetition of a Field, since different Fields have different data types.

Parameters:

data_type : str or unicode

The PDS4 data type that the data should be cast to.

data : array_like[str or bytes]

Flat array of PDS4 byte strings from a Table_Character or Table_Delimited data structure.

mask_numeric_nulls : bool, optional

If True, then data may contain empty values for a numeric data_type. If such nulls are found, they will be masked out and a masked array will be returned. Defaults to False, in which case an exception will be raised should an empty value be found in a numeric field.

decode_strings : bool, optional

If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. Defaults to False.

Returns:

np.ndarray

Data cast from a byte string array into a values array having the right data type.
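The effect of mask_numeric_nulls can be sketched with NumPy alone. The helper convert_ascii_integers below is hypothetical and handles only integer fields; it is meant to show the idea of masking empty values, not how the module does it.

```python
import numpy as np

def convert_ascii_integers(fields, mask_numeric_nulls=False):
    """Hypothetical helper: cast ASCII byte strings to int64, masking empties."""
    stripped = [field.strip() for field in fields]
    is_null = np.array([len(field) == 0 for field in stripped], dtype=bool)

    # Without masking enabled, an empty numeric field is an error.
    if is_null.any() and not mask_numeric_nulls:
        raise ValueError("empty value found in a numeric field")

    # Empty fields get a placeholder of 0, which is hidden by the mask below.
    filled = [0 if null else int(field.decode("ascii"))
              for field, null in zip(stripped, is_null)]
    values = np.array(filled, dtype="int64")

    if is_null.any():
        return np.ma.MaskedArray(values, mask=is_null)
    return values

result = convert_ascii_integers([b" 12", b"   ", b"-7"], mask_numeric_nulls=True)
print(result)  # [12 -- -7]
```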

data_type_convert_table_binary(data_type, data, decode_strings=False)

Cast data originating from a PDS4 Table_Binary data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type. Most likely this data is a single Field, or a single repetition of a Field, since different Fields have different data types.

Parameters:

data_type : str or unicode

The PDS4 data type that the data should be cast to.

data : array_like[str or bytes]

Flat array of PDS4 byte strings from a Table_Binary data structure.

decode_strings : bool, optional

If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. Defaults to False.

Returns:

np.ndarray

Data cast from a byte string array into a values array having the right data type.

pds_to_numpy_type(data_type=None, data=None, field_length=None, decode_strings=False, scaling_factor=None, value_offset=None, include_endian=True)

Obtain a NumPy dtype for PDS4 data.

Either data or data_type must be provided.

Parameters:

data_type : str or unicode, optional

A PDS4 data type. If data is omitted, the obtained NumPy data type is based on this value (see notes).

data : array_like, optional

A data array. If data_type is omitted, the obtained NumPy data type is based on this value (see notes).

field_length : int, optional

If given, and the returned dtype is a form of character, then it will include the number of characters. Takes priority over length of data when given.

decode_strings : bool, optional

If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. If data is given and is unicode, then this setting will be ignored and unicode dtype will be returned. Defaults to False.

scaling_factor : int, float or None, optional

PDS4 scaling factor. If given, the returned dtype will be large enough to contain data scaled by this number. Defaults to None, indicating a value of 1.

value_offset : int, float or None, optional

PDS4 value offset. If given, the returned dtype will be large enough to contain data offset by this number. Defaults to None, indicating a value of 0.

include_endian : bool, optional

If True, the returned dtype will contain an explicit endianness as specified by the PDS4 data type. If False, the dtype will not specifically indicate the endianness, typically implying same endianness as the current machine. Defaults to True.

Returns:

np.dtype

A NumPy dtype that can store the data described by the input parameters.

Notes

For certain data (such as ASCII_Integer), there are a number of NumPy dtypes (e.g. int8, int32, int64) that could be used. If only the PDS4 data type is given, the returned dtype will be large enough to store any possible valid value according to the PDS4 Standard. However, if the data parameter is specified, then the obtained dtype will not be any larger than needed to store exactly that data (plus any scaling/offset specified).
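As a rough illustration of what field_length, decode_strings and include_endian control, the following sketch builds comparable dtypes directly in NumPy. The helper character_dtype is hypothetical, and the mapping to ">i4" is an assumption for a big-endian 32-bit PDS4 type; the module's own mapping may differ in detail.

```python
import numpy as np

def character_dtype(field_length, decode_strings=False):
    """Hypothetical helper: fixed-width byte-string or unicode dtype."""
    kind = "U" if decode_strings else "S"
    return np.dtype("{0}{1}".format(kind, field_length))

byte_dtype = character_dtype(5)                           # 5 bytes per element
unicode_dtype = character_dtype(5, decode_strings=True)   # 5 code points per element

# Explicit endianness versus the current machine's native byte order.
explicit = np.dtype(">i4")
native = explicit.newbyteorder("=")
print(byte_dtype, unicode_dtype, explicit.itemsize, native.isnative)
```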

pds_to_builtin_type(data_type=None, data=None, decode_strings=False, scaling_factor=None, value_offset=None)

Obtain a Python __builtin__ data type for PDS4 data.

Either data or data_type must be provided.

Parameters:

data_type : str or unicode, optional

A PDS4 data type. If data is omitted, the obtained builtin data type is based on this value.

data : array_like, optional

A data array. If data_type is omitted, the obtained builtin data type is based on this value.

decode_strings : bool, optional

If True, and the returned data type is a form of character, then the obtained data type will be either str (Python 3) or unicode (Python 2). If False, then for character data the obtained data type will remain byte strings. If data is given and is unicode, then this setting will be ignored and unicode data type will be returned. Defaults to False.

scaling_factor : int, float or None, optional

PDS4 scaling factor. If given, the returned data type will be large enough to contain data scaled by this number. Defaults to None, indicating a value of 1.

value_offset : int, float or None, optional

PDS4 value offset. If given, the returned data type will be large enough to contain data offset by this number. Defaults to None, indicating a value of 0.

Returns:

str, unicode, int, float, bool, complex

A builtin data type that can store the data described by the input parameters.
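One way to picture the relationship between a NumPy dtype and a builtin type is a lookup on the dtype's kind code. The table and the helper builtin_for_dtype below are hypothetical, assume Python 3 (bytes and str), and are not taken from the module.

```python
import numpy as np

# Assumed mapping from NumPy dtype kind codes to Python 3 builtin types.
_KIND_TO_BUILTIN = {
    "b": bool,
    "i": int,
    "u": int,
    "f": float,
    "c": complex,
    "S": bytes,
    "U": str,
}

def builtin_for_dtype(dtype):
    """Hypothetical helper: builtin type able to hold one element of dtype."""
    return _KIND_TO_BUILTIN[np.dtype(dtype).kind]

print(builtin_for_dtype(">i2"), builtin_for_dtype("S10"))
```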

pds_to_numpy_name(name)

Create a NumPy field name from a PDS4 field name.

Parameters:

name : str or unicode

A PDS4 field name.

Returns:

str

A NumPy-compliant field name.

apply_scaling_and_value_offset(data, scaling_factor=None, value_offset=None, special_constants=None)

Applies scaling factor and value offset to data.

Data is modified in-place, if possible. Data type may change to prevent numerical overflow if applying scaling factor and value offset would cause one.

Parameters:

data : array_like

Any numeric PDS4 data.

scaling_factor : int, float or None, optional

PDS4 scaling factor to apply to the array. Defaults to None, indicating a value of 1.

value_offset : int, float or None, optional

PDS4 value offset to apply to the array. Defaults to None, indicating a value of 0.

special_constants : dict, optional

If provided, the keys correspond to names and values correspond to numeric values for special constants. Those particular values will not be scaled or offset.

Returns:

np.ndarray or subclass

data with scaling_factor and value_offset applied, potentially with a new dtype if necessary to fit new values.
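The idea of scaling data while leaving special constants untouched can be sketched as follows. The helper scale_and_offset is hypothetical and simplified: it always returns a new float64 array rather than working in-place, and the key "missing_constant" is used only as an illustrative name.

```python
import numpy as np

def scale_and_offset(data, scaling_factor=None, value_offset=None,
                     special_constants=None):
    """Hypothetical helper: scale and offset, skipping special constants."""
    scaling_factor = 1 if scaling_factor is None else scaling_factor
    value_offset = 0 if value_offset is None else value_offset

    data = np.asarray(data)
    as_float = data.astype("float64")  # simplification: always widen to float64

    # Mark every element equal to one of the special constant values.
    keep = np.zeros(data.shape, dtype=bool)
    for value in (special_constants or {}).values():
        keep |= (data == value)

    # Special constants pass through unchanged; everything else is adjusted.
    return np.where(keep, as_float, as_float * scaling_factor + value_offset)

raw = np.array([1, 2, -999], dtype="int16")
scaled = scale_and_offset(raw, 0.5, 10, {"missing_constant": -999})
print(scaled.tolist())  # [10.5, 11.0, -999.0]
```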

adjust_array_data_type(array, scaling_factor=None, value_offset=None)

Converts the input array into a new, sufficiently large data type if adjusting said array as-is by scaling_factor or value_offset would result in an overflow. This can be necessary whether the array is data from a PDS4 Array or a PDS4 Table, so long as it has a scaling factor or value offset associated with it.

Parameters:

array : array_like

Any PDS4 numeric data.

scaling_factor : int, float or None, optional

PDS4 scaling factor to apply to the array. Defaults to None, indicating a value of 1.

value_offset : int, float or None, optional

PDS4 value offset to apply to the array. Defaults to None, indicating a value of 0.

Returns:

np.ndarray or subclass

The original array, converted to a new data type if necessary and otherwise unchanged.
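The overflow this function guards against is easy to reproduce in NumPy, where integer array arithmetic wraps silently. The sketch below is only a demonstration of the problem and of widening as a remedy; the choice of int16 as the wider type is an assumption made for this example.

```python
import numpy as np

narrow = np.array([100, -100], dtype="int8")

# Scaling in the original dtype wraps around, since int8 holds only -128..127.
wrapped = narrow * 2
print(wrapped.tolist())  # [-56, 56]

# Converting to a wider dtype first gives the intended values.
widened = narrow.astype("int16") * 2
print(widened.tolist())  # [200, -200]
```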

get_scaled_numpy_type(data_type=None, data=None, scaling_factor=None, value_offset=None)

Obtain the NumPy dtype that would be necessary to store PDS4 data once that data has been scaled.

When data is scaled, its final data type will often differ from its original data type. (E.g. if you multiply integers by a float, then the final data type will be float.) This method determines what that final data type will have to be, given the initial data type and the scaling and offset values.

Parameters:

data_type : str or unicode, optional

If given, specifies the initial PDS4 data type that the unscaled data has or would have.

data : array_like or None, optional

If given, an array of data. When given, the initial data type for the unscaled data will be taken from this array and data_type ignored. For some ASCII data types in PDS4, the exact necessary data type (scaled or unscaled) can only be obtained when the data is already known. If data is not given, a data type sufficient (but possibly larger than necessary) to store the data will be returned. Defaults to None.

scaling_factor : int, float, or None, optional

PDS4 scaling factor that will later be applied to the data. Defaults to None, indicating a value of 1.

value_offset : int, float, or None, optional

PDS4 value offset that will later be applied to the data. Defaults to None, indicating a value of 0.

Returns:

np.dtype

A NumPy dtype large enough to store the data once scaling_factor and value_offset have been applied.

Notes

For masked data, the output type will be large enough to store the masked data values as if they had been scaled/offset. This is because NumPy documentation notes that masked data are not guaranteed to be unaffected by arithmetic operations, only that every attempt will be made to do so.
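The integer-times-float case mentioned above can be checked with NumPy's own promotion rules. This sketch only shows how NumPy promotes types; it makes no claim about how this function chooses its result.

```python
import numpy as np

raw_dtype = np.dtype("int16")

# Promotion of a 16-bit integer with a 64-bit float yields float64.
scaled_dtype = np.result_type(raw_dtype, np.float64)
print(scaled_dtype)  # float64

# The same promotion happens when an integer array is multiplied by a float.
scaled = np.array([1, 2, 3], dtype=raw_dtype) * 0.5
print(scaled.dtype, scaled.tolist())  # float64 [0.5, 1.0, 1.5]
```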

decode_bytes_to_unicode(array)

Decodes each byte string in the array into unicode.

Parameters:

array : array_like

An array containing only byte strings (str in Python 2, bytes in Python 3).

Returns:

np.ndarray or subclass

An array in which each element of input array has been decoded to unicode.
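A comparable element-wise decode is available in NumPy through np.char.decode. The sketch below assumes UTF-8 as the encoding, which is a choice made for this example and not something stated here about the function.

```python
import numpy as np

byte_strings = np.array([b"MARS", b"PHOBOS"])
print(byte_strings.dtype.kind)  # S (fixed-width byte strings)

# Assumption: the byte strings are UTF-8 encoded.
decoded = np.char.decode(byte_strings, "utf-8")
print(decoded.dtype.kind, decoded.tolist())  # U ['MARS', 'PHOBOS']
```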

mask_special_constants(data, special_constants, mask_strings=False, copy=False)

Mask out special constants in an array.

Parameters:

data : array_like

An array of data in which to mask out special constants.

special_constants : dict

A dictionary whose keys are the names of the special constants and whose values are the constants to be masked out.

mask_strings : bool, optional

If True, character data will also be masked out if it has special constants. If False, only numeric data will be masked out. Defaults to False.

copy : bool, optional

If True, the returned masked data is a copy. If False, a view is returned instead. Defaults to False.

Returns:

np.ma.MaskedArray, np.ndarray or subclass

If data to be masked is found, an np.ma.MaskedArray or subclass view (preserving input class if it was already a subclass of masked arrays). Otherwise the input data will be returned.

Notes

The match between special constant value and data value (to mask it out) in this method is simplistic and based on the NumPy implementation of equality. Currently the PDS4 Standard does not provide enough clarity on how this match should be done.
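Equality-based masking of this kind can be sketched with np.isin and np.ma.masked_where. The key "missing_constant" and the value -999.0 are illustrative assumptions, and the sketch covers numeric data only.

```python
import numpy as np

data = np.array([12.5, -999.0, 7.25, -999.0])
special_constants = {"missing_constant": -999.0}  # illustrative name and value

# Elements equal to any special constant value are marked for masking.
matches = np.isin(data, list(special_constants.values()))

# copy=False asks NumPy to avoid copying the data where it can.
masked = np.ma.masked_where(matches, data, copy=False)
print(masked)         # [12.5 -- 7.25 --]
print(masked.mean())  # 9.875, computed over unmasked values only
```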

get_min_integer_numpy_type(data)

Obtain smallest integer NumPy dtype that can store every value in the input array.

Parameters:

data : array_like

PDS4 integer data.

Returns:

np.dtype

The NumPy dtype that can store all integers in data.
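Finding a smallest sufficient integer dtype can be sketched by testing candidate types against the data's range with np.iinfo. The helper min_integer_dtype, its candidate order, and its fallback to the object dtype are assumptions made for this example; the function's actual preferences may differ.

```python
import numpy as np

# Assumed candidate order: smallest width first, signed before unsigned.
_CANDIDATES = ("int8", "uint8", "int16", "uint16",
               "int32", "uint32", "int64", "uint64")

def min_integer_dtype(values):
    """Hypothetical helper: first candidate dtype whose range covers values."""
    lowest, highest = min(values), max(values)
    for name in _CANDIDATES:
        info = np.iinfo(name)
        if info.min <= lowest and highest <= info.max:
            return np.dtype(name)
    # Nothing fits in 64 bits, so fall back to Python integers.
    return np.dtype("object")

print(min_integer_dtype([0, 200]))    # uint8
print(min_integer_dtype([-1, 200]))   # int16
print(min_integer_dtype([2 ** 70]))   # object
```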

is_pds_integer_data(data=None, pds_data_type=None)

Determine, from a data array or from a PDS4 data type, whether such data is an integer.

Parameters:

data : array_like, optional

If given, checks whether this data is integer data.

pds_data_type : str or unicode, optional

If given, checks whether this PDS data type corresponds to integer data.

Returns:

bool

True if data and/or pds_data_type contain or correspond to PDS4 integer data, False otherwise.

Notes

This is necessary, as opposed to simply checking for dtype, because some PDS4 data is integer but may have the ‘object’ dtype because it may overflow 64-bit integers (e.g. ASCII_Numeric_Base data, which is not limited to 64-bit sizes by the PDS4 standard).
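The situation described in this note can be reproduced in NumPy: integers too large for 64 bits end up in an object array, so a dtype check alone misses them. The element-wise isinstance check below is only one possible way to detect such data and is not presented as this function's logic.

```python
import numpy as np

# Values beyond the 64-bit range, stored as Python integers in an object array.
big = np.array([2 ** 70, -(2 ** 65)], dtype=object)

# A dtype-only test does not recognise this array as integer data.
dtype_says_integer = np.issubdtype(big.dtype, np.integer)
print(dtype_says_integer)  # False

# Inspecting the elements themselves shows that every value is an integer.
elements_are_integers = all(isinstance(value, int) for value in big.flat)
print(elements_are_integers)  # True
```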