pds4_tools.reader.data_types module

Classes

PDSdtype(name)

A PDS4 data type object.

Functions

data_type_convert_array(data_type, byte_string)

Cast binary data in the form of a byte_string to a flat array having proper dtype for data_type.

data_type_convert_table_ascii(data_type, data)

Cast data originating from a PDS4 Table_Character or Table_Delimited data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type.

data_type_convert_table_binary(data_type, data)

Cast data originating from a PDS4 Table_Binary data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type.

data_type_convert_dates(data[, data_type, ...])

Cast an array of datetime strings originating from a PDS4 Table data structure to an array having NumPy datetime64 dtype.

pds_to_numpy_type([data_type, data, ...])

Obtain a NumPy dtype for PDS4 data.

pds_to_builtin_type([data_type, data, ...])

Obtain a Python __builtin__ data type for PDS4 data.

pds_to_numpy_name(name)

Create a NumPy field name from a PDS4 field name.

apply_scaling_and_value_offset(data[, ...])

Applies scaling factor and value offset to data.

adjust_array_data_type(array[, ...])

Converts the input array into a new large enough data type if adjusting said array as-is by scaling_factor or value_offset would result in an overflow.

get_scaled_numpy_type([data_type, data, ...])

Obtain the NumPy dtype that would be necessary to store PDS4 data once that data has been scaled.

decode_bytes_to_unicode(array)

Decodes each byte string in the array into unicode.

mask_special_constants(data, special_constants)

Mask out special constants in an array.

get_min_integer_numpy_type(data)

Obtain smallest integer NumPy dtype that can store every value in the input array.

is_pds_integer_data([data, pds_data_type])

Determine, from a data array or from a PDS4 data type, whether such data is an integer.

Details

class PDSdtype(name)

Bases: object

A PDS4 data type object.

Each PDS4 array and table field contains homogeneous values described by a PDSdtype object. This class is a wrapper around the named PDS4 data types, to make comparison of types easier.

property name
Returns
str or unicode

The PDS4 data type name.

__eq__(other)

Check whether two data types are equal.

Parameters
other: str, unicode or PDSdtype

A PDS4 data type.

Returns
bool

True if the data types are equal. Two PDSdtype objects are equal when their name attributes are identical. If other is str-like, it is equal when it matches the object’s name attribute.

__contains__(other)

Check if a data type contains another.

Parameters
other: str, unicode or PDSdtype

A PDS4 data type.

Returns
bool

True if name contains at least a portion of other.

issubtype(subtype)

Check if data type is a sub-type.

Parameters
subtype: str or unicode

Valid subtypes are int|integer|float|bool|datetime|bitstring|ascii|binary. Case-insensitive.

Returns
bool

True if name is a sub-type of subtype. False otherwise.

Raises
ValueError

Raised if an unknown subtype is specified.

TypeError

Raised if a non-string-like subtype is specified.
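The comparison rules documented for __eq__ and __contains__ can be illustrated without the library. The following is a minimal sketch, not the pds4_tools implementation: DTypeSketch is a hypothetical stand-in class, and treating containment as a case-sensitive substring test is an assumption made for this example.

```python
class DTypeSketch(object):
    """Hypothetical stand-in for PDSdtype; mirrors only the documented comparisons."""

    def __init__(self, name):
        self.name = name

    @staticmethod
    def _as_name(other):
        # Accept either another wrapper object or a plain string.
        return other.name if isinstance(other, DTypeSketch) else other

    def __eq__(self, other):
        # Equal when the names are identical, or when a string matches the name.
        return self.name == self._as_name(other)

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(self.name)

    def __contains__(self, other):
        # Assumption: containment is a plain substring test on the name.
        return self._as_name(other) in self.name


dtype = DTypeSketch('ASCII_Integer')
same = dtype == 'ASCII_Integer'
same_obj = dtype == DTypeSketch('ASCII_Integer')
has_ascii = 'ASCII' in dtype
has_real = 'Real' in dtype
```

Wrapping the name this way lets calling code compare a data type against either a string or another wrapper object with the same expression.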

data_type_convert_array(data_type, byte_string)

Cast binary data in the form of a byte_string to a flat array having proper dtype for data_type.

Parameters
data_type: str, unicode or PDSdtype

The PDS4 data type that the data should be cast to.

byte_string: str, bytes or buffer

PDS4 byte string data for an array data structure or a table binary field.

Returns
np.ndarray

Array-like view of the data cast from a byte string into values having the indicated data type. Will be read-only if underlying byte_string is immutable.
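The read-only behavior noted above matches how NumPy treats views over immutable buffers in general. The sketch below uses np.frombuffer directly and does not call pds4_tools; whether the library uses this exact call internally is not stated here, so treat it only as an illustration of the view semantics.

```python
import numpy as np

# Four bytes holding two big-endian 16-bit integers (1 and 2).
immutable = b'\x00\x01\x00\x02'
view = np.frombuffer(immutable, dtype='>i2')

# The same bytes in a mutable buffer give a writeable view.
mutable = bytearray(immutable)
writable_view = np.frombuffer(mutable, dtype='>i2')
writable_view[0] = 5  # also changes the underlying bytearray
```

No data is copied in either case, which is why writing through the second view alters the original buffer.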

data_type_convert_table_ascii(data_type, data, mask_nulls=False, decode_strings=False)

Cast data originating from a PDS4 Table_Character or Table_Delimited data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type. Most likely this data is a single Field, or a single repetition of a Field, since different Fields have different data types.

Parameters
data_type: str, unicode or PDSdtype

The PDS4 data type that the data should be cast to.

data: array_like[str or bytes]

Flat array of PDS4 byte strings from a Table_Character or Table_Delimited data structure.

mask_nulls: bool, optional

If True, then data may contain empty values for numeric and boolean data types. If such nulls are found, they will be masked out and a masked array will be returned. Defaults to False, in which case an exception will be raised should an empty value be found in such a field.

decode_strings: bool, optional

If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. Defaults to False.

Returns
np.ndarray

Data cast from a byte string array into a values array having the right data type.
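The effect of mask_nulls can be pictured with plain NumPy. This is a sketch under stated assumptions rather than the library's code: an empty value is taken to mean a field that is blank after stripping whitespace, and the placeholder 0 stored under the mask is an arbitrary choice for this example.

```python
import numpy as np

# Fixed-width ASCII integer field values; the middle one is empty (null).
raw = [b' 12', b'   ', b'  7']

stripped = [value.strip() for value in raw]
is_null = np.array([len(value) == 0 for value in stripped])

# Nulls get a placeholder so the numeric cast succeeds, then are masked out.
values = np.array([int(value) if value else 0 for value in stripped], dtype='int64')
masked = np.ma.MaskedArray(values, mask=is_null)
```

Operations such as masked.sum() then ignore the null entry instead of counting the placeholder.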

data_type_convert_table_binary(data_type, data, decode_strings=False)

Cast data originating from a PDS4 Table_Binary data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type. Most likely this data is a single Field, or a single repetition of a Field, since different Fields have different data types.

Parameters
data_type: str, unicode or PDSdtype

The PDS4 data type that the data should be cast to.

data: array_like[str or bytes]

Flat array of PDS4 byte strings from a Table_Binary data structure.

decode_strings: bool, optional

If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. Defaults to False.

Returns
np.ndarray

Data cast from a byte string array into a values array having the right data type.

data_type_convert_dates(data, data_type=None, mask_nulls=False)

Cast an array of datetime strings originating from a PDS4 Table data structure to an array having NumPy datetime64 dtype.

Parameters
data: array_like[str or bytes]

Flat array of datetime strings in a PDS4-compatible form.

data_type: str, unicode or PDSdtype, optional

The PDS4 data type for the data. If omitted, will be obtained from the meta_data of data.

mask_nulls: bool, optional

If True, then data may contain empty values. If such nulls are found, they will be masked out and a masked array will be returned. Defaults to False, in which case an exception will be raised should an empty value be found.

Returns
np.ndarray, np.ma.MaskedArray or subclass

Data cast from a string-like array to a datetime array. If null values are found, an np.ma.MaskedArray or subclass view will be returned. When the input is an instance of PDS_array, the output will be as well.
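As an illustration of the target representation, the sketch below builds a datetime64 array from byte strings with plain NumPy. It does not call pds4_tools; dropping the trailing "Z" before parsing and choosing second resolution are assumptions made for this example only.

```python
import numpy as np

raw = [b'2021-03-04T05:06:07Z', b'2021-03-05T05:06:07Z']

# NumPy's datetime64 is timezone-naive, so the trailing UTC designator is
# dropped before parsing (an assumption of this sketch).
text = [value.decode('ascii').rstrip('Z') for value in raw]
dates = np.array(text, dtype='datetime64[s]')

# Once converted, the values support date arithmetic directly.
elapsed = dates[1] - dates[0]
```

The difference of two datetime64 values is a timedelta64, here one day expressed in seconds.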

pds_to_numpy_type(data_type=None, data=None, field_length=None, decode_strings=False, decode_dates=False, scaling_factor=None, value_offset=None, include_endian=True)

Obtain a NumPy dtype for PDS4 data.

Either data or data_type must be provided.

Parameters
data_type: str, unicode or PDSdtype, optional

A PDS4 data type. If data is omitted, the obtained NumPy data type is based on this value (see notes).

data: array_like, optional

A data array. If data_type is omitted, the obtained NumPy data type is based on this value (see notes).

field_length: int, optional

If given, and the returned dtype is a form of character, then it will include the number of characters. Takes priority over length of data when given.

decode_strings: bool, optional

If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. If data is given and is unicode, then this setting will be ignored and unicode dtype will be returned. If data_type is given and refers to bit-strings, then this setting will be ignored and a byte string dtype will be returned. Defaults to False.

decode_dates: bool, optional

If True, then the returned dtype will be a datetime64 when data_type is both given and is a form of date and/or time. If False, then the returned dtype will be a form of character according to decode_strings. If data is given, then this setting will be ignored. Defaults to False.

scaling_factor: int, float or None, optional

PDS4 scaling factor. If given, the returned dtype will be large enough to contain data scaled by this number. Defaults to None, indicating a value of 1.

value_offset: int, float or None, optional

PDS4 value offset. If given, the returned dtype will be large enough to contain data offset by this number. Defaults to None, indicating a value of 0.

include_endian: bool, optional

If True, the returned dtype will contain an explicit endianness as specified by the PDS4 data type. If False, the dtype will not specifically indicate the endianness, typically implying same endianness as the current machine. Defaults to True.

Returns
np.dtype

A NumPy dtype that can store the data described by the input parameters.

Notes

For certain data (such as ASCII_Integer), there are a number of NumPy dtypes (e.g. int8, int32, int64) that could be used. If only the PDS4 data type is given, the returned dtype will be large enough to store any possible valid value according to the PDS4 Standard. However, if the data parameter is specified, then the obtained dtype will not be any larger than needed to store exactly that data (plus any scaling/offset specified).
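What an explicit endianness in a dtype means, as controlled by include_endian, can be seen with plain NumPy dtypes. This sketch does not call pds_to_numpy_type; the dtype strings are standard NumPy notation chosen for this example, not values taken from the library.

```python
import numpy as np

payload = b'\x00\x00\x00\x01'

big = np.dtype('>i4')            # explicit big-endian (most significant byte first)
little = np.dtype('<i4')         # explicit little-endian (least significant byte first)
native = big.newbyteorder('=')   # same kind and size, byte order of the current machine

# The same four bytes decode to different integers depending on byte order.
as_big = int(np.frombuffer(payload, dtype=big)[0])
as_little = int(np.frombuffer(payload, dtype=little)[0])
```

An explicit byte order matters when reading raw bytes from a file; once values are in memory, a native-order dtype describes the same numbers.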

pds_to_builtin_type(data_type=None, data=None, decode_strings=False, decode_dates=False, scaling_factor=None, value_offset=None)

Obtain a Python __builtin__ data type for PDS4 data.

Either data or data_type must be provided.

Parameters
data_type: str, unicode or PDSdtype, optional

A PDS4 data type. If data is omitted, the obtained builtin data type is based on this value.

data: array_like, optional

A data array. If data_type is omitted, the obtained builtin data type is based on this value.

decode_strings: bool, optional

If True, and the returned data type is a form of character, then the obtained data type will be either str (Python 3) or unicode (Python 2). If False, then for character data the obtained data type will remain byte strings. If data is given and is unicode, then this setting will be ignored and unicode data type will be returned. If data_type is given and refers to bit-strings, then this setting will be ignored and a byte string data type will be returned. Defaults to False.

decode_dates: bool, optional

If True, then the returned data type will be a form of date/time when data_type is both given and is a form of date and/or time. If False, then the returned data type will be a form of character according to decode_strings. If data is given, then this setting will be ignored. Defaults to False.

scaling_factor: int, float or None, optional

PDS4 scaling factor. If given, the returned data type will be large enough to contain data scaled by this number. Defaults to None, indicating a value of 1.

value_offset: int, float or None, optional

PDS4 value offset. If given, the returned data type will be large enough to contain data offset by this number. Defaults to None, indicating a value of 0.

Returns
str, unicode, bytes, int, float, bool, complex

A builtin data type that can store the data described by the input parameters.

pds_to_numpy_name(name)

Create a NumPy field name from a PDS4 field name.

Parameters
name: str or unicode

A PDS4 field name.

Returns
str

A NumPy-compliant field name.

apply_scaling_and_value_offset(data, scaling_factor=None, value_offset=None, special_constants=None)

Applies scaling factor and value offset to data.

Data is modified in-place, if possible. Data type may change to prevent numerical overflow if applying scaling factor and value offset would cause one.

Parameters
data: array_like

Any numeric PDS4 data.

scaling_factor: int, float or None, optional

PDS4 scaling factor to apply to the array. Defaults to None, indicating a value of 1.

value_offset: int, float or None, optional

PDS4 value offset to apply to the array. Defaults to None, indicating a value of 0.

special_constants: dict, optional

If provided, the keys correspond to names and values correspond to numeric values for special constants. Those particular values will not be scaled or offset.

Returns
np.ndarray or subclass

data with scaling_factor and value_offset applied, potentially with a new dtype if necessary to fit new values.
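The arithmetic involved can be sketched with plain NumPy. This is not the library's implementation: it assumes the usual PDS4 convention that the true value is the stored value times scaling_factor plus value_offset, and the special-constant name used as the dictionary key is only an example.

```python
import numpy as np

stored = np.array([10, 20, -999, 30], dtype='int16')
special_constants = {'missing_constant': -999}
scaling_factor, value_offset = 0.5, 100.0

# Elements equal to a special constant are left untouched.
is_special = np.isin(stored, list(special_constants.values()))

# Widen first so the results fit; int16 cannot hold fractional values.
scaled = stored.astype('float64')
scaled[~is_special] = scaled[~is_special] * scaling_factor + value_offset
```

The special constant keeps its original value of -999 so that it can still be recognized and masked after scaling.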

adjust_array_data_type(array, scaling_factor=None, value_offset=None)

Converts the input array into a new large enough data type if adjusting said array as-is by scaling_factor or value_offset would result in an overflow. This can be necessary both if the array is data from a PDS4 Array or a PDS4 Table, so long as it has a scaling factor or value offset associated with it.

Parameters
array: array_like

Any PDS4 numeric data.

scaling_factor: int, float or None, optional

PDS4 scaling factor to apply to the array. Defaults to None, indicating a value of 1.

value_offset: int, float or None, optional

PDS4 value offset to apply to the array. Defaults to None, indicating a value of 0.

Returns
np.ndarray or subclass

The original array, converted to a new data type if necessary and otherwise unchanged.
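The overflow that this conversion guards against is easy to reproduce with plain NumPy. The sketch below does not call adjust_array_data_type; the choice of int16 as the wider type is an assumption for this example.

```python
import numpy as np

stored = np.array([100, -100], dtype='int8')

# Scaling within int8 wraps around silently: 200 does not fit in [-128, 127].
wrapped = stored * np.int8(2)

# Converting to a wider integer type first keeps the true values.
widened = stored.astype('int16') * np.int16(2)
```

NumPy raises no error for the wrapped result, which is why the data type has to be widened before scaling rather than checked afterwards.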

get_scaled_numpy_type(data_type=None, data=None, scaling_factor=None, value_offset=None)

Obtain the NumPy dtype that would be necessary to store PDS4 data once that data has been scaled.

When data is scaled, its final data type is likely to differ from its original data type. (E.g. if you multiply integers by a float, then the final data type will be float.) This function determines what that final data type will have to be, given the initial data type and the scaling and offset values.

Parameters
data_type: str, unicode or PDSdtype, optional

If given, specifies the initial PDS4 data type that the unscaled data has or would have.

data: array_like or None, optional

If given, an array of data. When given, the initial data type for the unscaled data will be taken from this array and data_type ignored. For some ASCII data types in PDS4, the exact necessary data type (scaled or unscaled) can only be obtained when the data is already known. If data is not given, a data type sufficient (but possibly larger than necessary) to store the data will be returned. Defaults to None.

scaling_factor: int, float or None, optional

PDS4 scaling factor that will later be applied to the data. Defaults to None, indicating a value of 1.

value_offset: int, float or None, optional

PDS4 value offset that will later be applied to the data. Defaults to None, indicating a value of 0.

Returns
np.dtype

A NumPy dtype large enough to store the data once scaling_factor and value_offset have been applied.

Notes

For masked data, the output type will be large enough to store the masked data values as if they had been scaled/offset. This is because NumPy documentation notes that masked data are not guaranteed to be unaffected by arithmetic operations, only that every attempt will be made to do so.
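The type change described above follows NumPy's ordinary promotion rules, which the sketch below demonstrates without calling get_scaled_numpy_type. The specific factor, offset and int32 widening are arbitrary choices for this example.

```python
import numpy as np

stored = np.array([1, 2, 3], dtype='int16')

# Integer data multiplied by a float scaling factor is promoted to a float dtype.
float_scaled = stored * 0.5

# An integer factor and offset may still need a wider integer type to avoid overflow.
int_scaled = stored.astype('int32') * 1000 + 5

# np.result_type reports the promoted dtype without touching any data.
promoted = np.result_type(np.int16, np.float64)
```

Knowing the final dtype ahead of time allows an output array to be allocated once, before any scaling is applied.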

decode_bytes_to_unicode(array)

Decodes each byte string in the array into unicode.

Parameters
array: array_like

An array containing only byte strings (str in Python 2, bytes in Python 3).

Returns
np.ndarray or subclass

An array in which each element of input array has been decoded to unicode.
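An equivalent element-wise decode is available in NumPy itself. The sketch below uses np.char.decode rather than pds4_tools, and the UTF-8 encoding is an assumption made for this example.

```python
import numpy as np

byte_strings = np.array([b'abc', b'de'])

# Decode every element; the result has a unicode ('U') dtype instead of bytes ('S').
decoded = np.char.decode(byte_strings, 'utf-8')
```

In Python 3, the decoded elements compare equal to ordinary str values, which is usually what calling code expects.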

mask_special_constants(data, special_constants, mask_strings=False, copy=False)

Mask out special constants in an array.

Parameters
data: array_like

An array of data in which to mask out special constants.

special_constants: dict

A dictionary in which the keys are the names of the special constants and the values are the constants to mask out.

mask_strings: bool, optional

If True, character data will also be masked out if it has special constants. If False, only numeric data will be masked out. Defaults to False.

copy: bool, optional

If True, the returned masked data is a copy. If False, a view is returned instead. Defaults to False.

Returns
np.ma.MaskedArray, np.ndarray or subclass

If data to be masked is found, an np.ma.MaskedArray or subclass view (preserving input class if it was already a subclass of masked arrays). Otherwise the input data will be returned.

Notes

The match between special constant value and data value (to mask it out) in this function is simplistic. For numeric values, it is based on the NumPy implementation of equality. For string values, the match is done by trimming leading and trailing whitespace in both the data value and the special constant, then comparing for exact equality. Currently the PDS4 Standard does not provide enough clarity on how Special_Constant matching should truly be done.
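The two matching rules described in these notes can be sketched with plain NumPy. This is an illustration and not the library's code; the constant names and values are invented for the example.

```python
import numpy as np

special_constants = {'missing_constant': -999}

# Numeric data: match by NumPy equality, then mask the hits.
numeric = np.array([1, -999, 3])
numeric_hits = np.isin(numeric, list(special_constants.values()))
masked_numeric = np.ma.MaskedArray(numeric, mask=numeric_hits)

# String data: strip whitespace on both sides, then compare for exact equality.
strings = np.array([' N/A ', 'ok', 'N/A'])
constant = ' N/A'
string_hits = np.char.strip(strings) == constant.strip()
```

Because only whitespace is trimmed, a value such as 'n/a' would not match the constant 'N/A' under this rule.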

get_min_integer_numpy_type(data)

Obtain smallest integer NumPy dtype that can store every value in the input array.

Parameters
data: array_like

PDS4 integer data.

Returns
np.dtype

The NumPy dtype that can store all integers in data.
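One way to pick such a dtype is to compare the data's minimum and maximum against the limits of each candidate. The helper below is hypothetical and is not the library's implementation; the candidate order (signed types first, then uint64, then object) is an assumption made for this sketch.

```python
import numpy as np


def smallest_int_dtype(values):
    """Hypothetical helper: smallest NumPy integer dtype able to hold all values."""
    low, high = min(values), max(values)
    for name in ('int8', 'int16', 'int32', 'int64', 'uint64'):
        info = np.iinfo(np.dtype(name))
        if info.min <= low and high <= info.max:
            return np.dtype(name)
    # Values beyond 64 bits can only be held as Python integers.
    return np.dtype('object')


small = smallest_int_dtype([0, 100])
medium = smallest_int_dtype([-5, 40000])
huge = smallest_int_dtype([0, 2**63])
too_big = smallest_int_dtype([0, 2**70])
```

Here the four results are int8, int32, uint64 and object respectively.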

is_pds_integer_data(data=None, pds_data_type=None)

Determine, from a data array or from a PDS4 data type, whether such data is an integer.

Parameters
data: array_like, optional

If given, checks whether this data is integer data.

pds_data_type: str, unicode or PDSdtype, optional

If given, checks whether this PDS data type corresponds to integer data.

Returns
bool

True if data and/or pds_data_type contain or correspond to PDS4 integer data, False otherwise.

Notes

This is necessary, as opposed to simply checking for dtype, because some PDS4 data is integer but may have the ‘object’ dtype because it may overflow 64-bit integers (e.g. ASCII_Numeric_Base data, which is not limited to 64-bit sizes by the PDS4 standard).
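The situation described in these notes can be reproduced with plain NumPy. The check below is a hypothetical sketch and not the library's logic: it treats an object array as integer data only when every element is a Python int.

```python
import numpy as np


def looks_like_integer_data(array):
    """Hypothetical check: integer dtype, or object dtype holding only Python ints."""
    array = np.asarray(array)
    if array.dtype.kind in 'iu':
        return True
    if array.dtype == object:
        return all(isinstance(v, int) and not isinstance(v, bool) for v in array.flat)
    return False


# Too large for any 64-bit integer dtype, so NumPy falls back to object dtype.
big = np.array([2**70, 1])
```

A check on the dtype kind alone would report the array big as non-integer, even though every value in it is an integer.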