pds4_tools.reader.data_types module
Classes
| PDSdtype | A PDS4 data type object. |
Functions
| data_type_convert_array | Cast binary data in the form of a byte_string to a flat array having proper dtype for data_type. |
| data_type_convert_table_ascii | Cast data originating from a PDS4 Table_Character or Table_Delimited data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type. |
| data_type_convert_table_binary | Cast data originating from a PDS4 Table_Binary data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type. |
| data_type_convert_dates | Cast an array of datetime strings originating from a PDS4 Table data structure to an array having NumPy datetime64 dtype. |
| pds_to_numpy_type | Obtain a NumPy dtype for PDS4 data. |
| pds_to_builtin_type | Obtain a Python __builtin__ data type for PDS4 data. |
| pds_to_numpy_name | Create a NumPy field name from a PDS4 field name. |
| apply_scaling_and_value_offset | Applies scaling factor and value offset to data. |
| adjust_array_data_type | Converts the input array into a new large enough data type if adjusting said array as-is by scaling_factor or value_offset would result in an overflow. |
| get_scaled_numpy_type | Obtain the NumPy dtype that would be necessary to store PDS4 data once that data has been scaled. |
| decode_bytes_to_unicode | Decodes each byte string in the array into unicode. |
| mask_special_constants | Mask out special constants in an array. |
| get_min_integer_numpy_type | Obtain smallest integer NumPy dtype that can store every value in the input array. |
| is_pds_integer_data | Determine, from a data array or from a PDS4 data type, whether such data is an integer. |
Details
- class PDSdtype(name)
Bases: object
A PDS4 data type object.
Each PDS4 array and table field contains homogeneous values described by a PDSdtype object. This class is a wrapper around the named PDS4 data types, to make comparison of types easier.
- property name
- Returns
- str or unicode
The PDS4 data type name.
- __eq__(other)
Check whether two data types are equal.
- Parameters
- other : str, unicode or PDSdtype
A PDS4 data type.
- Returns
- bool
True if the data types are equal. Two PDSdtype objects are equal when their name attributes are identical. If other is str-like, it is equal when it matches the object’s name attribute.
- __contains__(other)
Check if a data type contains another.
- Parameters
- other : str, unicode or PDSdtype
A PDS4 data type.
- Returns
- bool
True if name contains at least a portion of other.
- issubtype(subtype)
Check if data type is a sub-type.
- Parameters
- subtype : str or unicode
Valid subtypes are int, integer, float, bool, datetime, bitstring, ascii and binary (case-insensitive).
- Returns
- bool
True if name is a sub-type of subtype. False otherwise.
- Raises
- ValueError
Raised if an unknown subtype is specified.
- TypeError
Raised if a non-string-like subtype is specified.
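The comparison rules described for PDSdtype can be illustrated with a small stand-in class. This is a minimal sketch and not the pds4_tools implementation: the class name NamedType is hypothetical, it treats "contains at least a portion of other" as a plain substring test, and it covers only the documented equality and containment behavior.

```python
class NamedType:
    """Hypothetical stand-in that mimics the documented comparison rules."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Equal to another NamedType with the same name, or to a matching string.
        if isinstance(other, NamedType):
            return self.name == other.name
        if isinstance(other, str):
            return self.name == other
        return NotImplemented

    def __hash__(self):
        # Keep instances hashable, since defining __eq__ removes the default hash.
        return hash(self.name)

    def __contains__(self, other):
        # Assumption: containment means `other` is a substring of the name.
        other_name = other.name if isinstance(other, NamedType) else other
        return other_name in self.name


dtype = NamedType("ASCII_Integer")
same = dtype == "ASCII_Integer"          # compared against a plain string
has_ascii = "ASCII" in dtype             # substring of the name
has_real = NamedType("Real") in dtype    # not part of the name
```

Wrapping the name this way lets calling code compare against either a string or another type object without checking which one it received.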
- data_type_convert_array(data_type, byte_string)
Cast binary data in the form of a byte_string to a flat array having proper dtype for data_type.
- Parameters
- data_type : str, unicode or PDSdtype
The PDS4 data type that the data should be cast to.
- byte_string : str, bytes or buffer
PDS4 byte string data for an array data structure or a table binary field.
- Returns
- np.ndarray
Array-like view of the data cast from a byte string into values having the indicated data type. Will be read-only if underlying byte_string is immutable.
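The idea of viewing a byte string as a typed flat array, including the read-only behavior noted above, can be sketched with plain NumPy. This example does not call pds4_tools; the sample bytes and the choice of a big-endian unsigned 16-bit dtype are assumptions made for illustration.

```python
import numpy as np

# Three 16-bit unsigned integers stored most-significant byte first.
byte_string = b"\x00\x01\x00\x02\x01\x00"

# '>u2' is NumPy's code for a big-endian unsigned 2-byte integer.
values = np.frombuffer(byte_string, dtype=">u2")

print(values.tolist())         # [1, 2, 256]
print(values.flags.writeable)  # False, because `bytes` is immutable

# A mutable buffer such as bytearray yields a writable view instead.
writable = np.frombuffer(bytearray(byte_string), dtype=">u2")
writable[0] = 7
```

Because np.frombuffer returns a view rather than a copy, no data is duplicated, which is why the mutability of the source buffer carries over to the array.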
- data_type_convert_table_ascii(data_type, data, mask_nulls=False, decode_strings=False)
Cast data originating from a PDS4 Table_Character or Table_Delimited data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type. Most likely this data is a single Field, or a single repetition of a Field, since different Fields have different data types.
- Parameters
- data_type : str, unicode or PDSdtype
The PDS4 data type that the data should be cast to.
- data : array_like[str or bytes]
Flat array of PDS4 byte strings from a Table_Character or Table_Delimited data structure.
- mask_nulls : bool, optional
If True, then data may contain empty values for numeric and boolean data types. If such nulls are found, they will be masked out and a masked array will be returned. Defaults to False, in which case an exception will be raised should an empty value be found in such a field.
- decode_strings : bool, optional
If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. Defaults to False.
- Returns
- np.ndarray
Data cast from a byte string array into a values array having the right data type.
- data_type_convert_table_binary(data_type, data, decode_strings=False)
Cast data originating from a PDS4 Table_Binary data structure in the form of an array_like[byte_string] to an array with the proper dtype for data_type. Most likely this data is a single Field, or a single repetition of a Field, since different Fields have different data types.
- Parameters
- data_type : str, unicode or PDSdtype
The PDS4 data type that the data should be cast to.
- data : array_like[str or bytes]
Flat array of PDS4 byte strings from a Table_Binary data structure.
- decode_strings : bool, optional
If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. Defaults to False.
- Returns
- np.ndarray
Data cast from a byte string array into a values array having the right data type.
- data_type_convert_dates(data, data_type=None, mask_nulls=False)
Cast an array of datetime strings originating from a PDS4 Table data structure to an array having NumPy datetime64 dtype.
- Parameters
- data : array_like[str or bytes]
Flat array of datetime strings in a PDS4-compatible form.
- data_type : str, unicode or PDSdtype, optional
The PDS4 data type for the data. If omitted, will be obtained from the meta_data of data.
- mask_nulls : bool, optional
If True, then data may contain empty values. If such nulls are found, they will be masked out and a masked array will be returned. Defaults to False, in which case an exception will be raised should an empty value be found.
- Returns
- np.ndarray, np.ma.MaskedArray or subclass
Data cast from a string-like array to a datetime array. If null values are found, an np.ma.MaskedArray or subclass view will be returned. When the input is an instance of PDS_array, the output will be as well.
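The conversion of datetime strings to datetime64, with empty values masked out, can be sketched in plain NumPy. This is an illustration under assumptions and not the library's code: the sample values are invented, the strings carry no trailing "Z" or time zone, and nulls are replaced with NaT before masking.

```python
import numpy as np

raw = [b"2020-01-15T12:30:00", b"", b"2021-06-01T00:00:00"]

# Decode to text and note which entries are empty (nulls).
text = [item.decode("ascii").strip() for item in raw]
is_null = np.array([item == "" for item in text])

# Substitute NaT for nulls so the cast succeeds, then mask those entries.
filled = ["NaT" if item == "" else item for item in text]
dates = np.ma.MaskedArray(np.array(filled, dtype="datetime64[s]"), mask=is_null)
```

Without the substitution step the empty string could not be parsed as a date, which mirrors the documented behavior of raising an exception when mask_nulls is False.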
- pds_to_numpy_type(data_type=None, data=None, field_length=None, decode_strings=False, decode_dates=False, scaling_factor=None, value_offset=None, include_endian=True)
Obtain a NumPy dtype for PDS4 data.
Either data or data_type must be provided.
- Parameters
- data_type : str, unicode or PDSdtype, optional
A PDS4 data type. If data is omitted, the obtained NumPy data type is based on this value (see notes).
- data : array_like, optional
A data array. If data_type is omitted, the obtained NumPy data type is based on this value (see notes).
- field_length : int, optional
If given, and the returned dtype is a form of character, then it will include the number of characters. Takes priority over length of data when given.
- decode_strings : bool, optional
If True, and the returned dtype is a form of character, then the obtained dtype will be a form of unicode. If False, then for character data the obtained dtype will remain byte strings. If data is given and is unicode, then this setting will be ignored and unicode dtype will be returned. If data_type is given and refers to bit-strings, then this setting will be ignored and a byte string dtype will be returned. Defaults to False.
- decode_dates : bool, optional
If True, then the returned dtype will be a datetime64 when data_type is both given and is a form of date and/or time. If False, then the returned dtype will be a form of character according to decode_strings. If data is given, then this setting will be ignored. Defaults to False.
- scaling_factor : int, float or None, optional
PDS4 scaling factor. If given, the returned dtype will be large enough to contain data scaled by this number. Defaults to None, indicating a value of 1.
- value_offset : int, float or None, optional
PDS4 value offset. If given, the returned dtype will be large enough to contain data offset by this number. Defaults to None, indicating a value of 0.
- include_endian : bool, optional
If True, the returned dtype will contain an explicit endianness as specified by the PDS4 data type. If False, the dtype will not specifically indicate the endianness, typically implying same endianness as the current machine. Defaults to True.
- Returns
- np.dtype
A NumPy dtype that can store the data described by the input parameters.
Notes
For certain data (such as ASCII_Integer), there are a number of NumPy dtypes (e.g. int8, int32, int64) that could be used. If only the PDS4 data type is given, the returned dtype will be large enough to store any possible valid value according to the PDS4 Standard. However, if the data parameter is specified, then the obtained dtype will not be any larger than needed to store exactly that data (plus any scaling/offset specified).
- pds_to_builtin_type(data_type=None, data=None, decode_strings=False, decode_dates=False, scaling_factor=None, value_offset=None)
Obtain a Python __builtin__ data type for PDS4 data.
Either data or data_type must be provided.
- Parameters
- data_type : str, unicode or PDSdtype, optional
A PDS4 data type. If data is omitted, the obtained builtin data type is based on this value.
- data : array_like, optional
A data array. If data_type is omitted, the obtained builtin data type is based on this value.
- decode_strings : bool, optional
If True, and the returned data type is a form of character, then the obtained data type will be either str (Python 3) or unicode (Python 2). If False, then for character data the obtained data type will remain byte strings. If data is given and is unicode, then this setting will be ignored and unicode data type will be returned. If data_type is given and refers to bit-strings, then this setting will be ignored and a byte string data type will be returned. Defaults to False.
- decode_dates : bool, optional
If True, then the returned data type will be a form of date/time when data_type is both given and is a form of date and/or time. If False, then the returned data type will be a form of character according to decode_strings. If data is given, then this setting will be ignored. Defaults to False.
- scaling_factor : int, float or None, optional
PDS4 scaling factor. If given, the returned data type will be large enough to contain data scaled by this number. Defaults to None, indicating a value of 1.
- value_offset : int, float or None, optional
PDS4 value offset. If given, the returned data type will be large enough to contain data offset by this number. Defaults to None, indicating a value of 0.
- Returns
- str, unicode, bytes, int, float, bool, complex
A builtin data type that can store the data described by the input parameters.
- pds_to_numpy_name(name)
Create a NumPy field name from a PDS4 field name.
- Parameters
- name : str or unicode
A PDS4 field name.
- Returns
- str
A NumPy-compliant field name.
- apply_scaling_and_value_offset(data, scaling_factor=None, value_offset=None, special_constants=None)
Applies scaling factor and value offset to data.
Data is modified in-place, if possible. Data type may change to prevent numerical overflow if applying scaling factor and value offset would cause one.
- Parameters
- data : array_like
Any numeric PDS4 data.
- scaling_factor : int, float or None, optional
PDS4 scaling factor to apply to the array. Defaults to None, indicating a value of 1.
- value_offset : int, float or None, optional
PDS4 value offset to apply to the array. Defaults to None, indicating a value of 0.
- special_constants : dict, optional
If provided, the keys correspond to names and values correspond to numeric values for special constants. Those particular values will not be scaled or offset.
- Returns
- np.ndarray or subclass
data with scaling_factor and value_offset applied, potentially with a new dtype if necessary to fit new values.
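A rough picture of scaling that leaves special constants untouched, written in plain NumPy. This is a sketch under assumptions, not the pds4_tools code path: it assumes the multiply-then-add convention (value * scaling_factor + value_offset), and the constant name and all sample values are invented for the example.

```python
import numpy as np

data = np.array([10, 20, -999, 40], dtype="int16")
scaling_factor, value_offset = 0.5, 100.0
special_constants = {"missing_constant": -999}  # hypothetical name/value pair

# Locate special constants before changing any values.
is_special = np.isin(data, list(special_constants.values()))

# Multiplying integers by a float needs a float result, so convert first.
scaled = data.astype("float64")
scaled[~is_special] = scaled[~is_special] * scaling_factor + value_offset

print(scaled.tolist())  # [105.0, 110.0, -999.0, 120.0]
```

Computing the mask before scaling matters: once values have been scaled, a constant could no longer be recognized by comparing against its original value.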
- adjust_array_data_type(array, scaling_factor=None, value_offset=None)
Converts the input array into a new large enough data type if adjusting said array as-is by scaling_factor or value_offset would result in an overflow. This can be necessary whether the array is data from a PDS4 Array or from a PDS4 Table, so long as it has a scaling factor or value offset associated with it.
- Parameters
- array : array_like
Any PDS4 numeric data.
- scaling_factor : int, float or None, optional
PDS4 scaling factor to apply to the array. Defaults to None, indicating a value of 1.
- value_offset : int, float or None, optional
PDS4 value offset to apply to the array. Defaults to None, indicating a value of 0.
- Returns
- np.ndarray or subclass
The original array, converted to a new data type if necessary and otherwise unchanged.
- get_scaled_numpy_type(data_type=None, data=None, scaling_factor=None, value_offset=None)
Obtain the NumPy dtype that would be necessary to store PDS4 data once that data has been scaled.
When data is scaled, its final data type is likely to differ from its original data type. (E.g. if you multiply integers by a float, then the final data type will be float.) This method determines what that final data type will have to be when given the initial data type and the scaling and offset values.
- Parameters
- data_type : str, unicode or PDSdtype, optional
If given, specifies the initial PDS4 data type that the unscaled data has or would have.
- data : array_like or None, optional
If given, an array of data. When given, the initial data type for the unscaled data will be taken from this array and data_type ignored. For some ASCII data types in PDS4, the exact necessary data type (scaled or unscaled) can only be obtained when the data is already known. If data is not given, a data type sufficient (but possibly larger than necessary) to store the data will be returned. Defaults to None.
- scaling_factor : int, float or None, optional
PDS4 scaling factor that will later be applied to the data. Defaults to None, indicating a value of 1.
- value_offset : int, float or None, optional
PDS4 value offset that will later be applied to the data. Defaults to None, indicating a value of 0.
- Returns
- np.dtype
A NumPy dtype large enough to store the data once scaling_factor and value_offset have been applied.
Notes
For masked data, the output type will be large enough to store the masked data values as if they had been scaled/offset. This is because NumPy documentation notes that masked data are not guaranteed to be unaffected by arithmetic operations, only that every attempt will be made to do so.
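Why a scaled array may need a different dtype can be seen with a few lines of plain NumPy. This is only an illustration with made-up values; how pds4_tools actually selects the final dtype is not shown here.

```python
import numpy as np

stored = np.array([100, 120], dtype="int8")

# Scaling within int8 wraps around: 200 and 240 do not fit in 8 signed bits.
wrapped = stored * np.int8(2)
print(wrapped.tolist())  # [-56, -16]

# Integer scaling: widen first, then scale.
widened = stored.astype("int16") * 2
print(widened.tolist())  # [200, 240]

# Float scaling: combining an integer dtype with a float gives a float dtype.
float_type = np.result_type(stored.dtype, np.float64)
print(float_type)  # float64
```

The wrapped result shows the silent failure mode: NumPy array arithmetic does not raise on integer overflow, so the dtype has to be chosen before the scaling is applied.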
- decode_bytes_to_unicode(array)
Decodes each byte string in the array into unicode.
- Parameters
- array : array_like
An array containing only byte strings (str in Python 2, bytes in Python 3).
- Returns
- np.ndarray or subclass
An array in which each element of input array has been decoded to unicode.
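Element-wise decoding of a byte-string array can be sketched with NumPy's own string helpers. This does not call pds4_tools, and the UTF-8 encoding used here is an assumption made for the example.

```python
import numpy as np

raw = np.array([b"alpha", b"beta"])

# np.char.decode applies bytes.decode to every element of the array.
decoded = np.char.decode(raw, "utf-8")

print(raw.dtype.kind)      # 'S' (byte strings)
print(decoded.dtype.kind)  # 'U' (unicode strings)
```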
- mask_special_constants(data, special_constants, mask_strings=False, copy=False)
Mask out special constants in an array.
- Parameters
- data : array_like
An array of data in which to mask out special constants.
- special_constants : dict
A dictionary, where keys are the names of the special constants, and the values will be masked out.
- mask_strings : bool, optional
If True, character data will also be masked out if it has special constants. If False, only numeric data will be masked out. Defaults to False.
- copy : bool, optional
If True, the returned masked data is a copy. If False, a view is returned instead. Defaults to False.
- Returns
- np.ma.MaskedArray, np.ndarray or subclass
If data to be masked is found, an np.ma.MaskedArray or subclass view (preserving input class if it was already a subclass of masked arrays). Otherwise the input data will be returned.
Notes
The match between special constant value and data value (to mask it out) in this method is simplistic. For numeric values, it is based on the NumPy implementation of equality. For string values, the match is done by trimming leading/trailing whitespaces in both data value and special constant, then comparing for exact equality. Currently the PDS4 Standard does not provide enough clarity on how Special_Constant matching should truly be done.
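The two matching rules in the notes can be sketched in plain NumPy: equality for numeric values, and whitespace-trimmed exact comparison for strings. This is an illustration under assumptions rather than the library's code; the constant name and all sample values are invented.

```python
import numpy as np

data = np.array([1.5, -9999.0, 3.25, -9999.0])
special_constants = {"missing_constant": -9999.0}  # hypothetical name/value pair

# Numeric matching: plain NumPy equality against each constant value.
mask = np.isin(data, list(special_constants.values()))
masked = np.ma.MaskedArray(data, mask=mask, copy=False)

# String matching: strip whitespace on both sides, then compare exactly.
fields = np.array(["  N/A ", "12.0", "N/A"])
constant = " N/A"
string_mask = np.array([item.strip() == constant.strip() for item in fields])
```

Passing copy=False keeps the masked array as a view of the original values, which corresponds to the documented default of returning a view rather than a copy.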
- get_min_integer_numpy_type(data)
Obtain smallest integer NumPy dtype that can store every value in the input array.
- Parameters
- data : array_like
PDS4 integer data.
- Returns
- np.dtype
The NumPy dtype that can store all integers in data.
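One way to find the smallest integer dtype is to test the minimum and maximum against each candidate's range. This is a sketch, not the pds4_tools implementation: the helper name, the candidate order (signed before unsigned at each width) and the fallback to the object dtype are assumptions.

```python
import numpy as np

def smallest_integer_dtype(values):
    """Hypothetical helper: first candidate dtype whose range covers all values."""
    # Python ints are used for the bounds so the comparison itself cannot overflow.
    low, high = int(min(values)), int(max(values))
    candidates = ["int8", "uint8", "int16", "uint16",
                  "int32", "uint32", "int64", "uint64"]
    for name in candidates:
        info = np.iinfo(name)
        if info.min <= low and high <= info.max:
            return np.dtype(name)
    # Nothing fits in 64 bits, so fall back to arbitrary-size Python integers.
    return np.dtype("object")

mixed = smallest_integer_dtype([-3, 12, 300])
small = smallest_integer_dtype([0, 200])
huge = smallest_integer_dtype([2**70])
```

Here mixed needs a signed 16-bit type because of the negative value, small fits in an unsigned 8-bit type, and huge exceeds every fixed-width integer.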
- is_pds_integer_data(data=None, pds_data_type=None)
Determine, from a data array or from a PDS4 data type, whether such data is an integer.
- Parameters
- data : array_like, optional
If given, checks whether this data is integer data.
- pds_data_type : str, unicode or PDSdtype, optional
If given, checks whether this PDS data type corresponds to integer data.
- Returns
- bool
True if data and/or pds_data_type contain or correspond to PDS4 integer data, False otherwise.
Notes
This is necessary, as opposed to simply checking for dtype, because some PDS4 data is integer but may have the ‘object’ dtype because it may overflow 64-bit integers (e.g. ASCII_Numeric_Base data, which is not limited to 64-bit sizes by the PDS4 standard).
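The situation described in the notes, integer values stored under the object dtype, can be reproduced in plain NumPy. The helper below is hypothetical and inspects only array values; it is not how pds4_tools performs the check.

```python
import numpy as np

# 2**80 exceeds the 64-bit range, so the values are stored as Python objects.
big = np.array([2**80, 5], dtype=object)

def holds_only_integers(array):
    """Hypothetical check that looks at the values, not just the dtype."""
    if np.issubdtype(array.dtype, np.integer):
        return True
    if array.dtype == object:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return all(isinstance(item, (int, np.integer)) and not isinstance(item, bool)
                   for item in array.ravel())
    return False

print(np.issubdtype(big.dtype, np.integer))  # False: a dtype check alone misses it
print(holds_only_integers(big))              # True
```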