Data management

dsch data representation.

In dsch, data is structured according to a given schema. The data is then represented as a hierarchical structure of data nodes, each of which corresponds to a node in the schema. This allows subsequent validation against the schema.

The data nodes are also responsible for storing the data. Since dsch is built to support multiple storage backends, there are specific data node classes implementing the respective functionality. The classes in this module provide common functionality and are intended to be used as base classes. Different backends are implemented in the dsch.backends package.

class dsch.data.Array(schema_node, parent, data_storage=None, new_params=None)

Generic Array data node.

This class implements backend-independent behaviour of Array data nodes. Backend-specific subclasses should derive from this class.

node_tree()

Return a recursive representation of the (sub)node-tree.

The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.

For Array nodes, the value is not included in the label because of its length. Instead, the array shape is shown. If no value is set, ‘<empty>’ is printed instead.

Returns:{label: sub_tree} representation.
Return type:dict
resize(size)

Resize the array to the desired size.

Parameters:size (tuple) – Desired array size.
validate()

Validate the node value against the schema node specification.

If validation succeeds, the method terminates silently. Otherwise, an exception is raised.

Raises:dsch.exceptions.ValidationError – if validation fails.
class dsch.data.Compilation(schema_node, parent, data_storage=None, new_params=None)

Compilation data node.

Compilation is the base class for compilation-type data nodes, providing common functionality and the common interface. Subclasses may add functionality depending on the backend.

Variables:
  • schema_node – The schema node that this data node is based on.
  • parent – Parent data node object (None if this is the top-level data node).
  • complete – Data completeness flag. True if all required data is present.
  • empty – Data absence flag. True if no data is present.
clear()

Clear all sub-node values.

Note that, in contrast to List, this does not remove the sub-nodes themselves, but only their values (by calling each sub-node’s clear() method). This is because the set of sub-nodes of a Compilation is fixed by the schema specification and does not change during usage.

complete

Check whether the Compilation is currently complete.

A Compilation is considered complete when all non-optional sub-nodes are individually complete. This allows defining exceptions for specific sub-nodes by including them in schema.Compilation.optionals.

Note

complete is not simply the inverse of empty, since it is only True when all non-optional fields are filled. This means a Compilation can be non-empty and non-complete at the same time.

Returns:True if the Compilation is complete, False otherwise.
Return type:bool
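
The difference between complete and empty can be illustrated with a minimal sketch. SketchItem and SketchCompilation below are hypothetical stand-ins written only for this illustration, not the dsch classes, and the optionals argument merely mimics the role of schema.Compilation.optionals.

```python
# Hypothetical stand-ins for illustration; these are NOT the dsch classes.
class SketchItem:
    def __init__(self, value=None):
        self._value = value

    @property
    def empty(self):
        return self._value is None

    @property
    def complete(self):
        # For item nodes, complete is simply the inverse of empty.
        return not self.empty


class SketchCompilation:
    def __init__(self, subnodes, optionals=()):
        self.subnodes = dict(subnodes)
        self.optionals = set(optionals)

    @property
    def complete(self):
        # Only non-optional sub-nodes have to be complete.
        return all(node.complete for name, node in self.subnodes.items()
                   if name not in self.optionals)

    @property
    def empty(self):
        # Empty only if every sub-node is empty.
        return all(node.empty for node in self.subnodes.values())


comp = SketchCompilation(
    {"name": SketchItem("probe"), "serial": SketchItem(), "comment": SketchItem()},
    optionals=["comment"],
)
# One required field is filled, one is not: neither empty nor complete.
partially_filled = (comp.empty, comp.complete)
```

Under these assumptions, filling the remaining required field makes the sketch complete even though the optional field stays empty.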
empty

Check whether the Compilation is currently empty.

A Compilation is considered empty when all individual sub-nodes are empty.

Returns:True if the Compilation is empty, False otherwise.
Return type:bool
load_from(source_node)

Load data by copying from the given source node.

For Compilations, this recursively copies the data of each relevant sub-node.

Parameters:source_node – Data node to copy value from.
node_tree()

Return a recursive representation of the (sub)node-tree.

The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.

For Compilation nodes, the representations of all sub-nodes are printed recursively, each prefixed by the sub-node name.

Returns:{label: sub_tree} representation.
Return type:dict
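
A rough sketch of how such a nested {label: sub_tree} dict could be assembled is given below. The helper names (leaf_tree, compilation_tree), the exact label layout, and the use of None as the sub-tree of a leaf are all assumptions made for illustration; the output of dsch itself may be formatted differently.

```python
# Hypothetical helpers; the exact label layout used by dsch is assumed here.
def leaf_tree(type_name, value=None):
    """Build a {label: sub_tree} dict for a leaf node (no sub-nodes)."""
    shown = "<empty>" if value is None else str(value)
    # Assumption: a leaf has no sub-tree, represented here as None.
    return {"({}): {}".format(type_name, shown): None}


def compilation_tree(subnode_trees):
    """Build a {label: sub_tree} dict, prefixing each sub-node label by name."""
    sub_tree = {}
    for name, tree in subnode_trees.items():
        for label, children in tree.items():
            sub_tree["{} {}".format(name, label)] = children
    return {"(Compilation)": sub_tree}


tree = compilation_tree({
    "name": leaf_tree("String", "probe"),
    "serial": leaf_tree("Scalar"),
})
```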
replace(new_value)

Replace the current compilation values with new ones.

The new values must be specified as a dict, where the key corresponds to the compilation field name.

For Compilation, this method is effectively a shorthand for calling ItemNode.replace() on all fields specified in the given dict.

Parameters:new_value (dict) – Mapping of field names to new values.
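
The shorthand described above can be sketched as follows. SketchField and SketchFields are hypothetical stand-ins, not dsch classes; the point is only that each dict entry is forwarded to the replace() method of the field with the same name, while fields not mentioned in the dict are left untouched.

```python
# Hypothetical stand-ins for illustration; these are NOT the dsch classes.
class SketchField:
    def __init__(self, value=None):
        self.value = value

    def replace(self, new_value):
        self.value = new_value


class SketchFields:
    def __init__(self, **fields):
        self.fields = fields

    def replace(self, new_value):
        # Forward each entry to the field of the same name.
        for name, value in new_value.items():
            self.fields[name].replace(value)


comp = SketchFields(name=SketchField("old"), serial=SketchField(1))
comp.replace({"name": "new"})  # "serial" is not in the dict and keeps its value
```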
validate()

Recursively validate all sub-node values.

If validation succeeds, the method terminates silently. Otherwise, an exception is raised.

Raises:dsch.exceptions.SubnodeValidationError – if validation fails.
class dsch.data.Date(schema_node, parent, data_storage=None, new_params=None)

Generic Date data node.

This class implements backend-independent behaviour of Date data nodes. Backend-specific subclasses should derive from this class.

class dsch.data.DateTime(schema_node, parent, data_storage=None, new_params=None)

Generic DateTime data node.

This class implements backend-independent behaviour of DateTime data nodes. Backend-specific subclasses should derive from this class.

class dsch.data.ItemNode(schema_node, parent, data_storage=None, new_params=None)

Generic data item node.

ItemNode is the base class for data nodes, providing common functionality and the common interface. Subclasses may add functionality depending on the node type and backend (e.g. compression settings).

Note that this is only the base class for item nodes, i.e. nodes that directly hold data. Collection nodes, i.e. Compilation and List, are not based on this class.

Variables:
  • schema_node – The schema node that this data node is based on.
  • parent – Parent data node object (None if this is the top-level data node).
  • complete – Data completeness flag. True if data is present.
  • empty – Data absence flag. True if no data is present.
  • value – Actual node data, independent of the backend in use.
clear()

Clear the data that is held by this data node.

This removes the corresponding storage object entirely, causing the data node to be empty afterwards.

complete

Check whether the data node is currently complete.

A data node is considered complete when a corresponding storage object exists. For non-containing nodes (i.e. all node types except Compilation and List), this is always the inverse of empty, but the property is still provided for interface compatibility.

Returns:True if the data node is complete, False otherwise.
Return type:bool
empty

Check whether the data node is currently empty.

A data node is considered empty when no corresponding storage object exists. To apply a new value, set value.

Returns:True if the data node is empty, False otherwise.
Return type:bool
load_from(source_node)

Load data by copying from the given source node.

This is effectively a shorthand for self.replace(source_node.value) with additional checking of node compatibility. Two nodes are considered compatible if their schema_node attributes are identical.

Parameters:source_node – Data node to copy value from.
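
A minimal sketch of this behaviour is shown below. SketchNode is a hypothetical stand-in, "identical" is modelled here as plain equality of the schema_node attributes, and ValueError is used only as a placeholder, since the exception dsch raises for incompatible nodes is not stated in this section.

```python
# Hypothetical stand-in for illustration; this is NOT the dsch ItemNode.
class SketchNode:
    def __init__(self, schema_node, value=None):
        self.schema_node = schema_node
        self.value = value

    def replace(self, new_value):
        self.value = new_value

    def load_from(self, source_node):
        # Assumption: compatibility is modelled as equality of schema nodes.
        if self.schema_node != source_node.schema_node:
            # Placeholder exception type, chosen only for this sketch.
            raise ValueError("incompatible schema nodes")
        self.replace(source_node.value)


source = SketchNode(schema_node="String", value="probe")
target = SketchNode(schema_node="String")
target.load_from(source)

other = SketchNode(schema_node="Bool")
try:
    other.load_from(source)
    rejected = False
except ValueError:
    rejected = True
```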
node_tree()

Return a recursive representation of the (sub)node-tree.

The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.

For leaf nodes, i.e. nodes that do not contain other nodes, the str()-representation of the value is also included in the label. If no value is set, ‘<empty>’ is printed instead.

Returns:{label: sub_tree} representation.
Return type:dict
replace(new_value)

Completely replace the current node value.

Instead of changing parts of the data (e.g. via numpy array slicing), replace the entire data object for this node.

Parameters:new_value – New value to apply to the node, independent of the backend in use.
validate()

Validate the node value against the schema node specification.

If validation succeeds, the method terminates silently. Otherwise, an exception is raised.

Raises:dsch.exceptions.ValidationError – if validation fails.
value

Return the actual node data, independent of the backend in use.

This representation of the data only depends on the corresponding node type, not on the selected storage backend.

If the node is currently empty, the value is undefined and NodeEmptyError is raised.

Returns:Node data.
Raises:dsch.exceptions.NodeEmptyError – if the node is currently empty.
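
The interplay of empty, complete, value, replace() and clear() can be sketched with a small stand-in class. SketchItemNode and the locally defined NodeEmptyError are hypothetical and only mirror the behaviour described above; the real classes live in dsch.data and dsch.exceptions.

```python
class NodeEmptyError(Exception):
    """Local stand-in for dsch.exceptions.NodeEmptyError."""


class SketchItemNode:
    """Hypothetical stand-in; NOT the dsch ItemNode."""

    def __init__(self):
        self._storage = None  # no storage object yet, so the node is empty

    @property
    def empty(self):
        return self._storage is None

    @property
    def complete(self):
        return not self.empty

    @property
    def value(self):
        if self.empty:
            raise NodeEmptyError("node has no value")
        return self._storage

    def replace(self, new_value):
        self._storage = new_value

    def clear(self):
        self._storage = None


node = SketchItemNode()
try:
    node.value
    raised = False
except NodeEmptyError:
    raised = True  # reading the value of an empty node fails

node.replace(23)
```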
class dsch.data.List(schema_node, parent, data_storage=None, new_params=None)

List-type data node.

List is the base class for list-type data nodes, providing common functionality and the common interface. Subclasses may add functionality depending on the backend.

Variables:
  • schema_node – The schema node that this data node is based on.
  • parent – Parent data node object (None if this is the top-level data node).
  • complete – Data completeness flag. True if all list items are complete.
  • empty – Data absence flag. True if no data is present.
append(value=None)

Append a new data node to the list.

If a value is given, it is automatically applied to the new data node. Otherwise, an empty data node is created, which is especially useful for Lists of Compilations.

Parameters:value – Value to be added to the list.
clear()

Clear all sub-nodes.

This removes all sub-nodes entirely, yielding an empty List.

complete

Check whether the List is currently complete.

A List is considered complete when all of its sub-nodes are complete.

Warning

An empty List is considered complete! If a minimum number of list items is required, use schema.List.min_length to apply the corresponding constraint.

Returns:True if the List is complete, False otherwise.
Return type:bool
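
The warning above follows from vacuous truth: a condition over "all sub-nodes" holds trivially when there are none. The helper functions below are hypothetical and only illustrate that logic with plain Python; they are not part of dsch.

```python
# Hypothetical helpers operating on per-sub-node flags, for illustration only.
def list_complete(subnode_complete_flags):
    """True if every sub-node is complete (trivially True without sub-nodes)."""
    return all(subnode_complete_flags)


def list_empty(subnode_empty_flags):
    """True if every sub-node is empty (trivially True without sub-nodes)."""
    return all(subnode_empty_flags)


# A List without any sub-nodes is both complete and empty.
no_items = (list_complete([]), list_empty([]))
```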
empty

Check whether the List is currently empty.

A List is considered empty when all of its sub-nodes are empty. As a special case, it is also considered empty when there are no sub-nodes present.

Returns:True if the List is empty, False otherwise.
Return type:bool
load_from(source_node)

Load data by copying from the given source node.

For Lists, this recursively copies the data of each relevant sub-node.

Parameters:source_node – Data node to copy value from.
node_tree()

Return a recursive representation of the (sub)node-tree.

The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.

For Lists, the representations of all sub-nodes are printed recursively, each prefixed by the list index in brackets.

Returns:{label: sub_tree} representation.
Return type:dict
replace(new_value)

Replace the current list entries with the given list of entries.

For List, this is effectively a shorthand for calling clear() and then, for each of the new entries, append().

Parameters:new_value (list) – New entries to put into the List.
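
The clear-then-append behaviour can be sketched as follows. SketchList is a hypothetical stand-in that stores plain values instead of data nodes; it is not the dsch List class.

```python
# Hypothetical stand-in for illustration; this is NOT the dsch List.
class SketchList:
    def __init__(self):
        self.subnodes = []

    def append(self, value=None):
        # The real class creates a data node; the sketch stores the value itself.
        self.subnodes.append(value)

    def clear(self):
        self.subnodes = []

    def replace(self, new_value):
        # Drop all existing entries, then append the new ones in order.
        self.clear()
        for entry in new_value:
            self.append(entry)


items = SketchList()
items.append("a")
items.append("b")
items.replace(["x", "y", "z"])
```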
validate()

Recursively validate all sub-node values.

If validation succeeds, the method terminates silently. Otherwise, an exception is raised.

Raises:dsch.exceptions.SubnodeValidationError – if validation fails.
class dsch.data.Scalar(schema_node, parent, data_storage=None, new_params=None)

Generic Scalar data node.

This class implements backend-independent behaviour of Scalar data nodes. Backend-specific subclasses should derive from this class.

node_tree()

Return a recursive representation of the (sub)node-tree.

The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.

For Scalar nodes, the unit is appended to the value, if any. If no value is set, ‘<empty>’ is printed instead.

Returns:{label: sub_tree} representation.
Return type:dict
class dsch.data.Time(schema_node, parent, data_storage=None, new_params=None)

Generic Time data node.

This class implements backend-independent behaviour of Time data nodes. Backend-specific subclasses should derive from this class.

dsch.data.data_node_from_schema(schema_node, module_name, parent, data_storage=None, new_params=None)

Create a new data node from a given schema node.

Finds the data node class corresponding to the given schema node and creates an instance. The module containing the data node class must be given explicitly, which allows selecting the desired storage backend.

If data_storage is given, the new data node is initialized from that storage object. Otherwise, a new data node with a new storage object is created. Backends may use a new_params object to supply parameters for new data node creation.

Parameters:
  • schema_node – Schema node instance to create a data node for.
  • module_name (str) – The full module name of the data storage backend.
  • parent – Parent data node object.
  • data_storage – Backend-specific data storage object to load.
  • new_params – Backend-specific metadata for data node creation.
Returns:

Data node corresponding to the given schema node.
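
One plausible way to implement such a lookup is sketched below. It assumes that the backend module exposes a data node class with the same name as the schema node class; this naming convention, the stand-in classes, and the module name sketch_backend are all hypothetical and chosen only to keep the example self-contained.

```python
import importlib
import sys
import types


class String:
    """Hypothetical schema node class."""


class BackendString:
    """Hypothetical backend-specific data node class."""

    def __init__(self, schema_node, parent, data_storage=None, new_params=None):
        self.schema_node = schema_node
        self.parent = parent
        self.data_storage = data_storage
        self.new_params = new_params


# Register a fake backend module so that it can be imported by name.
backend = types.ModuleType("sketch_backend")
backend.String = BackendString
sys.modules["sketch_backend"] = backend


def sketch_data_node_from_schema(schema_node, module_name, parent,
                                 data_storage=None, new_params=None):
    """Look up the data node class by schema node class name and instantiate it."""
    module = importlib.import_module(module_name)
    # Assumption: data node classes share their name with the schema node class.
    node_class = getattr(module, type(schema_node).__name__)
    return node_class(schema_node, parent, data_storage=data_storage,
                      new_params=new_params)


schema = String()
node = sketch_data_node_from_schema(schema, "sketch_backend", parent=None)
```

Passing a different module name would select a different backend without changing the calling code, which is the purpose of the module_name parameter.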