Data management¶
dsch data representation.
In dsch, data is structured according to a given schema. The data is then represented as a hierarchical structure of data nodes, each of which corresponds to a node in the schema. This allows subsequent validation against the schema.
The data nodes are also responsible for storing the data. Since dsch is built
to support multiple storage backends, there are specific data node classes
implementing the respective functionality. The classes in this module provide
common functionality and are intended to be used as base classes.
Different backends are implemented in the dsch.backends package.
-
class dsch.data.Array(schema_node, parent, data_storage=None, new_params=None)¶ Generic Array data node.
This class implements backend-independent behaviour of Array data nodes. Backend-specific subclasses should derive from this class.
-
node_tree()¶ Return a recursive representation of the (sub)node-tree.
The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.
For Array nodes, the value is not included in the label because of its length. Instead, the array shape is shown. If no value is set, ‘<empty>’ is printed instead.
Returns: {label: sub_tree} representation. Return type: dict
-
validate()¶ Validate the node value against the schema node specification.
If validation succeeds, the method terminates silently. Otherwise, an exception is raised.
Raises: dsch.exceptions.ValidationError – if validation fails.
-
-
class dsch.data.Compilation(schema_node, parent, data_storage=None, new_params=None)¶ Compilation data node.
Compilation is the base class for compilation-type data nodes, providing common functionality and the common interface. Subclasses may add functionality depending on the backend.
-
clear()¶ Clear all sub-node values.
Note that, in contrast to List, this does not remove the sub-nodes entirely, but only their values (by calling the respective clear() method). This is because the set of sub-nodes for a Compilation is fixed via the schema specification and does not change during usage.
-
complete¶ Check whether the Compilation is currently complete.
A Compilation is considered complete when all non-optional sub-nodes are individually complete. This allows defining exceptions for specific sub-nodes by including them in schema.Compilation.optionals.
Note
complete is not simply the inverse of empty, since it is only True when all non-optional fields are filled. This means a Compilation can be non-empty and non-complete at the same time.
Returns: True if the Compilation is complete, False otherwise. Return type: bool
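The difference between complete and empty can be illustrated with a minimal, self-contained sketch. ToyItem and ToyCompilation are hypothetical stand-ins written for this example, not dsch classes; they only mirror the rules stated above.

```python
class ToyItem:
    """Hypothetical stand-in for an item node (not part of dsch)."""

    def __init__(self, value=None):
        self._value = value

    @property
    def empty(self):
        return self._value is None

    @property
    def complete(self):
        # For item nodes, complete is the inverse of empty.
        return not self.empty


class ToyCompilation:
    """Hypothetical stand-in that mirrors the documented rules."""

    def __init__(self, fields, optionals=()):
        self.fields = fields              # dict: field name -> sub-node
        self.optionals = set(optionals)   # names of optional fields

    @property
    def empty(self):
        # Empty only if every sub-node is empty.
        return all(node.empty for node in self.fields.values())

    @property
    def complete(self):
        # Complete if every non-optional sub-node is complete.
        return all(node.complete
                   for name, node in self.fields.items()
                   if name not in self.optionals)


comp = ToyCompilation(
    {"name": ToyItem("probe"), "gain": ToyItem(), "comment": ToyItem()},
    optionals=["comment"],
)
# "name" is set, "gain" is required but missing:
# the compilation is neither empty nor complete.
state = (comp.empty, comp.complete)
```

Filling the required "gain" field would make the toy compilation complete even though the optional "comment" field stays empty.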
-
empty¶ Check whether the Compilation is currently empty.
A Compilation is considered empty when all individual sub-nodes are empty.
Returns: True if the Compilation is empty, False otherwise. Return type: bool
-
load_from(source_node)¶ Load data by copying from the given source node.
For Compilations, this recursively copies the data of the relevant sub-nodes.
Parameters: source_node – Data node to copy value from.
-
node_tree()¶ Return a recursive representation of the (sub)node-tree.
The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.
For Compilation nodes, the representations of all sub-nodes are printed recursively, prefixed by the sub-node name.
Returns: {label: sub_tree} representation. Return type: dict
-
replace(new_value)¶ Replace the current compilation values with new ones.
The new values must be specified as a dict, where the key corresponds to the compilation field name.
For Compilation, this method is effectively a shorthand for calling ItemNode.replace() on all fields specified in the given dict.
Parameters: new_value (dict) – Mapping of field names to new values.
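As a rough illustration of the "shorthand" behaviour described above, the following sketch forwards each dict entry to the replace() method of the matching field. ToyItem and ToyCompilation are hypothetical names used only here; the real classes are backend-specific.

```python
class ToyItem:
    """Hypothetical item node holding a single value."""

    def __init__(self):
        self.value = None

    def replace(self, new_value):
        self.value = new_value


class ToyCompilation:
    """Hypothetical compilation with a fixed set of named fields."""

    def __init__(self, field_names):
        self.fields = {name: ToyItem() for name in field_names}

    def replace(self, new_value):
        # Forward every entry of the dict to the corresponding field.
        for name, value in new_value.items():
            self.fields[name].replace(value)


comp = ToyCompilation(["name", "gain"])
comp.replace({"gain": 2.5})
```

In this sketch, fields that are not named in the dict are left untouched, which is what "all fields specified in the given dict" suggests.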
-
validate()¶ Recursively validate all sub-node values.
If validation succeeds, the method terminates silently. Otherwise, an exception is raised.
Raises: dsch.exceptions.SubnodeValidationError – if validation fails.
-
-
class dsch.data.Date(schema_node, parent, data_storage=None, new_params=None)¶ Generic Date data node.
This class implements backend-independent behaviour of Date data nodes. Backend-specific subclasses should derive from this class.
-
class dsch.data.DateTime(schema_node, parent, data_storage=None, new_params=None)¶ Generic DateTime data node.
This class implements backend-independent behaviour of DateTime data nodes. Backend-specific subclasses should derive from this class.
-
class dsch.data.ItemNode(schema_node, parent, data_storage=None, new_params=None)¶ Generic data item node.
ItemNode is the base class for data nodes, providing common functionality and the common interface. Subclasses may add functionality depending on the node type and backend (e.g. compression settings).
Note that this is only the base class for item nodes, i.e. nodes that directly hold data. Collection nodes, i.e. Compilation and List, are not based on this class.
Variables: - schema_node – The schema node that this data node is based on.
- parent – Parent data node object (None if this is the top-level data node).
- complete – Data completeness flag. True if data is present.
- empty – Data absence flag. True if no data is present.
- value – Actual node data, independent of the backend in use.
-
clear()¶ Clear the data that is held by this data node.
This removes the corresponding storage object entirely, causing the data node to be empty afterwards.
-
complete¶ Check whether the data node is currently complete.
A data node is considered complete when a corresponding storage object exists. For non-containing nodes (i.e. all node types except Compilation and List), this is always the inverse of empty, but the property is still provided for interface compatibility.
Returns: True if the data node is complete, False otherwise. Return type: bool
-
empty¶ Check whether the data node is currently empty.
A data node is considered empty when no corresponding storage object exists. To apply a new value, set value.
Returns: True if the data node is empty, False otherwise. Return type: bool
-
load_from(source_node)¶ Load data by copying from the given source node.
This is effectively a shorthand for self.replace(source_node.value) with additional checking of node compatibility. Two nodes are considered compatible if their schema_node attributes are identical.
Parameters: source_node – Data node to copy value from.
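The compatibility check plus copy described above might look roughly like the sketch below. ToyNode is a hypothetical class; modelling "identical" as equality and raising ValueError on a mismatch are both assumptions, since the text does not say how identity is tested or which exception is used.

```python
class ToyNode:
    """Hypothetical item node; schema_node is a plain string here."""

    def __init__(self, schema_node, value=None):
        self.schema_node = schema_node
        self.value = value

    def replace(self, new_value):
        self.value = new_value

    def load_from(self, source_node):
        # Assumption: "identical" schema nodes are modelled as equal ones.
        if self.schema_node != source_node.schema_node:
            # ValueError is a placeholder; the actual exception is not
            # named in the documentation above.
            raise ValueError("incompatible schema nodes")
        self.replace(source_node.value)


src = ToyNode(schema_node="String(max_length=10)", value="abc")
dst = ToyNode(schema_node="String(max_length=10)")
dst.load_from(src)   # dst now holds a copy of src's value
```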
-
node_tree()¶ Return a recursive representation of the (sub)node-tree.
The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.
For leaf nodes, i.e. nodes that do not contain other nodes, the str() representation of the value is also included in the label. If no value is set, ‘<empty>’ is printed instead.
Returns: {label: sub_tree} representation. Return type: dict
-
replace(new_value)¶ Completely replace the current node value.
Instead of changing parts of the data (e.g. via numpy array slicing), replace the entire data object for this node.
Parameters: new_value – New value to apply to the node, independent of the backend in use.
-
validate()¶ Validate the node value against the schema node specification.
If validation succeeds, the method terminates silently. Otherwise, an exception is raised.
Raises: dsch.exceptions.ValidationError – if validation fails.
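The "silent on success, exception on failure" contract can be sketched as follows. The ValidationError class below is a local stand-in for dsch.exceptions.ValidationError, and validate_max_length and is_valid are hypothetical helpers invented for this example.

```python
class ValidationError(Exception):
    """Local stand-in for dsch.exceptions.ValidationError."""


def validate_max_length(value, max_length):
    """Hypothetical check: return None on success, raise on failure."""
    if len(value) > max_length:
        raise ValidationError(
            "length {} exceeds maximum {}".format(len(value), max_length))


def is_valid(validate, *args):
    """Turn the raise-on-failure contract into a boolean."""
    try:
        validate(*args)
    except ValidationError:
        return False
    return True


ok = is_valid(validate_max_length, "abc", 5)
too_long = is_valid(validate_max_length, "abcdef", 5)
```

Because a successful call returns nothing, callers that need a yes/no answer have to wrap the call in try/except, as is_valid does here.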
-
value¶ Return the actual node data, independent of the backend in use.
This representation of the data only depends on the corresponding node type, not on the selected storage backend.
If the node is currently empty, the value is undefined and NodeEmptyError is raised.
Returns: Node data. Raises: dsch.exceptions.NodeEmptyError – if the node is currently empty.
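Since reading value from an empty node raises an exception, callers typically either check empty first or catch the error. The sketch below shows both patterns; ToyNode and value_or_default are hypothetical, and NodeEmptyError is a local stand-in for dsch.exceptions.NodeEmptyError.

```python
class NodeEmptyError(Exception):
    """Local stand-in for dsch.exceptions.NodeEmptyError."""


class ToyNode:
    """Hypothetical item node with the documented value/empty behaviour."""

    def __init__(self):
        self._storage = None

    @property
    def empty(self):
        return self._storage is None

    @property
    def value(self):
        if self.empty:
            raise NodeEmptyError("node has no value")
        return self._storage

    def replace(self, new_value):
        self._storage = new_value


def value_or_default(node, default=None):
    """Return the node value, or a default if the node is empty."""
    try:
        return node.value
    except NodeEmptyError:
        return default


node = ToyNode()
before = value_or_default(node, default=0)   # node is still empty
node.replace(42)
after = node.value if not node.empty else 0  # explicit check instead
```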
-
class dsch.data.List(schema_node, parent, data_storage=None, new_params=None)¶ List-type data node.
List is the base class for list-type data nodes, providing common functionality and the common interface. Subclasses may add functionality depending on the backend.
-
append(value=None)¶ Append a new data node to the list.
If a value is given, it is automatically applied to the new data node. Otherwise, an empty data node is created, which can be useful especially for Lists of Compilations.
Parameters: value – Value to be added to the list.
-
clear()¶ Clear all sub-nodes.
This removes all sub-nodes entirely, yielding an empty List.
-
complete¶ Check whether the List is currently complete.
A List is considered complete when all of its sub-nodes are complete.
Warning
An empty List is considered complete! If a minimum number of list items is required, use schema.List.min_length to apply the corresponding constraint.
Returns: True if the List is complete, False otherwise. Return type: bool
-
empty¶ Check whether the List is currently empty.
A List is considered empty when all of its sub-nodes are empty. As a special case, it is also considered empty when there are no sub-nodes present.
Returns: True if the List is empty, False otherwise. Return type: bool
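Both properties behave like Python's all() over the sub-nodes, which is True for an empty sequence. The sketch below uses plain lists of booleans instead of real nodes; list_complete, list_empty and satisfies_min_length are hypothetical helper names, and the last one only illustrates the kind of check a minimum-length constraint implies.

```python
def list_complete(sub_node_complete_flags):
    """A list is complete if all of its sub-nodes are complete."""
    return all(sub_node_complete_flags)


def list_empty(sub_node_empty_flags):
    """A list is empty if all of its sub-nodes are empty."""
    return all(sub_node_empty_flags)


def satisfies_min_length(items, min_length):
    """Hypothetical length check, analogous to a min_length constraint."""
    return len(items) >= min_length


# With no sub-nodes at all, the list is complete AND empty.
no_items = []
both = (list_complete(no_items), list_empty(no_items))

# The completeness flag alone therefore cannot enforce "at least one item".
has_enough = satisfies_min_length(no_items, 1)
```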
-
load_from(source_node)¶ Load data by copying from the given source node.
For Lists, this recursively copies the data of the relevant sub-nodes.
Parameters: source_node – Data node to copy value from.
-
node_tree()¶ Return a recursive representation of the (sub)node-tree.
The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.
For lists, the representations of all sub-nodes are printed recursively, prefixed by the list index in brackets.
Returns: {label: sub_tree} representation. Return type: dict
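The node_tree() descriptions for leaf, Compilation and List nodes can be combined into one small sketch. It operates on plain Python values (dict for a compilation, list for a list, anything else for a leaf) rather than on dsch nodes, and the exact label format, the "(Leaf)" type name and the empty dict used as a leaf's sub-tree are assumptions made for illustration.

```python
def node_tree(value, prefix=""):
    """Return a {label: sub_tree} dict for a nested plain-Python value."""
    if isinstance(value, dict):
        # Compilation: sub-node labels are prefixed by the field name.
        sub_tree = {}
        for name, sub in value.items():
            sub_tree.update(node_tree(sub, prefix=name + " "))
        return {prefix + "(Compilation)": sub_tree}
    if isinstance(value, list):
        # List: sub-node labels are prefixed by the index in brackets.
        sub_tree = {}
        for idx, sub in enumerate(value):
            sub_tree.update(node_tree(sub, prefix="[{}] ".format(idx)))
        return {prefix + "(List)": sub_tree}
    # Leaf: include str(value) in the label, or '<empty>' if unset.
    shown = "<empty>" if value is None else str(value)
    return {prefix + "(Leaf): " + shown: {}}


tree = node_tree({"name": "probe", "samples": [1, None]})
```

Because the result is an ordinary nested dict, it can be pretty-printed or serialized with standard tools.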
-
replace(new_value)¶ Replace the current list entries with the given list of entries.
For List, this is effectively a shorthand for calling clear() and then, for each of the new entries, append().
Parameters: new_value (list) – New entries to put into the List.
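The clear-then-append behaviour can be sketched with a hypothetical ToyList that stores plain values instead of data nodes:

```python
class ToyList:
    """Hypothetical list node that stores plain values."""

    def __init__(self):
        self.items = []

    def append(self, value=None):
        # With no value, an empty placeholder entry is added.
        self.items.append(value)

    def clear(self):
        # Remove all entries entirely.
        self.items.clear()

    def replace(self, new_value):
        # Shorthand: drop everything, then append each new entry.
        self.clear()
        for entry in new_value:
            self.append(entry)


lst = ToyList()
lst.append(1)
lst.append()            # entry without a value
lst.replace([10, 20, 30])
```

In this sketch, the previous entries (including the valueless one) are gone after replace(); only the new entries remain.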
-
validate()¶ Recursively validate all sub-node values.
If validation succeeds, the method terminates silently. Otherwise, an exception is raised.
Raises: dsch.exceptions.SubnodeValidationError – if validation fails.
-
-
class dsch.data.Scalar(schema_node, parent, data_storage=None, new_params=None)¶ Generic Scalar data node.
This class implements backend-independent behaviour of Scalar data nodes. Backend-specific subclasses should derive from this class.
-
node_tree()¶ Return a recursive representation of the (sub)node-tree.
The representation is a dict with the node’s own label as the key and the tree of sub-nodes as the value. The label always starts with the node type in parentheses.
For Scalar nodes, the unit is appended to the value, if any. If no value is set, ‘<empty>’ is printed instead.
Returns: {label: sub_tree} representation. Return type: dict
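A label of this kind could be built as in the sketch below. The function scalar_label is hypothetical, and the separators used (a colon after the type, a space before the unit) are assumptions; only the overall rules come from the description above.

```python
def scalar_label(value=None, unit=""):
    """Build an illustrative label for a scalar value with optional unit."""
    if value is None:
        # No value set: show the placeholder instead.
        return "(Scalar): <empty>"
    text = str(value)
    if unit:
        # Append the unit only if one is defined.
        text += " " + unit
    return "(Scalar): " + text


with_unit = scalar_label(3.5, unit="V")
without_unit = scalar_label(7)
unset = scalar_label()
```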
-
-
class dsch.data.Time(schema_node, parent, data_storage=None, new_params=None)¶ Generic Time data node.
This class implements backend-independent behaviour of Time data nodes. Backend-specific subclasses should derive from this class.
-
dsch.data.data_node_from_schema(schema_node, module_name, parent, data_storage=None, new_params=None)¶ Create a new data node from a given schema node.
Finds the data node class corresponding to the given schema node and creates an instance. The module containing the data node class must be given, which makes it possible to select the desired storage backend.
If data_storage is given, the new data node is initialized from that storage object. Otherwise, a new data node with a new storage object is created. Backends may use a new_params object to supply parameters for new data node creation.
Parameters: - schema_node – Schema node instance to create a data node for.
- module_name (str) – The full module name of the data storage backend.
- parent – Parent data node object.
- data_storage – Backend-specific data storage object to load.
- new_params – Backend-specific metadata for data node creation.
Returns: Data node corresponding to the given schema node.
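One plausible way to implement such a lookup is to import the backend module by name and fetch a class whose name matches the schema node's type. The sketch below does this with an in-memory module; the name-matching rule is an assumption, and Bool, ToyBoolData, toy_backend and toy_data_node_from_schema are all hypothetical names that do not appear in dsch.

```python
import importlib
import sys
import types


class Bool:
    """Hypothetical schema node type."""


class ToyBoolData:
    """Hypothetical backend data node for Bool schema nodes."""

    def __init__(self, schema_node, parent, data_storage=None,
                 new_params=None):
        self.schema_node = schema_node
        self.parent = parent
        self.data_storage = data_storage
        self.new_params = new_params


# Register a throwaway in-memory module so the example is self-contained.
toy_backend = types.ModuleType("toy_backend")
toy_backend.Bool = ToyBoolData
sys.modules["toy_backend"] = toy_backend


def toy_data_node_from_schema(schema_node, module_name, parent,
                              data_storage=None, new_params=None):
    backend = importlib.import_module(module_name)
    # Assumption: the data node class shares the schema node's class name.
    node_class = getattr(backend, type(schema_node).__name__)
    return node_class(schema_node, parent, data_storage=data_storage,
                      new_params=new_params)


node = toy_data_node_from_schema(Bool(), "toy_backend", parent=None)
```

Passing a different module name would select a different backend without changing the calling code, which matches the purpose of the module_name parameter.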