Skip to content

Serializing computations

Loman can serialize computations to a JSON file for later inspection or post-mortem debugging. This is useful when a scheduled job should capture its inputs, intermediates, and results so they can be examined if something goes wrong.

>>> import math
>>> from loman import Computation
>>> comp = Computation()
>>> comp.add_node('x', value=4.0)
>>> def area(x):
...     return math.pi * x ** 2
>>> comp.add_node('area', area)
>>> comp.compute_all()
>>> comp.to_dict()
{'x': 4.0, 'area': 50.26548245743669}

To save and reload the computation:

>>> comp.write_json('comp.json')
>>> comp2 = Computation.read_json('comp.json')
>>> comp2.v.area
50.26548245743669

The output is a plain JSON text file, so it is human-readable and can be inspected with any text editor.

Excluding nodes from serialization

Sometimes a node holds a value that should not (or cannot) be saved — for example a database connection, a licensed dataset, or an object that does not support JSON serialization. Pass serialize=False when adding the node:

>>> import sqlalchemy as sa
>>> comp = Computation()
>>> comp.add_node('engine', sa.create_engine('sqlite://'), serialize=False)
>>> comp.add_node('result', value=42)
>>> comp.write_json('comp.json')
>>> comp2 = Computation.read_json('comp.json')
>>> comp2.state('engine')
<States.UNINITIALIZED: 1>
>>> comp2.v.result
42

The excluded node is preserved in the file with UNINITIALIZED state and no value; all other nodes round-trip normally.

Lambdas are not serializable by default

A lambda cannot be serialized because it has no importable module path. Use a module-level function instead:

>>> from loman import Computation, ComputationSerializer, SerializationError
>>> comp = Computation()
>>> comp.add_node('a', value=1)
>>> comp.add_node('b', lambda a: a + 1)
>>> comp.compute_all()
>>> import io
>>> try:
...     comp.write_json(io.StringIO())
... except SerializationError as e:
...     print(e)
Cannot serialize lambda function on node NodeKey(parts=('b',)). Use a module-level importable function, serialize=False, or ComputationSerializer(use_dill_for_functions=True).

Replace the lambda with a named function defined at module level:

>>> def increment(a):
...     return a + 1
>>> comp.add_node('b', increment)
>>> comp.compute_all()
>>> comp.write_json('comp.json')       # now succeeds

Using dill to serialize lambdas and closures

When refactoring to named functions is impractical, pass use_dill_for_functions=True to ComputationSerializer. This serializes any callable — including lambdas and closures that capture local variables — as a base64-encoded dill blob inside the JSON:

>>> s = ComputationSerializer(use_dill_for_functions=True)
>>> comp = Computation()
>>> comp.add_node('a', value=3)
>>> comp.add_node('b', lambda a: a * 2)
>>> comp.compute_all()
>>> buf = io.StringIO()
>>> comp.write_json(buf, serializer=s)
>>> _ = buf.seek(0)
>>> comp2 = Computation.read_json(buf, serializer=s)
>>> comp2.v.b
6
>>> comp2.insert('a', 10)
>>> comp2.compute_all()
>>> comp2.v.b
20

The same serializer instance must be passed to both write_json and read_json.

Warning

The dill blob embedded in the JSON is not portable across Python versions and shares the same stability caveats as the deprecated write_dill. Prefer named functions when long-term compatibility matters.

File objects and strings

Both write_json and read_json accept either a file path (string) or any text-mode file-like object:

>>> import io
>>> buf = io.StringIO()
>>> comp.write_json(buf)
>>> _ = buf.seek(0)
>>> comp3 = Computation.read_json(buf)
>>> comp3.v.b
2

PINNED nodes

Pinned nodes round-trip correctly — their PINNED state and value are preserved:

>>> comp = Computation()
>>> comp.add_node('a', value=10)
>>> comp.pin('a')
>>> comp.write_json('comp.json')
>>> comp2 = Computation.read_json('comp.json')
>>> comp2.state('a')
<States.PINNED: 5>
>>> comp2.v.a
10

ERROR nodes

If a node is in ERROR state, its exception type, message, and traceback are preserved as strings so they can be read back for post-mortem inspection even without the original exception class:

>>> def bad_func():
...     raise ValueError("something went wrong")
>>> comp = Computation()
>>> comp.add_node('result', bad_func)
>>> comp.compute_all()
>>> comp.state('result')
<States.ERROR: 4>
>>> comp.write_json('comp.json')
>>> comp2 = Computation.read_json('comp.json')
>>> comp2.state('result')
<States.ERROR: 4>
>>> comp2['result'].value.exception
Exception('something went wrong')

Custom serialization for user-defined types

For types that are not handled by the default serializer, pass a custom ComputationSerializer instance with additional transformers registered:

>>> from loman import Computation, ComputationSerializer
>>> from loman.serialization import CustomTransformer, Transformer
>>> class Point:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
>>> point_transformer = CustomTransformer(
...     Point,
...     to_dict=lambda v: {'__point__': True, 'x': v.x, 'y': v.y},
...     from_dict=lambda d: Point(d['x'], d['y']),
... )
>>> s = ComputationSerializer()
>>> s._t.register(point_transformer)
>>> comp = Computation()
>>> comp.add_node('origin', value=Point(0, 0))
>>> buf = io.StringIO()
>>> comp.write_json(buf, serializer=s)
>>> _ = buf.seek(0)
>>> comp2 = Computation.read_json(buf, serializer=s)
>>> comp2.v.origin.x
0

Pandas support

DataFrames and Series are serialized automatically:

>>> import pandas as pd
>>> comp = Computation()
>>> comp.add_node('df', value=pd.DataFrame({'a': [1, 2], 'b': [3, 4]}))
>>> buf = io.StringIO()
>>> comp.write_json(buf)
>>> _ = buf.seek(0)
>>> comp2 = Computation.read_json(buf)
>>> comp2.v.df.shape
(2, 2)

JSON format reference

The file is a single JSON object with three top-level keys:

{
  "version": 1,
  "nodes": [ ... ],
  "edges": [ ... ]
}

Node object

Each entry in nodes has:

Field Type Description
key string Node name. Hierarchical keys use / as separator.
state string | null States enum name: "UPTODATE", "STALE", "UNINITIALIZED", "ERROR", "PINNED", …
value any Encoded value (see below), or null when absent.
has_value bool true when value should be restored; false when the node has no value.
func object | null Encoded callable (see below), or null.
serialize bool Whether the node carries the __serialize__ tag.
tags list[string] Non-system user tags.

Edge object

Each entry in edges has:

Field Type Description
src string Source node key.
dst string Destination node key.
param_type "arg" | "kwd" | null How the value is passed to the function.
param int | string | null Positional index for "arg", parameter name for "kwd".

Value encoding

Plain Python scalars (int, float, str, bool, None) are stored as-is. Compound types use a tagged object with a "type" discriminator:

NumPy array

{
  "type": "ndarray",
  "shape": [3],
  "dtype": "<f8",
  "data": [1.0, 2.0, 3.0]
}

Pandas DataFrame (split orientation, column dtypes preserved)

{
  "type": "dataframe",
  "columns": ["x", "y"],
  "index": [0, 1],
  "data": [[1.0, 3.0], [2.0, 4.0]],
  "dtypes": {"x": "int64", "y": "float64"}
}

ERROR node value (exception preserved as strings for post-mortem)

{
  "__loman_error__": true,
  "exception_type": "ValueError",
  "exception_str": "something went wrong",
  "traceback": "Traceback (most recent call last):\n  ..."
}

Function encoding

Importable module-level function (default)

{
  "type": "func_ref",
  "module": "mypackage.calcs",
  "qualname": "compute_result"
}

Lambda or closure (only when use_dill_for_functions=True)

{
  "type": "dill_func",
  "blob": "gASVyQAAAAAAAACMCmRpbGwuX2RpbGyU..."
}

The blob field is a base64-encoded dill byte string. It is not portable across Python versions.

Note

The JSON serialization format is not intended for long-term storage. It is designed for short-term inspection and post-mortem debugging. The format may change between releases.