ahds
Overview
ahds is a Python package to parse and handle Amira (R) files.
It was developed to facilitate reading of Amira (R) files as part of the EMDB-SFF toolkit.
Note
Amira (R) is a trademark of Thermo Fisher Scientific. This package is in no way affiliated with Thermo Fisher Scientific.
Installation
Presently, ahds only works with Python 2.7 but will soon work on Python 3. Please begin by installing numpy<1.16 using
pip install "numpy<1.16"
because it is needed to run setup.py. The quotes stop the shell from treating < as an input redirection. Afterwards you may run
pip install ahds
License
Copyright 2017 EMBL - European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
either express or implied. See the License for the specific
language governing permissions and limitations under the License.
Future Plans
- Write out valid Amira (R) files
Background and Definitions
ahds presently handles two types of Amira (R) files:
- AmiraMesh files, which typically but not necessarily have a .am extension, and
- HyperSurface files, which have a .surf extension and represent an older filetype.
Both file types consist of two parts:
- a header, and
- one or more data streams.
Headers are structured in a modified VRML-like syntax and differ between AmiraMesh and HyperSurface files in some of the keywords used.
A data stream is a sequence of encoded bytes that is either referred to in the header by a delimiter (usually @<data_stream_index>, where <data_stream_index> is an integer) or introduced by one of a set of structural keywords (e.g. Vertices, Patches) expected in a predefined sequence.
Headers in Detail
AmiraMesh and HyperSurface headers can be divided into four main sections:
- designation
- definitions
- parameters, and
- data pointers.
The designation is the first line and conveys several important details about the format and structure of the file such as:
- filetype (either AmiraMesh or HyperSurface)
- dimensionality (3D)
- format (BINARY-LITTLE-ENDIAN, BINARY or ASCII)
- version (a decimal number e.g. 2.1)
- extra format data e.g. <hxsurface>, specifying that an AmiraMesh file will contain HyperSurface data
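The fields listed above can be picked out of a designation line with a few string tests. The following is a minimal sketch that is independent of ahds: `parse_designation` is a hypothetical helper, and it assumes the tokens are separated by whitespace and can appear in any order.

```python
import re

FILETYPES = ("AmiraMesh", "HyperSurface")
FORMATS = ("BINARY-LITTLE-ENDIAN", "BINARY", "ASCII")


def parse_designation(line):
    """Classify each whitespace-separated token of a designation line."""
    fields = dict.fromkeys(
        ["filetype", "dimension", "format", "version", "extra_format"]
    )
    # drop the leading '#' and any spaces before splitting into tokens
    for token in line.lstrip("# ").split():
        if token in FILETYPES:
            fields["filetype"] = token
        elif token == "3D":
            fields["dimension"] = token
        elif token in FORMATS:
            fields["format"] = token
        elif re.match(r"^\d+(\.\d+)?$", token):
            fields["version"] = token
        elif token.startswith("<") and token.endswith(">"):
            # keep only the text between the angle brackets
            fields["extra_format"] = token[1:-1]
    return fields


designation = parse_designation("# AmiraMesh BINARY-LITTLE-ENDIAN 2.1")
```

Fields that do not occur on the line (here the dimensionality and the extra format data) stay `None`.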
A series of definitions follows. Each definition either begins with the word ‘define’ or has ‘n’ prepended to a variable name, and refers to data found in the data pointer section. For example:
define Lattice 862 971 200
or
nVertices 85120
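Both forms of definition can be read with simple token handling. This is a sketch only: `parse_definition` is a hypothetical helper, it expects to be given a definition line, and dropping the leading ‘n’ to recover the variable name is an assumption rather than documented ahds behaviour.

```python
def parse_definition(line):
    """Return (name, value) for a 'define X ...' or 'nX ...' line, else None."""
    tokens = line.split()
    if len(tokens) >= 2 and tokens[0] == "define":
        name, values = tokens[1], tokens[2:]
    elif tokens and tokens[0].startswith("n") and len(tokens[0]) > 1:
        # assumption: drop the leading 'n' to recover the variable name
        name, values = tokens[0][1:], tokens[1:]
    else:
        return None
    numbers = [int(value) for value in values]
    # a single number is returned bare, several numbers as a list
    value = numbers[0] if len(numbers) == 1 else numbers
    return name, value


lattice = parse_definition("define Lattice 862 971 200")
vertices = parse_definition("nVertices 85120")
```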
This is followed by grouped parameters enclosed in braces, beginning with the word ‘Parameters’. Each nested group of parameters begins with the name of that group, e.g. ‘Materials’:
Parameters {
# grouped parameters
Material {
# the names of various materials with attributes
Exterior {
id 0
}
Inside {
id 1,
Color 0 1 1,
Transparency 0.5
}
}
Patches {
# patch attributes
InnerRegion "Inside",
OuterRegion "Exterior",
BoundaryID 0,
BranchingPoints 0
}
# inline parameters
GridSize <value>,
…
}
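The nesting of braces maps naturally onto nested dictionaries. The sketch below is a simplified line-based reader, not the EBNF grammar that ahds uses: `parse_braced` is a hypothetical helper, it assumes one opening or closing brace per line, and it treats everything after a `#` as a comment (so it would mishandle a quoted value containing `#`). Values are kept as raw strings.

```python
def parse_braced(text):
    """Read a brace-delimited block into nested dicts of raw string values."""
    root = {}
    stack = [root]  # stack[-1] is the block currently being filled
    for raw_line in text.splitlines():
        # strip comments, surrounding whitespace and the trailing comma
        line = raw_line.split("#", 1)[0].strip().rstrip(",").strip()
        if not line:
            continue
        if line.endswith("{"):
            child = {}
            stack[-1][line[:-1].strip()] = child
            stack.append(child)
        elif line == "}":
            stack.pop()
        else:
            key, _, value = line.partition(" ")
            stack[-1][key] = value.strip()
    return root


sample = """
Parameters {
    Materials {
        Exterior {
            Id 1
        }
        Inside {
            Color 0.64 0 0.8,
            Id 2
        }
    }
    CoordType "uniform"
}
"""
parsed = parse_braced(sample)
```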
The most important set of parameters are materials as these specify colours and identities of distinct segments/datasets within the file.
Finally, AmiraMesh files list a set of data pointers that point to data labels within the file together with additional information to decode the data. We refer to these as data streams because they consist of continuous streams of raw byte data that need to be decoded. Here is an example of data pointers that refer to the location of 3D surface primitives:
Vertices { float[3] Vertices } @1
TriangleData { int[7] Triangles } @2
Patches-0 { int Patches-0 } @3
These refer to three raw data streams, each beginning with the delimiter @<number>. Data stream one (@1) is called Vertices and consists of float triples, stream two is called TriangleData and has integer 7-tuples, and stream three, called Patches-0, is a single integer (the number of patches). In some cases the data pointer also specifies the encoding of the corresponding data stream, as in
Lattice { byte Labels } @1(HxByteRLE,234575740)
which is a run-length encoded data stream of the specified length, while
Lattice { byte Data } @1(HxZip,919215)
contains zipped data of the specified length.
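The data pointer lines shown above follow a regular enough pattern to be matched with a single regular expression. The following sketch is not the grammar used by ahds: `POINTER` and `parse_data_pointer` are hypothetical names, the pattern only covers the forms that appear in this section, and the field names are borrowed from the data pointer attributes listed in the ahds.header section.

```python
import re

POINTER = re.compile(
    r"^(?P<pointer_name>[\w-]+)\s*\{\s*"
    r"(?P<data_type>[A-Za-z]+)(?:\[(?P<data_dimension>\d+)\])?\s+"
    r"(?P<data_name>[\w-]+)\s*\}\s*"
    r"@(?P<data_index>\d+)"
    r"(?:\((?P<data_format>\w+),(?P<data_length>\d+)\))?\s*$"
)


def parse_data_pointer(line):
    """Split a data pointer line into named fields, or return None."""
    match = POINTER.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # numeric fields are converted; absent optional fields stay None
    for key in ("data_dimension", "data_index", "data_length"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


plain = parse_data_pointer("Vertices { float[3] Vertices } @1")
encoded = parse_data_pointer("Lattice { byte Labels } @1(HxByteRLE,234575740)")
```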
Data Streams in Detail
AmiraMesh data streams are very simple. They always have a start delimiter made of @ followed by an index that identifies the data stream. A newline character separates the delimiter from the data stream proper, which is either plain ASCII or a binary stream (raw, zipped or encoded).
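For ASCII files, the delimiter-plus-newline layout means the streams can be cut out of the file text by looking for `@<index>` on a line of its own. The sketch below illustrates the idea on a small made-up file; `split_ascii_streams` and the sample content are hypothetical. It is not suitable for binary streams, whose bytes may happen to contain the same pattern and which should be read using the length given in the data pointer where one is available.

```python
import re


def split_ascii_streams(raw):
    """Return {index: text} for every '@<index>' delimiter on its own line."""
    streams = {}
    # '^' with MULTILINE skips the '@1' that ends a data pointer line
    matches = list(re.finditer(r"^@(\d+)[ \t]*\n", raw, flags=re.MULTILINE))
    for position, match in enumerate(matches):
        if position + 1 < len(matches):
            end = matches[position + 1].start()
        else:
            end = len(raw)
        streams[int(match.group(1))] = raw[match.end():end]
    return streams


# hypothetical miniature ASCII file: header lines, then two streams
raw = (
    "# AmiraMesh 3D ASCII 2.0\n"
    "define Nodes 3\n"
    "Nodes { int Ids } @1\n"
    "Nodes { int Codes } @2\n"
    "@1\n"
    "1 \n2 \n3 \n"
    "@2\n"
    "69 \n120 \n"
)
streams = split_ascii_streams(raw)
```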
HyperSurface data streams are structured to have the following sections:
# Header
Vertices <nvertices>
# vertices data stream
NBranchingPoints <nbranching_points>
NVerticesOnCurves <nvertices_on_curves>
BoundaryCurves <nboundary_curves>
Patches <npatches>
{
InnerRegion <inner_region_name>
OuterRegion <outer_region_name>
BoundaryID <boundary_id>
BranchingPoints <nbranching_points>
Triangles <ntriangles>
# triangles data stream
} # repeats <npatches> times
HyperSurface data streams can be either plain ASCII or binary.
ahds Modules
ahds has three main modules:
- ahds.grammar specifies an EBNF grammar
- ahds.header
- ahds.data_stream
These modules are tied into a user-level class called ahds.AmiraFile that does all the work for you.
>>> from ahds import AmiraFile
>>> # read an AmiraMesh file
>>> af = AmiraFile('am/test7.am')
>>> af.header
<AmiraHeader with 4 bytes>
>>> # empty data streams
>>> af.data_streams
>>> print af.data_streams
None
>>> # we have to explicitly read to get the data streams
>>> af.read()
>>> af.data_streams
<class 'ahds.data_stream.DataStreams'> object with 13 stream(s): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
>>> for ds in af.data_streams:
... print ds
...
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 2,608 bytes
>>> # we get the n-th data stream using the index/key notation
>>> af.data_streams[1].encoded_data
'1 \n2 \n3 \n'
>>> af.data_streams[1].decoded_data
[1, 2, 3]
>>> af.data_streams[2].encoded_data
'69 \n120 \n116 \n101 \n114 \n105 \n111 \n114 \n0 \n73 \n110 \n115 \n105 \n100 \n101 \n0 \n109 \n111 \n108 \n101 \n99 \n117 \n108 \n101 \n0 \n'
>>> af.data_streams[2].decoded_data
[69, 120, 116, 101, 114, 105, 111, 114, 0, 73, 110, 115, 105, 100, 101, 0, 109, 111, 108, 101, 99, 117, 108, 101, 0]
>>> # read a HyperSurface file
>>> af = AmiraFile('surf/test4.surf')
>>> af.read()
>>> af.data_streams
<class 'ahds.data_stream.DataStreams'> object with 5 stream(s): Patches, NBranchingPoints, BoundaryCurves, Vertices, NVerticesOnCurves
>>> # HyperSurface files have pre-set data streams
>>> af.data_streams['Vertices'].decoded_data[:10]
[(560.0, 243.0, 60.96875), (560.0, 242.9166717529297, 61.0), (559.5, 243.0, 61.0), (561.0, 243.0, 60.95833206176758), (561.0, 242.5, 61.0), (561.0384521484375, 243.0, 61.0), (559.0, 244.0, 60.94444274902344), (559.0, 243.5, 61.0), (558.9722290039062, 244.0, 61.0), (560.0, 244.0, 60.459999084472656)]
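The relationship between `encoded_data` and `decoded_data` for the ASCII streams above can be reproduced with plain string handling. The sketch below uses hypothetical helpers that are not part of ahds. Reading the values of the second stream as character codes ending at each 0 is an observation about this particular sample, not something ahds is documented to do.

```python
def unpack_ascii_rows(encoded):
    """Turn whitespace-separated ASCII rows into a flat list of integers."""
    return [int(token) for token in encoded.split()]


def split_null_terminated(codes):
    """Group character codes into strings, ending a string at each 0."""
    names, current = [], []
    for code in codes:
        if code == 0:
            names.append("".join(current))
            current = []
        else:
            current.append(chr(code))
    return names


encoded = (
    "69 \n120 \n116 \n101 \n114 \n105 \n111 \n114 \n0 \n"
    "73 \n110 \n115 \n105 \n100 \n101 \n0 \n"
    "109 \n111 \n108 \n101 \n99 \n117 \n108 \n101 \n0 \n"
)
codes = unpack_ascii_rows(encoded)
names = split_null_terminated(codes)
```

Here `names` holds the three strings spelled out by the character codes in stream 2.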
ahds.grammar
This module describes the header grammar for Amira (R) (AmiraMesh and HyperSurface) files and depends on the simpleparse Python package. It defines a single class (ahds.grammar.AmiraDispatchProcessor) and four functions.
ahds.grammar.AmiraDispatchProcessor is a subclass of simpleparse.dispatchprocessor which implements the core functionality required to use the grammar. Each grammar token has a corresponding method defined on this class which determines how the data associated with that token will be rendered. Data can be rendered as a single or multimap, string, number, or in a custom format.
ahds.grammar.get_parsed_data(fn, *args, **kwargs) is the user-level function that takes a filename and returns structured parsed data. It depends on the other three functions defined:
- ahds.grammar.detect_format(fn, format_bytes=50, verbose=False) returns either AmiraMesh or HyperSurface given a file name and arguments,
- ahds.grammar.get_header(fn, file_format, header_bytes=20000, verbose=False) returns the header portion based on the file format determined by detect_format(…), and
- ahds.grammar.parse_header(data, verbose=False) converts the raw header data returned by ahds.grammar.get_header(...) into a structured header based on AmiraDispatchProcessor.
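The first step of that chain, deciding the file type from the first few bytes, can be sketched as follows. This is not the ahds implementation: `sniff_filetype` is a hypothetical stand-in that simply looks for the filetype keyword in the first `format_bytes` bytes, and the temporary file exists only to make the example runnable.

```python
import os
import tempfile


def sniff_filetype(path, format_bytes=50):
    """Guess the filetype from the keyword in the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(format_bytes)
    for name in (b"AmiraMesh", b"HyperSurface"):
        if name in head:
            return name.decode("ascii")
    return None


# write a throwaway file that starts with a designation line
descriptor, path = tempfile.mkstemp(suffix=".am")
try:
    os.write(descriptor, b"# AmiraMesh BINARY-LITTLE-ENDIAN 2.1\n\n")
    os.write(descriptor, b"define Lattice 862 971 200\n")
finally:
    os.close(descriptor)
try:
    detected = sniff_filetype(path)
finally:
    os.remove(path)
```

Reading only a bounded prefix keeps the check cheap even for very large files, which is presumably why the documented function takes a `format_bytes` argument.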
ahds.header
This module converts the structured header from the ahds.grammar module into an object with the sections of the header (designation, definitions, parameters and data pointers) and corresponding structured data available as attributes. That is, it converts the header:
# AmiraMesh BINARY-LITTLE-ENDIAN 2.1
define Lattice 862 971 200
Parameters {
Materials {
Exterior {
Id 1
}
Inside {
Color 0.64 0 0.8,
Id 2
}
Mitochondria {
Id 3,
Color 0 1 0
}
Mitochondria_ {
Id 4,
Color 1 1 0
}
mitochondria__ {
Id 5,
Color 0 0.125 1
}
NE {
Id 6,
Color 1 0 0
}
}
Content "862x971x200 byte, uniform coordinates",
BoundingBox 0 13410.7 0 15108.4 1121.45 4221.01,
CoordType "uniform"
}
Lattice { byte Labels } @1(HxByteRLE,4014522)
into an ahds.header.AmiraHeader object.
>>> from ahds.header import AmiraHeader
>>> amira_header = AmiraHeader.from_file('am/test2.am')
>>> amira_header.designation.attrs
['filetype', 'dimension', 'format', 'version', 'extra_format']
>>> amira_header.designation.filetype
'AmiraMesh'
>>> amira_header.designation.dimension
>>> amira_header.designation.format
'BINARY-LITTLE-ENDIAN'
>>> amira_header.definitions.attrs
['Lattice']
>>> amira_header.definitions.Lattice
[862, 971, 200]
>>> amira_header.parameters.attrs
['Materials', 'Content', 'BoundingBox', 'CoordType']
>>> amira_header.parameters.Materials.attrs
['Exterior', 'Inside', 'Mitochondria', 'Mitochondria_', 'mitochondria__', 'NE']
>>> amira_header.parameters.Materials.Exterior.attrs
['Id']
>>> amira_header.parameters.Materials.Exterior.Id
1
>>> amira_header.parameters.Content
'"862x971x200 byte, uniform coordinates",'
>>> amira_header.parameters.BoundingBox
[0, 13410.7, 0, 15108.4, 1121.45, 4221.01]
>>> amira_header.parameters.CoordType
'"uniform"'
>>> amira_header.data_pointers.attrs
['data_pointer_1']
>>> amira_header.data_pointers.data_pointer_1.attrs
['pointer_name', 'data_format', 'data_dimension', 'data_type', 'data_name', 'data_index', 'data_length']
>>> amira_header.data_pointers.data_pointer_1.pointer_name
'Lattice'
>>> amira_header.data_pointers.data_pointer_1.data_format
'HxByteRLE'
>>> amira_header.data_pointers.data_pointer_1.data_dimension
>>> amira_header.data_pointers.data_pointer_1.data_type
'byte'
>>> amira_header.data_pointers.data_pointer_1.data_name
'Labels'
>>> amira_header.data_pointers.data_pointer_1.data_index
1
>>> amira_header.data_pointers.data_pointer_1.data_length
4014522
This module consists of two main classes: ahds.header.AmiraHeader, the user-level class, and ahds.header.Block, a container class for a block of structured data from an Amira (R) header.
AmiraHeader has one constructor, ahds.header.AmiraHeader.from_file(fn, *args, **kwargs), which takes an Amira (R) file by name and arguments and returns an ahds.header.AmiraHeader object with all attributes set as described above. Alternatively, one can pass structured data directly to the initialiser, ahds.header.AmiraHeader(parsed_data), which returns an ahds.header.AmiraHeader object configured appropriately.
- The raw structured data is available as a read-only property: ahds.header.AmiraHeader.raw_header
- Internally the ahds.header.AmiraHeader class implements a set of private methods which individually load the four data sections (designation, definitions, parameters, and data pointers).
The ahds.header.Block class is a container class which converts structured groups to attributes and has two main attributes:
ahds.header.Block.name provides the name of the current block
>>> amira_header.designation.name
'designation'
>>> amira_header.parameters.Materials.name
'Materials'
>>> amira_header.parameters.Materials.Exterior.name
'Exterior'
ahds.header.Block.attrs provides the attributes available on this ahds.header.Block
>>> amira_header.designation.attrs
['filetype', 'dimension', 'format', 'version', 'extra_format']
>>> amira_header.designation.format
'BINARY-LITTLE-ENDIAN'
A given Materials block has two special features:
- Block.ids returns the list of ids for all materials. This is important when decoding HxByteRLE compressed data.
- Block[id] returns the material for the given id using index notation.
>>> amira_header.parameters.Materials.ids
[1, 2, 3, 4, 5, 6]
>>> amira_header.parameters.attrs
['Materials', 'Content', 'BoundingBox', 'CoordType']
>>> # ids attribute is only available for ‘Material’ blocks within ‘parameters’ section
>>> amira_header.parameters.Content.ids
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'ids'
>>> # we can get the name of a material of the given id
>>> amira_header.parameters.Materials[4].name
'Mitochondria_'
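The behaviour shown in the session above (a `name`, a list of `attrs`, an `ids` list and lookup by id) can be modelled with a small container class. `MiniBlock` below is a hypothetical illustration of the idea and not the ahds.header.Block implementation. Unlike the real class, it offers `ids` on every block rather than only on Materials blocks.

```python
class MiniBlock(object):
    """Toy container exposing child values as attributes."""

    def __init__(self, name):
        self.name = name
        self.attrs = []  # attribute names in insertion order

    def add(self, key, value):
        """Attach a value (or nested block) as an attribute."""
        self.attrs.append(key)
        setattr(self, key, value)

    @property
    def ids(self):
        """Ids of all child blocks that carry an 'Id' attribute."""
        children = [getattr(self, attr) for attr in self.attrs]
        return [child.Id for child in children if hasattr(child, "Id")]

    def __getitem__(self, material_id):
        """Return the child block whose 'Id' equals material_id."""
        for attr in self.attrs:
            child = getattr(self, attr)
            if getattr(child, "Id", None) == material_id:
                return child
        raise KeyError(material_id)


materials = MiniBlock("Materials")
for material_id, label in enumerate(
    ["Exterior", "Inside", "Mitochondria", "Mitochondria_"], start=1
):
    material = MiniBlock(label)
    material.add("Id", material_id)
    materials.add(label, material)
```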
ahds.data_stream
This is the most complex module, implementing a hierarchy of classes describing various data streams within Amira (R) files. It has 22 classes and five functions.
Classes
There are three categories of classes:
1. A user-level class that encapsulates (2) below.
2. Classes describing Amira (R) data streams
   - Classes describing AmiraMesh data streams
   - Classes describing HyperSurface data streams
3. Data conversion classes (AmiraMesh only)
   - Classes abstracting images
   - Classes abstracting contours
The user-level ahds.data_stream.DataStreams class is the preferred way to use the module. It takes the name of an Amira (R) file and encapsulates an iterator of data streams.
>>> from ahds import data_stream
>>> data_streams = data_stream.DataStreams('am/test6.am')
>>> data_streams
<class 'ahds.data_stream.DataStreams'> object with 2 stream(s): 1, 2
>>> for ds in data_streams:
... print ds
...
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 968,909 bytes
<class 'ahds.data_stream.AmiraMeshDataStream'> object of 968,909 bytes
Functions
The functions implemented in this module decode data streams.
- ahds.data_stream.hxbyterle_decode() decodes HxByteRLE data streams
- ahds.data_stream.hxzip_decode(data_size, data) unzips zlib-compressed data streams
- ahds.data_stream.unpack_binary(data_pointer, definitions, data) unpacks the structured data stream according to the attributes specified in the data’s data pointer
- ahds.data_stream.unpack_ascii(data) converts rows of ASCII data into numerical data
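To give a feel for what the two decompression steps involve, here is a minimal sketch that does not use ahds. `rle_decode` and `zip_decode` are hypothetical helpers. The run-length scheme shown is an assumption: a control byte above 127 announces that many (minus the high bit) literal bytes, and any other control byte says how often to repeat the next byte. The actual HxByteRLE layout should be checked against the ahds source before relying on it. The zlib part only uses the standard library's `zlib.decompress`.

```python
import zlib


def rle_decode(data, output_size):
    """Decode a byte run-length stream (assumed control-byte scheme)."""
    data = bytearray(data)
    out = bytearray()
    i = 0
    while i < len(data) and len(out) < output_size:
        control = data[i]
        i += 1
        if control > 127:
            # high bit set: copy the next (control & 0x7F) bytes verbatim
            count = control & 0x7F
            out.extend(data[i:i + count])
            i += count
        else:
            if i >= len(data):
                break  # control byte without a value byte
            # high bit clear: repeat the next byte 'control' times
            out.extend([data[i]] * control)
            i += 1
    return bytes(out)


def zip_decode(data, output_size):
    """Inflate zlib-compressed bytes and check the expected size."""
    decoded = zlib.decompress(data)
    if len(decoded) != output_size:
        raise ValueError("unexpected decoded size")
    return decoded


rle_result = rle_decode(bytearray([3, 7, 0x82, 1, 2, 2, 9]), 7)
zip_result = zip_decode(zlib.compress(b"abc" * 4), 12)
```

In both helpers `output_size` is the size of the decoded data, which is a different quantity from the encoded stream length given in the data pointer.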
Classes in Detail
DataStreams class
The following attributes are available on objects of this class:
- ahds.data_stream.DataStreams.file - filename of the Amira (R) file
- ahds.data_stream.DataStreams.header - an object of class ahds.header.AmiraHeader encapsulating the header data in four sections (designation, definitions, parameters, and data pointers)
- ahds.data_stream.DataStreams.filetype - the filetype as specified in the header
- ahds.data_stream.DataStreams.stream_data - all raw data from the file (including the header)
- len(DataStreams) - the number of data streams contained
- ahds.data_stream.DataStreams[<index>] - returns the data stream of the index specified (as defined in the data_pointers section of the header object)
Classes describing Amira (R) data streams
The following diagram illustrates the hierarchy of classes:
[Figure: Classes describing Amira (R) data streams]
ahds.data_stream.AmiraDataStream is the base class for all data stream classes and defines the following attributes:
- ahds.data_stream.AmiraDataStream.header - an ahds.header.AmiraHeader object
- ahds.data_stream.AmiraDataStream.data_pointer - the ahds.header.AmiraHeader.data_pointers.data_pointer_X for this data stream
- ahds.data_stream.AmiraDataStream.stream_data - the raw file data
- ahds.data_stream.AmiraDataStream.encoded_data - the encoded data for this stream; None for VoidDataStream subclasses
- ahds.data_stream.AmiraDataStream.decoded_data - the decoded data for this stream; None for VoidDataStream subclasses
- ahds.data_stream.AmiraDataStream.decoded_length - the number of items (tuples, integers) in decoded data
The two main subclasses of ahds.data_stream.AmiraDataStream are ahds.data_stream.AmiraMeshDataStream, which is a concrete class representing all AmiraMesh data streams, and ahds.data_stream.AmiraHxSurfaceDataStream, which abstractly defines HyperSurface data streams.
There are two main AmiraHxSurfaceDataStream subclasses:
- ahds.data_stream.VoidDataStream represents ahds.data_stream.AmiraHxSurfaceDataStream data streams that only have a name and value but no actual encoded data (on the following line). There are two subclasses:
  - ahds.data_stream.NamedDataStream subclasses have a string after the data stream name. The two concrete subclasses are:
    - ahds.data_stream.PatchesInnerRegionDataStream for the name of an inner region of a patch (see PatchesDataStream), and
    - ahds.data_stream.PatchesOuterRegionDataStream for the corresponding name of the outer region of a patch.
  - ahds.data_stream.ValuedDataStream subclasses have an integer value after the data stream name. The three concrete subclasses are:
    - ahds.data_stream.PatchesBoundaryIDDataStream holds the boundary ID of a patch,
    - ahds.data_stream.PatchesBranchingPointsDataStream stores the number of branching points, and
    - ahds.data_stream.PatchesDataStream holds the number of patches; it is a special ahds.data_stream.ValuedDataStream that contains an iterable of patches, each containing Patches<X>DataStream objects.
- ahds.data_stream.LoadedDataStream represents ahds.data_stream.AmiraHxSurfaceDataStream data streams that have a name, a value and encoded data. The two main concrete subclasses are:
  - ahds.data_stream.VerticesDataStream represents data streams with float-triples, and
  - ahds.data_stream.PatchesTrianglesDataStream represents data streams within a patch with triples of 1-based indices (triangles) of vertices specified in the ahds.data_stream.VerticesDataStream.
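The relationship between the two loaded streams, float triples for vertices and 1-based index triples for triangles, can be illustrated with the standard `struct` module. This sketch does not use ahds: the helper names are hypothetical, and little-endian 32-bit floats are an assumption made for the example rather than a statement about how a given HyperSurface file is encoded.

```python
import struct


def unpack_float_triples(data, count):
    """Unpack 'count' (x, y, z) triples of little-endian 32-bit floats."""
    values = struct.unpack("<%df" % (3 * count), data)
    return [values[i:i + 3] for i in range(0, len(values), 3)]


def triangle_coordinates(triangle, vertices):
    """Look up the corners of a triangle given as 1-based vertex indices."""
    # subtract 1 because Python lists are 0-based
    return [vertices[index - 1] for index in triangle]


# three vertices packed as nine floats, then one triangle over them
raw = struct.pack("<9f", 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
vertices = unpack_float_triples(raw, 3)
corners = triangle_coordinates((1, 2, 3), vertices)
```

Forgetting the index shift is an easy mistake: index 0 never occurs in a 1-based triangle list, and using the indices unchanged would silently pick the wrong vertices.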
Conversion classes
There are two groups of conversion classes, which only apply to (some) AmiraMesh data streams.
[Figure: Conversion classes]
- Image conversion classes consist of an image container class, ahds.data_stream.ImageSet, and an ahds.data_stream.Image class. ImageSet objects, which can be iterated to give ahds.data_stream.Image objects, are returned from the ahds.data_stream.AmiraMeshDataStream.to_images() method call.
>>> # decode the data stream to images
>>> images = ds[1].to_images()
>>> images
<ImageSet with 200 images>
>>> for image in images:
... print image
...
<Image with dimensions (971, 862)>
<Image with dimensions (971, 862)>
<Image with dimensions (971, 862)>
...
<Image with dimensions (971, 862)>
<Image with dimensions (971, 862)>
- Contour conversion classes convert individual images into sets of contours (ahds.data_stream.ContourSet) iterable as individual ahds.data_stream.Contours objects. They are obtained from calls to the ahds.data_stream.Image.as_contours property. Furthermore, the ahds.data_stream.Image.as_segments property returns a dictionary of the corresponding ahds.data_stream.ContourSet objects indexed by the z plane.
>>> # contours per image
>>> # the dictionary key is the Amira Id for the segment (the Id of the Material)
>>> # a segment can have several non-overlapping contours (or polylines)
>>> for image in images:
... print image.as_contours
...
{2: <class 'ahds.data_stream.ContourSet'> with 15 contours, 3: <class 'ahds.data_stream.ContourSet'> with 3 contours, 5: <class 'ahds.data_stream.ContourSet'> with 2 contours}
{2: <class 'ahds.data_stream.ContourSet'> with 18 contours, 3: <class 'ahds.data_stream.ContourSet'> with 3 contours, 5: <class 'ahds.data_stream.ContourSet'> with 2 contours}
...
{2: <class 'ahds.data_stream.ContourSet'> with 15 contours, 3: <class 'ahds.data_stream.ContourSet'> with 1 contours, 5: <class 'ahds.data_stream.ContourSet'> with 3 contours}
{2: <class 'ahds.data_stream.ContourSet'> with 15 contours, 3: <class 'ahds.data_stream.ContourSet'> with 1 contours, 5: <class 'ahds.data_stream.ContourSet'> with 3 contours}
>>> # separate individual segments
>>> images.segments
{1: {110: <class 'ahds.data_stream.ContourSet'> with 1 contours}, 2: {0: <class 'ahds.data_stream.ContourSet'> with 15 contours, 1: <class 'ahds.data_stream.ContourSet'> with 18 contours, ..., 198: <class 'ahds.data_stream.ContourSet'> with 3 contours, 199: <class 'ahds.data_stream.ContourSet'> with 3 contours}}
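The two outputs above hold the same contour sets grouped in different ways: per image keyed by segment id, and per segment keyed by z plane. The regrouping can be sketched with plain dictionaries. `regroup_by_segment` is a hypothetical helper, the strings stand in for ContourSet objects, and the z index is assumed to be the position of the image in the sequence.

```python
def regroup_by_segment(contours_per_image):
    """Turn [{segment_id: contours}, ...] into {segment_id: {z: contours}}."""
    segments = {}
    for z, per_image in enumerate(contours_per_image):
        for segment_id, contour_set in per_image.items():
            # create the inner dictionary the first time a segment is seen
            segments.setdefault(segment_id, {})[z] = contour_set
    return segments


# placeholder strings stand in for ContourSet objects
per_image = [
    {2: "15 contours", 3: "3 contours"},
    {2: "18 contours"},
]
segments = regroup_by_segment(per_image)
```

A segment that is absent from an image simply has no entry for that z plane, which matches the sparse inner dictionaries in the output above.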