kodexa.steps

Some example steps that can be used locally

Package Contents

Classes

NodeTagger

A node tagger allows you to provide a type and content regular expression and then tag content in all matching nodes.

NodeTagCopy

The NodeTagCopy action allows you to select nodes specified by the selector and create copies of the existing_tag (if it exists) with the new_tag_name.

TextParser

Parser to load a source file as a text document. The text from the document may be placed on the root ContentNode or on the root's child nodes (controlled by lines_as_child_nodes).

RollupTransformer

The rollup step allows you to decide how you want to collapse content in a document by removing nodes while maintaining content and features as needed.

class kodexa.steps.NodeTagger(selector, tag_to_apply, content_re='.*', use_all_content=True, node_only=False, node_tag_uuid=None)

A node tagger allows you to provide a type and content regular expression and then tag content in all matching nodes.

Multiple matching groups can be defined in the regular expression. You can also tag all of the content in a node, or tag just the node itself (ignoring the matching groups).

selector

The selector to use to find the node(s) to tag

content_re

A regular expression used to match the content in the identified nodes

use_all_content

If True, all content in the node is tagged (the tag will have no start/end)

tag_to_apply

The tag to apply to the node(s)

node_only

Tag the node only and no content

node_tag_uuid

The UUID to use on the tag

process(document)
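
The interaction between content_re, use_all_content and node_only can be easier to follow with a small stand-alone sketch. The code below does not use Kodexa at all: find_tag_spans is a hypothetical helper, and the choice of re.search and of (start, end) tuples is an assumption made for illustration, not a description of how NodeTagger is implemented.

```python
import re


def find_tag_spans(content, content_re=".*", use_all_content=True, node_only=False):
    """Hypothetical helper: return the (start, end) spans that would be tagged.

    A span of (None, None) stands for "tag without start/end".
    """
    if node_only:
        # Tag the node itself and ignore the content and matching groups.
        return [(None, None)]
    match = re.search(content_re, content)
    if match is None:
        # The content does not match, so nothing is tagged.
        return []
    if use_all_content:
        # All content is tagged, so no start/end is recorded.
        return [(None, None)]
    # Otherwise, one span per matching group that took part in the match.
    return [
        match.span(i)
        for i in range(1, match.re.groups + 1)
        if match.group(i) is not None
    ]


text = "Invoice 1234 dated 2020-01-31"
spans = find_tag_spans(
    text,
    content_re=r"Invoice (\d+) dated (\S+)",
    use_all_content=False,
)
tagged_values = [text[start:end] for start, end in spans]
```

With two matching groups in the expression, the sketch yields two spans, one for the invoice number and one for the date.
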
class kodexa.steps.NodeTagCopy(selector, existing_tag_name, new_tag_name)

The NodeTagCopy action allows you to select nodes specified by the selector and create copies of the existing_tag (if it exists) with the new_tag_name. If a tag with the ‘existing_tag_name’ does not exist on a selected node, no action is taken for that node.

selector

The selector to match the nodes

existing_tag_name

The existing tag name that will be the source

new_tag_name

The new tag name that will be the destination

process(document)
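
The copy behavior described above, including the case where the existing tag is missing, can be sketched without Kodexa. In this illustration the nodes are plain dictionaries and copy_tag is a hypothetical function; the real step works on a document's nodes chosen by the selector, and its internals may differ.

```python
def copy_tag(nodes, existing_tag_name, new_tag_name):
    """Hypothetical helper: copy one tag to a new name on each node that has it."""
    for node in nodes:
        tags = node["tags"]
        if existing_tag_name in tags:
            # Copy the tag data so the source tag stays in place, unchanged.
            tags[new_tag_name] = dict(tags[existing_tag_name])
        # Nodes without the existing tag are left untouched.
    return nodes


# Stand-in for the nodes a selector might have matched.
nodes = [
    {"content": "ACME Ltd", "tags": {"company": {"start": 0, "end": 8}}},
    {"content": "no tags here", "tags": {}},
]
copy_tag(nodes, "company", "vendor")
```

After the call, the first node carries both the "company" and "vendor" tags, while the second node is unchanged.
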
class kodexa.steps.TextParser(encoding='utf-8', lines_as_child_nodes=False)

Parser to load a source file as a text document. The text from the document may be placed on the root ContentNode or on the root’s child nodes (controlled by lines_as_child_nodes).

encoding

The encoding that should be used when attempting to decode data (default ‘utf-8’)

lines_as_child_nodes

If True, the lines of the file will be set as children of the root ContentNode; otherwise, the entire file content is set on the root ContentNode. (default False)

decode_text(data)
process(document)
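
The effect of encoding and lines_as_child_nodes can be shown with a small stand-alone sketch. It uses plain dictionaries in place of ContentNode, and parse_text is a hypothetical function; in particular, how the real parser splits lines is not stated here, so the use of str.splitlines is an assumption.

```python
def parse_text(data, encoding="utf-8", lines_as_child_nodes=False):
    """Hypothetical helper: build a root node (a dict) from raw bytes."""
    text = data.decode(encoding)
    root = {"content": None, "children": []}
    if lines_as_child_nodes:
        # Each line of the file becomes a child of the root node.
        root["children"] = [
            {"content": line, "children": []} for line in text.splitlines()
        ]
    else:
        # The entire file content is set on the root node.
        root["content"] = text
    return root


raw = "first line\nsecond line".encode("utf-8")
whole = parse_text(raw)
by_line = parse_text(raw, lines_as_child_nodes=True)
child_contents = [child["content"] for child in by_line["children"]]
```
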
class kodexa.steps.RollupTransformer(collapse_type_res=None, reindex: bool = True, selector: str = '.', separator_character: str = None, get_all_content: bool = False)

The rollup step allows you to decide how you want to collapse content in a document by removing nodes while maintaining content and features as needed

process(document)
is_node_in_list(node, node_ids)
Parameters
  • node

  • node_ids

Returns:

exception kodexa.steps.KodexaProcessingException(message, description, advice=None, documentation_url=None)

Bases: Exception

This is a specialized exception. If it is thrown while running in the Kodexa Platform, the additional exception details are included so that they can be presented back to the user.

description

A longer description of the problem

advice

Any advice on how to handle the problem. This can include markdown to help present possible solutions

message

A short message to express the problem

documentation_url

A link to a URL where the user might find more information on the problem

__str__()

Return str(self).
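
A minimal sketch of how a step might raise an exception with these fields, and how a caller might read them back. To stay runnable without Kodexa installed, it defines StandInProcessingException, a hypothetical class that only mirrors the constructor signature documented above. Passing message to the base Exception, and the URL shown, are assumptions for illustration.

```python
class StandInProcessingException(Exception):
    """Hypothetical stand-in mirroring the documented constructor signature."""

    def __init__(self, message, description, advice=None, documentation_url=None):
        # Assumption: the short message is what str() should return.
        super().__init__(message)
        self.message = message
        self.description = description
        self.advice = advice
        self.documentation_url = documentation_url


def load_source(data):
    """Hypothetical step body that rejects empty input."""
    if not data:
        raise StandInProcessingException(
            "Empty source",
            "The source file contained no bytes, so no document could be built.",
            advice="Check that the **upload** completed before running the step.",
            documentation_url="https://example.com/docs/empty-source",
        )
    return data


try:
    load_source(b"")
except StandInProcessingException as error:
    # Collect the details that could be presented back to the user.
    report = {
        "message": error.message,
        "description": error.description,
        "advice": error.advice,
        "documentation_url": error.documentation_url,
    }
```
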