This project has retired. For details please refer to its Attic page.
Apache ODE – Instance Data Cleanup

Instance Data Cleanup

Rationale

During its execution, a process instance can accumulate a significant amount of data. The execution state itself isn't much of an issue: once the instance has completed, there is nothing left to execute. The process data, however, can be rather large, mostly because of the messages the instance has received and sent and because of its own variables. All of these are XML documents, some of them sizable.

Levels of cleanup

Cleanup on completion

ODE Version

This feature is only available in ODE 1.3 or later

The easiest approach to get started is simply to wait until the instance execution has finished and then clean up everything related to it. That includes the instance state with its variables, scopes and correlations, and also all the messages it has received and sent. Execution events should be disposed of as well. This description therefore defines five categories: instance, variables, messages, correlations and events. Each category maps to the database tables listed below, and it should be possible to turn cleanup on and off for each category separately.

  • instance: ODE_PROCESS_INSTANCE, EXECUTION_STATE
  • variables: ODE_SCOPE, ODE_XML_DATA, ODE_PARTNER_LINK
  • messages: ODE_MESSAGE, ODE_MESSAGE_ROUTE, ODE_MEX_PROPS, ODE_MESSAGE_EXCHANGE
  • correlations: ODE_CORRELATION_SET, ODE_CORSET_PROP
  • events: ODE_EVENTS
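The category-to-table mapping above can be captured in a small lookup structure. The following Java sketch is illustrative only. The `CleanupCategory` enum and its `fromDescriptorValue` and `tablesFor` methods are hypothetical names and not part of ODE's API. Only the table names come from the list above, and the sketch assumes that the descriptor value `all` expands to every category.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the five cleanup categories (not ODE's own API).
// Table names are copied from the category list in this document.
enum CleanupCategory {
    INSTANCE("ODE_PROCESS_INSTANCE", "EXECUTION_STATE"),
    VARIABLES("ODE_SCOPE", "ODE_XML_DATA", "ODE_PARTNER_LINK"),
    MESSAGES("ODE_MESSAGE", "ODE_MESSAGE_ROUTE", "ODE_MEX_PROPS", "ODE_MESSAGE_EXCHANGE"),
    CORRELATIONS("ODE_CORRELATION_SET", "ODE_CORSET_PROP"),
    EVENTS("ODE_EVENTS");

    private final List<String> tables;

    CleanupCategory(String... tables) {
        this.tables = List.of(tables); // immutable copy of the table names
    }

    List<String> tables() {
        return tables;
    }

    // Maps a <category> value from the deployment descriptor to categories;
    // "all" expands to every category, unknown values throw IllegalArgumentException.
    static Set<CleanupCategory> fromDescriptorValue(String value) {
        if ("all".equals(value)) {
            return EnumSet.allOf(CleanupCategory.class);
        }
        return EnumSet.of(CleanupCategory.valueOf(value.toUpperCase(Locale.ROOT)));
    }

    // Collects the affected tables in the set's iteration order
    // (declaration order when the set is an EnumSet).
    static List<String> tablesFor(Set<CleanupCategory> categories) {
        List<String> result = new ArrayList<>();
        for (CleanupCategory c : categories) {
            result.addAll(c.tables());
        }
        return result;
    }
}
```

With this mapping, `all` touches twelve tables in total, while `correlations` alone touches only `ODE_CORRELATION_SET` and `ODE_CORSET_PROP`.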
XML Schema
<xs:element name="cleanup" minOccurs="0" maxOccurs="3" type="dd:tCleanup" />

<xs:complexType name="tCleanup">
    <xs:sequence>
        <xs:element name="category" default="all" minOccurs="0" maxOccurs="unbounded">
            <xs:simpleType>
                <xs:restriction base="xs:string">
                    <xs:enumeration value="instance" />
                    <xs:enumeration value="variables" />
                    <xs:enumeration value="messages" />
                    <xs:enumeration value="correlations" />
                    <xs:enumeration value="events" />
                    <xs:enumeration value="all" />
                </xs:restriction>
            </xs:simpleType>
        </xs:element>
    </xs:sequence>
    <xs:attribute name="on" use="required">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:enumeration value="success" />
                <xs:enumeration value="failure" />
                <xs:enumeration value="always" />
            </xs:restriction>
        </xs:simpleType>
    </xs:attribute>
</xs:complexType>
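The Java sketch below makes the schema's semantics concrete. It reads the `<cleanup>` elements of a `<process>` entry and computes which categories would be removed when an instance completes successfully and when it fails. `CleanupConfigSketch` and `resolve` are hypothetical names. The merging rules are assumptions, not a description of ODE's implementation:

* A `<cleanup>` without `<category>` children, or with an empty or `all` category, means every category. The empty case follows the schema's `default="all"`.
* `on="always"` applies to both outcomes.
* Several `<cleanup>` elements for the same outcome are combined as a union.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not ODE code): resolves the <cleanup> elements of a
// <process> entry into the categories cleaned up on success and on failure.
final class CleanupConfigSketch {
    static final List<String> ALL_CATEGORIES =
            List.of("instance", "variables", "messages", "correlations", "events");

    static Map<String, Set<String>> resolve(String processXml) {
        Element process;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // compare local names, prefixed or not
            process = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(processXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse process descriptor", e);
        }

        Map<String, Set<String>> result = new LinkedHashMap<>();
        result.put("success", new TreeSet<>());
        result.put("failure", new TreeSet<>());

        for (Node n = process.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !"cleanup".equals(n.getLocalName())) {
                continue; // skip <active>, <provide>, whitespace and other children
            }
            Element cleanup = (Element) n;
            Set<String> categories = new TreeSet<>();
            for (Node c = cleanup.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && "category".equals(c.getLocalName())) {
                    String value = c.getTextContent().trim();
                    if (value.isEmpty() || value.equals("all")) {
                        categories.addAll(ALL_CATEGORIES); // schema default is "all"
                    } else {
                        categories.add(value);
                    }
                }
            }
            if (categories.isEmpty()) {
                categories.addAll(ALL_CATEGORIES); // e.g. <cleanup on="always"/>
            }
            String on = cleanup.getAttribute("on");
            if (on.equals("success") || on.equals("always")) {
                result.get("success").addAll(categories);
            }
            if (on.equals("failure") || on.equals("always")) {
                result.get("failure").addAll(categories);
            }
        }
        return result;
    }
}
```

Under these assumptions, the fourth example below yields all five categories for `success` and only `correlations` and `messages` for `failure`.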

Examples

  1. no instance data cleanup

    <process name="pns:HelloWorld2">
        <active>true</active>
        <provide partnerLink="helloPartnerLink">
            <service name="wns:HelloService" port="HelloPort"/>
        </provide>
    </process>
    
  2. cleaning up all data on either successful or faulty completions of instances

    <process name="pns:HelloWorld2">
        <active>true</active>
        <provide partnerLink="helloPartnerLink">
            <service name="wns:HelloService" port="HelloPort"/>
        </provide>
        <cleanup on="always"/>
    </process>
    
  3. cleaning up all data on successful completions of instances and no data cleanup on faulty completions

    <process name="pns:HelloWorld2">
        <active>true</active>
        <provide partnerLink="helloPartnerLink">
            <service name="wns:HelloService" port="HelloPort"/>
        </provide>
        <cleanup on="success">
            <category>instance</category>
            <category>variables</category>
            <category>messages</category>
            <category>correlations</category>
            <category>events</category>
        </cleanup>
    </process>
    
  4. cleaning up all data on successful completions of instances and only messages and correlations on faulty completions

    <process name="pns:HelloWorld2">
        <active>true</active>
        <provide partnerLink="helloPartnerLink">
            <service name="wns:HelloService" port="HelloPort"/>
        </provide>
        <cleanup on="success">
            <category>all</category>
        </cleanup>
        <cleanup on="failure">
            <category>messages</category>
            <category>correlations</category>
        </cleanup>
    </process>
    
  5. an invalid configuration: the instance category may only be used together with the variables and correlations categories

    <process name="pns:HelloWorld2">
        <active>true</active>
        <provide partnerLink="helloPartnerLink">
            <service name="wns:HelloService" port="HelloPort"/>
        </provide>
        <cleanup on="success">
            <category>all</category>
        </cleanup>
        <cleanup on="failure">
            <category>instance</category>
        </cleanup>
    </process>
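The rule that makes the fifth configuration invalid can be checked mechanically once the categories for an outcome are known. The sketch below is hypothetical: `CleanupValidationSketch` and `missingCategories` are not ODE names. It encodes only the rule stated in example 5, namely that `instance` requires both `variables` and `correlations`.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical check (not ODE code) for the rule illustrated by example 5:
// cleaning up "instance" also requires "variables" and "correlations".
final class CleanupValidationSketch {
    static final List<String> REQUIRED_WITH_INSTANCE = List.of("variables", "correlations");

    // Returns the categories that must be added for the set to be valid;
    // an empty list means the set passes this check.
    static List<String> missingCategories(Set<String> categories) {
        if (!categories.contains("instance")) {
            return List.of(); // the rule only constrains sets containing "instance"
        }
        return REQUIRED_WITH_INSTANCE.stream()
                .filter(required -> !categories.contains(required))
                .collect(Collectors.toList());
    }
}
```

For the failure cleanup of example 5, `missingCategories(Set.of("instance"))` returns `[variables, correlations]`, which flags the configuration as invalid. The failure cleanup of example 4 (`messages` and `correlations`) passes.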
    

Future Developments

WS-BPEL makes heavy use of scopes, which could provide another hook in the execution lifecycle for cleanup to take place. Instead of waiting until the instance has finished and cleaning up its whole state, we could proceed in smaller increments and delete the state scope by scope. For short-running processes (say, less than a few days), the advantages of this approach are minimal. Long-running processes (say, months), however, can accumulate a lot of state that just sits there and will never be used again.

Final notes

As we refine further when cleanup should occur and what exactly should be cleaned up, we quickly get close to the transaction boundaries. Down the road, ideally, nothing should be persisted unnecessarily, so that data that will never be reused needs no cleanup at all. Message variables are a common case: a process receives a message, assigns some values from it and never uses that message variable again. Such a variable should never get written in the first place, which minimizes both writes and deletes.