
Java EE 7 – Introduction to Batch (JSR 352)

Chris Vignola

Continuing our Java EE series, JSR 352 lead Chris Vignola shows us around the batch processing specification and how it can be utilised.

Batch is not a new concept. In fact, it’s been around for
decades. Java developers have recognized the need for batch
applications, but have had to get by with non-standard approaches –
until now. JSR 352 introduces an exciting new Java specification
for building, deploying, and running batch applications.

Batch is an industry metaphor for background bulk
processing. Myriad business processes depend on batch processing
and demand powerful standards-based facilities for enabling this
essential workload type.

JSR 352 addresses three critical concerns: a batch programming
model, a job specification language, and a batch runtime. This
separation of concerns serves three audiences:

  1. Application developers have clear, reusable interfaces for
    constructing batch-style applications;
  2. Job writers have a powerful language for expressing how the
    steps of a batch job execute;
  3. Solution integrators have a runtime API for initiating and
    controlling batch execution.

JSR 352 specifically does not address
job scheduling. So many scheduling options are already available
that addressing it would have been redundant. Job scheduling is typically an
enterprise concern, with scheduled batch being part of a broader
workload mix. You can easily apply scheduling mechanisms ranging
from EJB timers or cron up to schedulers like Control-M, Tivoli
Workload Scheduler, UC4 ONE Automation, and others, to drive Java
batch jobs.
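Because scheduling is left outside the specification, any timer that can trigger a "start job" action is enough to drive a batch job. The sketch below uses a plain `ScheduledExecutorService` as a stand-in for an EJB timer, cron, or an enterprise scheduler. `SchedulerSketch`, `scheduleRuns` and the `startJob` callback are hypothetical names; in a Java EE application the callback would typically call `JobOperator.start(...)` instead of incrementing the counter used here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a ScheduledExecutorService stands in for an EJB timer, cron, or an
// enterprise scheduler. The batch side just exposes a "start job" action to call.
class SchedulerSketch {
    // Triggers startJob every periodMillis until it has fired exactly `runs` times.
    static int scheduleRuns(Runnable startJob, long periodMillis, int runs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(runs);
        AtomicInteger triggered = new AtomicInteger();
        ScheduledFuture<?> handle = scheduler.scheduleAtFixedRate(() -> {
            if (triggered.get() >= runs) {
                return;                  // ignore ticks that fire before cancel() takes effect
            }
            startJob.run();              // in Java EE: something like jobOperator.start(...)
            triggered.incrementAndGet();
            done.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await();                // block until the requested number of runs happened
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            handle.cancel(false);
            scheduler.shutdown();
        }
        return triggered.get();
    }

    public static void main(String[] args) {
        AtomicInteger jobStarts = new AtomicInteger();
        int fired = scheduleRuns(jobStarts::incrementAndGet, 50, 3);
        System.out.println("scheduler fired " + fired + " job starts");
    }
}
```

The point of the design is that the batch side never needs to know which scheduler is calling it; swapping the executor for a cron entry or an enterprise scheduler changes nothing in the job itself.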


JSR 352 is defined for both the Java EE 7 and Java SE 6 platforms.
Consult the JSR 352 home page for additional details. While nearly
all of JSR 352 is common between Java EE and Java SE environments,
the remainder of this article focuses exclusively on JSR 352 as part
of Java EE 7.

JSR 352 in the Java EE 7 Landscape

JSR 352 defines a Job Specification Language (JSL) to define
batch jobs, a set of interfaces that describes the artifacts that
comprise the batch programming model to implement batch business
logic, and a batch runtime for running batch jobs, according to a
defined life cycle.

The batch runtime is a part of the Java EE 7 runtime and has
full access to all other features of the platform, including
transaction management, persistence, messaging, and more.

Batch Artifacts and Runtime Model

JSR 352 defines numerous batch programming APIs with which to
implement batch business logic. Commonly used among these are
ItemReader, ItemProcessor, and ItemWriter, which define the basic
runtime contracts for reading, processing, and writing data items
for a batch step. Other APIs are available for building
task-oriented steps, interposing on lifecycle events, and
controlling partitioned steps. You organize batch steps into jobs,
which define the run sequence. You use the JobOperator runtime
interface to run and interact with jobs. Job executions are stored
in a repository, enabling query of current and historical job
status.
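To make the reader/processor/writer split concrete, here is a minimal, self-contained sketch. It deliberately does not use the real `javax.batch.api.chunk` interfaces, which need a batch runtime on the classpath; `SimpleReader`, `SimpleProcessor`, `SimpleWriter` and the classes implementing them are hypothetical stand-ins loosely modeled on `ItemReader.readItem()`, `ItemProcessor.processItem()` and `ItemWriter.writeItems()`. As in the real contracts, a reader returns `null` when input is exhausted, a processor may return `null` to filter an item, and a writer receives a whole chunk of items at once. In this sketch a filtered item still counts toward the chunk size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins for the chunk contracts; NOT the real javax.batch.api.chunk types.
interface SimpleReader<T> {
    T readItem();                        // null signals "no more input"
}

interface SimpleProcessor<I, O> {
    O processItem(I item);               // null filters the item out
}

interface SimpleWriter<T> {
    void writeItems(List<T> items);      // called once per chunk
}

// Reader over an in-memory list (a file or database cursor in a real job).
class ListReader implements SimpleReader<String> {
    private final Iterator<String> it;
    ListReader(List<String> input) { this.it = input.iterator(); }
    public String readItem() { return it.hasNext() ? it.next() : null; }
}

// Business logic: trim and upper-case each record, dropping blank ones.
class UpperCaseProcessor implements SimpleProcessor<String, String> {
    public String processItem(String item) {
        String trimmed = item.trim();
        return trimmed.isEmpty() ? null : trimmed.toUpperCase(Locale.ROOT);
    }
}

// Writer that just remembers each chunk it was given.
class CollectingWriter implements SimpleWriter<String> {
    final List<List<String>> chunks = new ArrayList<>();
    public void writeItems(List<String> items) { chunks.add(new ArrayList<>(items)); }
}

class ArtifactDemo {
    // Plays the runtime's role: read up to chunkSize items, process each one,
    // then hand the surviving items to the writer as one chunk.
    static List<List<String>> run(List<String> input, int chunkSize) {
        SimpleReader<String> reader = new ListReader(input);
        SimpleProcessor<String, String> processor = new UpperCaseProcessor();
        CollectingWriter writer = new CollectingWriter();
        boolean exhausted = false;
        while (!exhausted) {
            List<String> chunk = new ArrayList<>();
            for (int read = 0; read < chunkSize; read++) {
                String item = reader.readItem();
                if (item == null) { exhausted = true; break; }
                String processed = processor.processItem(item);
                if (processed != null) chunk.add(processed);
            }
            if (!chunk.isEmpty()) writer.writeItems(chunk);
        }
        return writer.chunks;
    }

    public static void main(String[] args) {
        // " " is filtered by the processor but still counts as a read item.
        System.out.println(run(List.of("a", " ", "b", "c"), 2)); // prints [[A], [B, C]]
    }
}
```

Note that the business classes never loop or batch anything themselves; the driving loop belongs to the runtime, which is what lets it add transactions and checkpoints around each chunk.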

Batch Runtime Power

For each data “chunk”:

  • Begin chunk (begin transaction)
  • Process chunk 
  • Take checkpoint (commit transaction)
  • End chunk

The batch runtime empowers the developer by providing managed
execution for batch jobs, including transaction demarcation,
checkpoint/restart processing, partitioned and concurrent (split)
step processing, and more. By handling these common batch concerns
in the runtime, developers are free to concentrate on core business
logic.
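The chunk cycle above is what makes checkpoint/restart possible. The following plain-Java sketch shows the idea: output is buffered per chunk and only "committed" at the chunk boundary, together with a checkpoint recording the next item to read, so a restarted run resumes after the last committed chunk instead of starting over. In the real API the reader reports its position through `checkpointInfo()` and gets it back in `open(Serializable)`, and the runtime wraps each chunk in a transaction; the `Store` class and `runStep` method here are hypothetical simplifications of that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of chunk checkpoint/restart. A "transaction" is just a buffer
// that is either committed (appended to the store) or discarded on failure.
class CheckpointDemo {
    // Durable state that survives a failed run (stands in for the job repository + output).
    static final class Store {
        final List<Integer> committedOutput = new ArrayList<>();
        int checkpoint = 0;                              // index of the next item to read
    }

    // Processes input in chunks of itemCount starting at store.checkpoint.
    // Throws when it reaches index failAt (pass -1 for no failure).
    static void runStep(int[] input, int itemCount, int failAt, Store store) {
        int position = store.checkpoint;                 // "open(checkpoint)": resume here
        while (position < input.length) {
            List<Integer> txBuffer = new ArrayList<>();  // begin chunk (begin transaction)
            int end = Math.min(position + itemCount, input.length);
            for (int i = position; i < end; i++) {       // read and process items
                if (i == failAt) {
                    // The buffer is simply dropped: the uncommitted chunk "rolls back".
                    throw new IllegalStateException("failure at item " + i);
                }
                txBuffer.add(input[i] * 10);
            }
            store.committedOutput.addAll(txBuffer);      // write items and commit
            position = end;
            store.checkpoint = position;                 // take checkpoint with the commit
        }
    }

    public static void main(String[] args) {
        int[] input = {1, 2, 3, 4, 5};
        Store store = new Store();
        try {
            runStep(input, 2, 3, store);                 // fails inside the second chunk
        } catch (IllegalStateException e) {
            // prints first run: [10, 20], checkpoint=2
            System.out.println("first run: " + store.committedOutput + ", checkpoint=" + store.checkpoint);
        }
        runStep(input, 2, -1, store);                    // restart resumes from the checkpoint
        System.out.println("after restart: " + store.committedOutput); // [10, 20, 30, 40, 50]
    }
}
```

Because the checkpoint only advances together with a commit, the restarted run neither loses item 3 nor writes items 1 and 2 twice.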

Packaging Model

Batch artifacts are simple POJOs. Package them anywhere
that CDI or the context class loader can find them. For example, in
a web application, you would package them under WEB-INF/classes.
Job definitions (JSL XML documents) can be packaged as part of your
application or be supplied external to your application according
to an implementation-specific mechanism. When packaged with your
application, they go under the META-INF/batch-jobs directory (in a
web application, WEB-INF/classes/META-INF/batch-jobs). Be
sure to consult the JSR 352 specification for additional details on
your artifact and job loading options.
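The packaging rule exists because the runtime has to turn each `ref` in your JSL into an object at run time. The sketch below shows one plausible way to do that for the class-loader case: an optional name-to-class mapping (loosely modeled on the `batch.xml` ref mappings the specification describes) with a fallback that treats the `ref` as a fully qualified class name. `ArtifactLoader` is a hypothetical name, and this is not the exact loading order a conforming runtime follows; the specification defines that.

```java
import java.util.Map;

// Hypothetical resolver for JSL ref="..." values. It illustrates the class-loader path
// only; it is not the loading algorithm a JSR 352 runtime is required to follow.
class ArtifactLoader {
    private final Map<String, String> refToClassName;   // e.g. entries from a batch.xml-style mapping

    ArtifactLoader(Map<String, String> refToClassName) {
        this.refToClassName = refToClassName;
    }

    Object load(String ref) {
        // Use an explicit mapping if present; otherwise treat ref as a fully qualified class name.
        String className = refToClassName.getOrDefault(ref, ref);
        // The thread context class loader only sees classes packaged where it can find them
        // (for example WEB-INF/classes in a web application).
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            Class<?> type = Class.forName(className, true, loader);
            return type.getDeclaredConstructor().newInstance(); // this sketch needs a no-arg constructor
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load batch artifact '" + ref + "'", e);
        }
    }

    public static void main(String[] args) {
        ArtifactLoader loader = new ArtifactLoader(Map.of("MyWriter", "java.util.ArrayList"));
        System.out.println(loader.load("MyWriter").getClass().getName());               // java.util.ArrayList
        System.out.println(loader.load("java.lang.StringBuilder").getClass().getName()); // java.lang.StringBuilder
    }
}
```

A JDK class stands in for a real artifact here only so the example runs without extra files; an artifact class missing from the loader's view fails with the wrapped `ClassNotFoundException`.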

Job Execution Model

You start jobs through the JobOperator interface. You can
optionally specify substitution properties when starting a job to
customize the job definition (JSL). The batch runtime loads the
batch artifacts you specify in your JSL and runs the job on a
separate thread. All steps in your job run on the same thread
unless you are using partitions or splits. Your JSL may contain
conditional logic that controls the order in which the steps run.
The batch runtime handles your job’s conditional logic, ensuring
the correct execution sequence.
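Two ideas from this paragraph, substitution properties and asynchronous start, can be sketched in plain Java. JSL attribute values can refer to job parameters with expressions such as `#{jobParameters['inputFile']}`, and the real `JobOperator.start(String, Properties)` returns a `long` execution id while the job runs on another thread. In the sketch, `JobStartSketch`, `substitute`, `start` and the `inputFile` parameter are hypothetical; the regular expression handles only the simple `jobParameters` form (no default values or other property sources), resolves unknown names to an empty string, and `start` returns a `CompletableFuture` rather than an execution id.

```java
import java.util.Properties;
import java.util.concurrent.CompletableFuture;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JobStartSketch {
    // Matches only the simple form #{jobParameters['name']}.
    private static final Pattern JOB_PARAM =
            Pattern.compile("#\\{jobParameters\\['([^']+)'\\]\\}");

    // Replaces each job-parameter reference in a JSL attribute value.
    // In this sketch an unknown parameter name becomes the empty string.
    static String substitute(String attributeValue, Properties jobParameters) {
        Matcher m = JOB_PARAM.matcher(attributeValue);
        StringBuilder resolved = new StringBuilder();
        while (m.find()) {
            String value = jobParameters.getProperty(m.group(1), "");
            m.appendReplacement(resolved, Matcher.quoteReplacement(value));
        }
        m.appendTail(resolved);
        return resolved.toString();
    }

    // Stand-in for starting a job: returns at once while the "job" runs on another thread.
    static CompletableFuture<String> start(String inputFileAttribute, Properties jobParameters) {
        return CompletableFuture.supplyAsync(() -> {
            String inputFile = substitute(inputFileAttribute, jobParameters);
            return "read " + inputFile + " on " + Thread.currentThread().getName();
        });
    }

    public static void main(String[] args) {
        Properties params = new Properties();
        params.setProperty("inputFile", "/data/orders.csv");   // hypothetical parameter value
        CompletableFuture<String> execution = start("#{jobParameters['inputFile']}", params);
        System.out.println("caller thread: " + Thread.currentThread().getName());
        System.out.println(execution.join());
    }
}
```

Resolving parameters when the job starts, rather than editing the JSL, is what lets one job definition serve many runs with different inputs.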

Partitioned Step Execution

Listing 1: Simple job with partitioned step

<job id="Job1" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
     <step id="Step1">
          <chunk>
               <reader ref="MyReader"/>
               <processor ref="MyProcessor"/>
               <writer ref="MyWriter"/>
          </chunk>
          <partition>
               <plan partitions="3"/>
          </partition>
     </step>
</job>

You may optionally configure a job step to run as a partitioned
step. This strategy is effective when you need to reduce elapsed
time for large bulk data processing tasks. By exploiting
multi-threading, you can take what would otherwise be a long-running,
serially processed step, and break it into partitions that
can run on separate threads. The batch runtime manages the threads
for you. You simply decide how many partitions you want to break
your data into.

The batch runtime runs a separate instance of the step’s chunk
or batchlet artifacts for each partition. You can configure and
pass properties for each partition in order to customize the
per-partition execution. The batch runtime handles all aspects of
checkpoint and restart for the individual partitions. You just
focus on the business logic. JSR 352 provides both declarative
(JSL) and dynamic (API) models for calculating the number of
partitions, and the distinct properties for each, enabling you to
exercise the level of control you need for partitioned step
processing.
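As a rough picture of what the runtime does for a partitioned step, the sketch below plays both roles by hand. A mapping function splits a range of record numbers into per-partition `Properties` (comparable in spirit to the per-partition properties of a JSL plan, or the `PartitionPlan` a `PartitionMapper` returns), and each partition runs its own copy of the business logic on a pool thread. `PartitionSketch` and its methods are hypothetical, the summing logic is a placeholder for real per-partition work, and per-partition checkpointing is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionSketch {
    // "Mapper" role: split record numbers [0, total) into contiguous ranges,
    // one Properties object per partition.
    static List<Properties> mapPartitions(int total, int partitions) {
        List<Properties> plan = new ArrayList<>();
        int size = (total + partitions - 1) / partitions;      // ceiling division
        for (int p = 0; p < partitions; p++) {
            Properties props = new Properties();
            props.setProperty("first", Integer.toString(p * size));
            props.setProperty("last", Integer.toString(Math.min(total, (p + 1) * size) - 1));
            plan.add(props);
        }
        return plan;
    }

    // Per-partition business logic (placeholder): sum the record numbers in its range.
    static long runPartition(Properties props) {
        long first = Long.parseLong(props.getProperty("first"));
        long last = Long.parseLong(props.getProperty("last"));
        long sum = 0;
        for (long i = first; i <= last; i++) {
            sum += i;
        }
        return sum;
    }

    // "Runtime" role: run every partition on its own pool thread and wait for all of them.
    static long runPartitionedStep(int total, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Properties props : mapPartitions(total, partitions)) {
                Callable<Long> task = () -> runPartition(props);
                results.add(pool.submit(task));
            }
            long combined = 0;
            for (Future<Long> result : results) {
                combined += result.get();
            }
            return combined;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runPartitionedStep(100, 3));   // 4950, same as a serial sum of 0..99
    }
}
```

The partitioned result matches the serial one because each partition works on a disjoint range; choosing ranges that do not overlap is the part you still own.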

Split Flow Execution

Listing 2: Simple job with split flows

<job id="Job1" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
     <split id="split1">
          <flow id="flow1">
               <step id="Step1" next="Step2">
                    <batchlet ref="MyBatchlet"/>
               </step>
               <step id="Step2">
                    <chunk>
                         <reader ref="MyReader"/>
                         <processor ref="MyProcessor"/>
                         <writer ref="MyWriter"/>
                    </chunk>
               </step>
          </flow>
          <flow id="flow2">
          ...
          </flow>
          <flow id="flow3">
           ...
          </flow>
     </split>
</job>

You may optionally configure steps to run
concurrently. This is called a split. In fact, you can configure
sequences of steps, called flows, to run concurrently. This batch
mechanism allows for highly flexible parallel processing of related
heterogeneous tasks. You simply group the parallel step sequences
into flows, and then nest the flows in a split. The batch runtime
takes care of all thread management and runs each flow
concurrently, managing all aspects of step sequencing within each
flow and status handling of both the individual flows as well as
the overall job.
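The threading behavior of a split can be sketched with `CompletableFuture`: each flow runs its steps in order on its own pool thread, and the split completes only when every flow has finished. `SplitSketch` and `runSplit` are hypothetical; a step is modeled as a `Supplier<String>` returning an exit status, and the rule that a flow stops at its first non-`COMPLETED` step and reports that status is a simplification chosen for this sketch, not the specification's status model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class SplitSketch {
    // Each flow is an ordered list of steps; a step returns its exit status.
    // Steps within a flow run one after another; different flows run concurrently.
    static List<String> runSplit(List<List<Supplier<String>>> flows) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, flows.size()));
        try {
            List<CompletableFuture<String>> running = new ArrayList<>();
            for (List<Supplier<String>> flow : flows) {
                running.add(CompletableFuture.supplyAsync(() -> {
                    String exit = "COMPLETED";
                    for (Supplier<String> step : flow) {
                        exit = step.get();
                        if (!"COMPLETED".equals(exit)) {
                            break;           // sketch rule: a failed step ends its flow
                        }
                    }
                    return exit;             // the flow reports its last executed step's status
                }, pool));
            }
            // The split is done only when every flow is done.
            CompletableFuture.allOf(running.toArray(new CompletableFuture<?>[0])).join();
            List<String> exits = new ArrayList<>();
            for (CompletableFuture<String> f : running) {
                exits.add(f.join());
            }
            return exits;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Supplier<String> ok = () -> "COMPLETED";
        Supplier<String> fail = () -> "FAILED";
        // prints [COMPLETED, FAILED, COMPLETED]
        System.out.println(runSplit(List.of(List.of(ok, ok), List.of(ok, fail), List.of(ok))));
    }
}
```

The per-flow exit statuses returned here are exactly the kind of input a decision after the split would inspect.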

Optional decision elements, each backed by a Decider artifact, can be
configured into the job to programmatically direct conditional
sequencing of the split to the next execution element based on the
outcome of the individual flows within the split. The power of the
split is its ability to reduce your job’s elapsed time by
parallelizing independent execution flows.
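In JSL, such a branch is a `decision` element whose `ref` names a `Decider` artifact, followed by transition elements such as `<next on="..." to="..."/>`. The sketch below imitates that shape in plain Java: a hypothetical decider condenses the flows' exit statuses into one exit status, and an ordered rule table maps it to the id of the next execution element, with `*` as a catch-all. `DecisionSketch`, the `ALL_OK`/`SOME_FAILED` statuses and the element ids `reportStep` and `cleanupStep` are made up for illustration, and first-match rule ordering is a simplification of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DecisionSketch {
    // Hypothetical decider: condenses the split's flow exit statuses into one exit status.
    static final Function<List<String>, String> DECIDER =
            exits -> exits.stream().allMatch("COMPLETED"::equals) ? "ALL_OK" : "SOME_FAILED";

    // Walks transition rules in insertion order (first match wins in this sketch)
    // and returns the id of the next execution element. "*" matches any status.
    static String nextElement(List<String> flowExits, Map<String, String> transitions) {
        String exitStatus = DECIDER.apply(flowExits);
        for (Map.Entry<String, String> rule : transitions.entrySet()) {
            String on = rule.getKey();
            if (on.equals("*") || on.equals(exitStatus)) {
                return rule.getValue();
            }
        }
        throw new IllegalStateException("no transition for exit status " + exitStatus);
    }

    public static void main(String[] args) {
        Map<String, String> transitions = new LinkedHashMap<>(); // keeps rule order
        transitions.put("ALL_OK", "reportStep");
        transitions.put("*", "cleanupStep");
        System.out.println(nextElement(List.of("COMPLETED", "COMPLETED"), transitions)); // reportStep
        System.out.println(nextElement(List.of("COMPLETED", "FAILED"), transitions));    // cleanupStep
    }
}
```

Keeping the decision logic in one small artifact, separate from the steps, is what lets the job definition rather than the business code own the control flow.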

Conclusion

Batch applications make up an important part of an overall
enterprise workload, performing key background, bulk-oriented
processing tasks. JSR 352 defines a powerful programming model and
runtime to easily build, deploy, and run mission-critical batch
applications. JSR 352 specifically separates concerns so the batch
runtime can satisfy common infrastructure concerns, freeing
developers to concentrate on the core business logic. This new
specification fills an important gap in the Java platform. With
availability on both the Java EE and Java SE platforms, JSR 352 can
meet a wide range of needs.

Author Bio - Chris Vignola works for
the IBM AIM Software organization and is the lead architect for
WebSphere Systems Management. He is also the spec lead for
JSR 352 Batch Applications for the Java Platform. He was
formerly the WebSphere Batch Technology Chief Architect. He
has over 28 years of industry experience in architecture and
development of software systems, including WebSphere Extended
Deployment, WebSphere Application Server, and the MVS operating
system. Chris led architecture and design for the
operational facilities of MVS Sysplex, was a charter member of the
WebSphere Application Server for z/OS team, specializing in EJB
container and systems management components, and more recently led
the WebSphere Compute Grid development team.
