Java EE Tutorial
Java EE 7 - Introduction to Batch (JSR 352)
Continuing our Java EE series, JSR 352 lead Chris Vignola shows us around the batch processing specification and how it can be utilised.
Batch is not a new concept. In fact, it's been around for decades. Java developers have recognized the need for batch applications, but have had to get by with non-standard approaches - until now. JSR 352 introduces an exciting new Java specification for building, deploying, and running batch applications. Batch is an industry metaphor for background bulk processing. Myriad business processes depend on batch processing and demand powerful standards-based facilities for enabling this essential workload type.
JSR 352 addresses three critical concerns: a batch programming model, a job specification language, and a batch runtime. Each concern maps to a distinct role:
- Application developers have clear, reusable interfaces for constructing batch style applications;
- Job writers have a powerful expression language for how to execute the steps of a batch execution;
- Solution integrators have a runtime API for initiating and controlling batch execution.
JSR 352 specifically does not address job scheduling. So many scheduling options already exist that defining another would have been redundant. Job scheduling is typically an enterprise concern, with scheduled batch being part of a broader workload mix. You can easily apply scheduling mechanisms ranging from EJB timers or cron up to schedulers like Control-M, Tivoli Workload Scheduler, UC4 ONE Automation, and others, to drive Java batch jobs.
JSR 352 is defined for both Java EE 7 and Java SE 6 platforms. Consult the JSR 352 Home Page for additional details. While nearly all of JSR 352 is common between Java EE and Java SE environments, the remainder of this article will focus exclusively on JSR 352 as part of Java EE 7.
JSR 352 in the Java EE 7 Landscape
JSR 352 defines a Job Specification Language (JSL) to define batch jobs, a set of interfaces that describes the artifacts that comprise the batch programming model to implement batch business logic, and a batch runtime for running batch jobs, according to a defined life cycle.
The batch runtime is a part of the Java EE 7 runtime and has full access to all other features of the platform, including transaction management, persistence, messaging, and more.
Batch Artifacts and Runtime Model
JSR 352 defines numerous batch programming APIs with which to implement batch business logic. Commonly used among these are ItemReader, ItemProcessor, and ItemWriter, which define the basic runtime contracts for reading, processing, and writing data items for a batch step. Other APIs are available for building task-oriented steps, interposing on lifecycle events, and controlling partitioned steps. You organize batch steps into jobs, which define the run sequence. You use the JobOperator runtime interface to run and interact with jobs. Job executions are stored in a repository, so you can query current and historical job status.
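The division of labour between the three chunk artifacts can be sketched in plain Java. The interfaces below are simplified, hypothetical stand-ins written for this illustration; they are not the javax.batch interfaces, which also carry open, close, and checkpoint methods. The sketch assumes two conventions: a reader signals end of input by returning null, and a processor filters an item out by returning null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the roles played by ItemReader, ItemProcessor and
// ItemWriter. These are NOT the javax.batch interfaces; all names are hypothetical.
class ReadProcessWriteSketch {

    interface Reader<T> { T readItem(); }                 // null means "no more input"
    interface Processor<I, O> { O processItem(I item); }  // null means "filter this item out"
    interface Writer<O> { void writeItems(List<O> items); }

    // Reads every item, processes it, and hands the surviving results
    // to the writer as a single list.
    static <I, O> void runStep(Reader<I> reader, Processor<I, O> processor, Writer<O> writer) {
        List<O> buffer = new ArrayList<>();
        for (I item = reader.readItem(); item != null; item = reader.readItem()) {
            O result = processor.processItem(item);
            if (result != null) {
                buffer.add(result);
            }
        }
        writer.writeItems(buffer);
    }

    // Example wiring: read strings, drop blank ones, upper-case the rest.
    static List<String> demo(List<String> input) {
        Iterator<String> it = input.iterator();
        List<String> sink = new ArrayList<>();
        Reader<String> reader = () -> it.hasNext() ? it.next() : null;
        Processor<String, String> processor =
            s -> s.trim().isEmpty() ? null : s.trim().toUpperCase();
        Writer<String> writer = sink::addAll;
        runStep(reader, processor, writer);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(demo(Arrays.asList("alpha", "  ", "beta"))); // [ALPHA, BETA]
    }
}
```

In a real batch step the runtime, not your code, owns the loop in runStep; you supply only the three artifacts.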
Batch Runtime Power
For each data "chunk":
- Begin chunk (begin transaction)
- Process chunk
- Take checkpoint (commit transaction)
- End chunk
The batch runtime empowers the developer by providing managed execution for batch jobs, including transaction demarcation, checkpoint/restart processing, partitioned and concurrent (split) step processing, and more. By handling these common batch concerns in the runtime, developers are free to concentrate on core business logic.
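To make the per-chunk cycle and the checkpoint/restart behaviour concrete, here is a minimal plain-Java simulation. It is only a sketch: the class, its fields, and the failAt parameter are hypothetical, an in-memory list stands in for a transactional data store, and the real runtime does all of this for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of chunked processing with checkpoint/restart.
// Nothing here is javax.batch API; class and member names are hypothetical.
class ChunkCheckpointSketch {

    final List<Integer> committed = new ArrayList<>(); // stands in for a transactional store
    int checkpoint = 0;                                 // index of the next unread item

    // Processes input from the last checkpoint in chunks of chunkSize items.
    // failAt simulates a failure when that input index is reached (-1 = never).
    // Returns true once all input has been committed.
    boolean run(List<Integer> input, int chunkSize, int failAt) {
        int position = checkpoint;                      // a restart resumes here
        while (position < input.size()) {
            List<Integer> chunk = new ArrayList<>();    // begin chunk (begin transaction)
            int end = Math.min(position + chunkSize, input.size());
            for (int i = position; i < end; i++) {
                if (i == failAt) {
                    return false;                       // "rollback": buffered chunk is discarded
                }
                chunk.add(input.get(i) * 10);           // process item
            }
            committed.addAll(chunk);                    // write chunk
            checkpoint = end;                           // take checkpoint (commit transaction)
            position = end;                             // end chunk
        }
        return true;
    }

    public static void main(String[] args) {
        ChunkCheckpointSketch job = new ChunkCheckpointSketch();
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(job.run(input, 3, 4));   // false: fails inside the second chunk
        System.out.println(job.checkpoint);         // 3: only the first chunk was checkpointed
        System.out.println(job.run(input, 3, -1));  // true: restart resumes at the checkpoint
        System.out.println(job.committed);          // [10, 20, 30, 40, 50, 60, 70]
    }
}
```

Because the failed chunk is discarded and the restart begins at the last checkpoint, every item is committed exactly once in this simulation.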
Batch artifacts are simple POJOs. Package them anywhere that CDI or the context class loader can find them; in a web application, for example, you would package them under WEB-INF/classes. Job definitions (JSL XML documents) can be packaged as part of your application or supplied externally through an implementation-specific mechanism. When packaged with your application, they go under the META-INF/batch-jobs directory. Be sure to consult the JSR 352 specification for additional details on your artifact and job loading options.
Job Execution Model
You start jobs through the JobOperator interface. You can optionally specify substitution properties when starting a job to customize the job definition (JSL). The batch runtime loads the batch artifacts you specify in your JSL and runs the job on a separate thread. All steps in your job run on the same thread unless you are using partitions or splits. Your JSL may contain conditional logic that controls the order in which the steps run. The batch runtime handles your job's conditional logic, ensuring the correct execution sequence.
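In the real API, substitution properties are passed as a java.util.Properties object to JobOperator.start, and the JSL refers to them with expressions such as #{jobParameters['inputFile']}. The sketch below illustrates roughly what that substitution amounts to. It is not how any batch runtime is implemented: the class and method names are hypothetical, only the jobParameters operator is handled, and resolving an unknown name to the empty string is an assumption of this sketch.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of JSL property substitution for job parameters.
// Handles only #{jobParameters['name']}; this is not javax.batch code.
class SubstitutionSketch {

    private static final Pattern JOB_PARAM =
        Pattern.compile("#\\{jobParameters\\['([^']+)'\\]\\}");

    // Replaces each #{jobParameters['name']} with the matching property value.
    // Assumption: names with no value resolve to the empty string.
    static String resolve(String attributeValue, Properties jobParameters) {
        Matcher m = JOB_PARAM.matcher(attributeValue);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = jobParameters.getProperty(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties params = new Properties();
        params.setProperty("inputFile", "/data/in.csv");
        System.out.println(resolve("#{jobParameters['inputFile']}", params)); // /data/in.csv
    }
}
```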
Partitioned Step Execution
Listing 1: Simple job with partitioned step
<job id="Job1">
  <step id="Step1">
    <chunk>
      <reader ref="MyReader"/>
      <processor ref="MyProcessor"/>
      <writer ref="MyWriter"/>
    </chunk>
    <partition>
      <plan partitions="3"/>
    </partition>
  </step>
</job>
You may optionally configure a job step to run as a partitioned step. This strategy is effective when you need to reduce elapsed time for large bulk data processing tasks. By exploiting multi-threading, you can take what would otherwise be a long-running, serially processed step and break it into partitions that run on separate threads. The batch runtime manages the threads for you. You simply decide how many partitions you want to break your data into.
The batch runtime runs a separate instance of the step's chunk or batchlet artifacts for each partition. You can configure and pass properties for each partition in order to customize the per-partition execution. The batch runtime handles all aspects of checkpoint and restart for the individual partitions. You just focus on the business logic. JSR 352 provides both declarative (JSL) and dynamic (API) models for calculating the number of partitions, and the distinct properties for each, enabling you to exercise the level of control you need for partitioned step processing.
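The idea of a partition plan with distinct per-partition properties can be sketched with standard Java concurrency. This is an illustration under stated assumptions, not the batch runtime's mechanism: the class, the plan method, and the "start" and "end" property names are all hypothetical, and summing an array stands in for the step's business logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of partitioned execution using java.util.concurrent.
class PartitionSketch {

    // Builds one Properties object per partition, each describing a
    // contiguous slice [start, end) of the input.
    static List<Properties> plan(int itemCount, int partitions) {
        List<Properties> plan = new ArrayList<>();
        int base = itemCount / partitions;
        int extra = itemCount % partitions;
        int start = 0;
        for (int p = 0; p < partitions; p++) {
            int end = start + base + (p < extra ? 1 : 0);
            Properties props = new Properties();
            props.setProperty("start", Integer.toString(start));
            props.setProperty("end", Integer.toString(end));
            plan.add(props);
            start = end;
        }
        return plan;
    }

    // Runs the same logic once per partition, each on its own thread,
    // and combines the partial results.
    static long sumInPartitions(int[] data, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Properties props : plan(data.length, partitions)) {
                int start = Integer.parseInt(props.getProperty("start"));
                int end = Integer.parseInt(props.getProperty("end"));
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = start; i < end; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInPartitions(new int[] {1, 2, 3, 4, 5, 6, 7}, 3)); // 28
    }
}
```

With a batch runtime you would write only the per-partition logic and the plan; the thread pool and result collection shown here are the runtime's job.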
Split Flow Execution
Listing 2: Simple job with split flows
<job id="Job1">
  <split id="split1">
    <flow id="flow1" next="flow2">
      <step id="Step1" next="Step2">
        <batchlet ref="MyBatchlet"/>
      </step>
      <step id="Step2">
        <chunk>
          <reader ref="MyReader"/>
          <processor ref="MyProcessor"/>
          <writer ref="MyWriter"/>
        </chunk>
      </step>
    </flow>
    <flow id="flow2" next="flow3"> ... </flow>
    <flow id="flow3"> ... </flow>
  </split>
</job>
You may optionally configure steps to run concurrently. This is called a split. In fact, you can configure sequences of steps, called flows, to run concurrently. This batch mechanism allows for highly flexible parallel processing of related heterogeneous tasks. You simply define the parallel step sequences as flows, and then nest the flows in a split. The batch runtime takes care of all thread management and runs each flow concurrently, managing step sequencing within each flow and status handling for both the individual flows and the overall job.
Optional Decision artifacts can be configured into the job to programmatically direct conditional sequencing from the split to the next execution element, based on the outcome of the individual flows within the split. The power of the split is its ability to reduce your job's elapsed time by parallelizing independent execution flows.
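The split-then-decide pattern can be sketched with standard Java concurrency. This is a hypothetical illustration, not the batch runtime's implementation: a flow is modelled as a list of steps that each report success or failure, and the decide method returns a made-up exit status. In a real job, a Decision artifact returns an exit status and transition elements in the JSL map it to the next execution element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical sketch of a split: flows run concurrently, steps within a
// flow run in order, and a decision is taken from the flow outcomes.
class SplitSketch {

    // Runs the steps of one flow sequentially; stops at the first failing step.
    static String runFlow(List<Supplier<Boolean>> steps) {
        for (Supplier<Boolean> step : steps) {
            if (!step.get()) {
                return "FAILED";
            }
        }
        return "COMPLETED";
    }

    // Runs every flow on its own thread, waits for all of them to finish,
    // and returns their statuses in the order the flows were given.
    static List<String> runSplit(List<List<Supplier<Boolean>>> flows) {
        ExecutorService pool = Executors.newFixedThreadPool(flows.size());
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (List<Supplier<Boolean>> flow : flows) {
                tasks.add(() -> runFlow(flow));
            }
            List<String> statuses = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                statuses.add(f.get());
            }
            return statuses;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical decider: derives an exit status from the flow outcomes.
    static String decide(List<String> flowStatuses) {
        return flowStatuses.contains("FAILED") ? "SOME_FAILED" : "ALL_COMPLETED";
    }

    // Two flows of two steps each; the last step of the second flow may fail.
    static String demo(boolean secondFlowSucceeds) {
        List<Supplier<Boolean>> flow1 = Arrays.asList(() -> true, () -> true);
        List<Supplier<Boolean>> flow2 = Arrays.asList(() -> true, () -> secondFlowSucceeds);
        return decide(runSplit(Arrays.asList(flow1, flow2)));
    }

    public static void main(String[] args) {
        System.out.println(demo(true));   // ALL_COMPLETED
        System.out.println(demo(false));  // SOME_FAILED
    }
}
```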
Batch applications make up an important part of an overall enterprise workload, performing key background, bulk-oriented processing tasks. JSR 352 defines a powerful programming model and runtime to easily build, deploy, and run mission-critical batch applications. JSR 352 specifically separates concerns so the batch runtime can satisfy common infrastructure concerns, freeing developers to concentrate on the core business logic. This new specification fills an important gap in the Java platform. With availability for both Java EE and SE platforms, there's an implementation available to meet a wide range of needs.
- JSR 352 Java EE 7 Reference Implementation
Author Bio - Chris Vignola works for the IBM AIM Software organization and is the lead architect for WebSphere Systems Management. He is also the spec lead for JSR 352 Batch Applications for the Java Platform. He was formerly the WebSphere Batch Technology Chief Architect. He has over 28 years of industry experience in architecture and development of software systems, including WebSphere Extended Deployment, WebSphere Application Server, and the MVS operating system. Chris led architecture and design for the operational facilities of MVS Sysplex, was a charter member of the WebSphere Application Server for z/OS team, specializing in EJB container and systems management components, and more recently led the WebSphere Compute Grid development team.