Defining an input component logic

Input components are the components generally placed at the beginning of a Talend job. They are in charge of retrieving the data that will later be processed in the job.

An input component is primarily made of three distinct logics:

  - The execution logic of the component itself, defined through a partition mapper.

  - The configurable part of the component, defined through the mapper configuration.

  - The source logic, defined through a producer.

Before implementing the component logic and defining its layout and configurable fields, make sure you have specified its basic metadata, as detailed in this document.

Defining a partition mapper

What is a partition mapper

A PartitionMapper is a component able to split itself to make the execution more efficient.

This concept is borrowed from big data and is useful only in that context (Beam executions). The idea is to divide the work before executing it in order to reduce the overall execution time.

The process is the following:

  1. The size of the data you work on is estimated. This part can be heuristic and not very precise.

  2. From that size, the execution engine (the runner, for Beam) requests the mapper to split itself into N mappers, each handling a subset of the overall work.

  3. The leaf (final) mapper is used as a Producer (actual reader) factory.

This kind of component must be Serializable to be distributable.
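
To illustrate why serializability matters, the following sketch round-trips a mapper through standard Java serialization, which is roughly what a distributed engine does when it ships a sub-mapper to a worker. The `RangeMapper` and `SerializationRoundTrip` classes and their members are hypothetical and do not use the Talend API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical mapper state: the range of records this mapper covers.
class RangeMapper implements Serializable {
    private static final long serialVersionUID = 1L;

    final long from;
    final long to;

    RangeMapper(long from, long to) {
        this.from = from;
        this.to = to;
    }
}

class SerializationRoundTrip {
    // Serializes the mapper to bytes and reads it back, as if it were sent to another JVM.
    static RangeMapper copyViaSerialization(RangeMapper mapper) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(mapper);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (RangeMapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("mapper could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        RangeMapper copy = copyViaSerialization(new RangeMapper(0L, 500L));
        System.out.println(copy.from + ".." + copy.to);
    }
}
```

Any field that is not itself serializable, such as an open connection, would make this round trip fail. Such resources are better created by the producer than stored in the mapper.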

Implementing a partition mapper

A partition mapper requires three methods marked with specific annotations:

  1. @Assessor for the evaluating method

  2. @Split for the dividing method

  3. @Emitter for the Producer factory

@Assessor

The Assessor method returns the estimated size of the data related to the component (depending on its configuration). It must return a Number and must not take any parameter.

For example:

@Assessor
public long estimateDataSetByteSize() {
    return ....;
}
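
Because the estimate can be heuristic, one possible approach is to extrapolate from a small sample of rows. The sketch below is plain Java with hypothetical names (`SizeEstimate`, `estimateBytes`); in a real mapper, similar logic could sit inside the `@Assessor` method and read its inputs from the configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

class SizeEstimate {
    // Extrapolates the total size from the average UTF-8 size of a few sample rows.
    static long estimateBytes(List<String> sampleRows, long totalRows) {
        if (sampleRows.isEmpty() || totalRows <= 0) {
            return 0L;
        }
        long sampleBytes = 0L;
        for (String row : sampleRows) {
            sampleBytes += row.getBytes(StandardCharsets.UTF_8).length;
        }
        // Multiply before dividing to limit integer rounding error.
        return sampleBytes * totalRows / sampleRows.size();
    }
}
```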

@Split

The Split method returns a collection of partition mappers. It can optionally take a @PartitionSize long value as parameter, which is the requested size of the dataset per sub partition mapper.

For example:

@Split
public List<MyMapper> split(@PartitionSize final long desiredSize) {
    return ....;
}
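
One way to implement the split is to cut the mapper's range into consecutive chunks of roughly the requested size. The sketch below is plain Java without the Talend annotations; the `ChunkedMapper` class, its record-range fields, and the fixed `BYTES_PER_RECORD` value are assumptions made for illustration.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical mapper covering the records in [from, to).
class ChunkedMapper implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final long BYTES_PER_RECORD = 100L; // assumed average record size

    final long from;
    final long to;

    ChunkedMapper(long from, long to) {
        this.from = from;
        this.to = to;
    }

    // In a real mapper, this method would carry @Split and the parameter @PartitionSize.
    List<ChunkedMapper> split(long desiredSize) {
        // Convert the requested byte size into a record count, with at least one record per chunk.
        long recordsPerChunk = Math.max(1L, desiredSize / BYTES_PER_RECORD);
        List<ChunkedMapper> parts = new ArrayList<>();
        for (long start = from; start < to; start += recordsPerChunk) {
            long end = Math.min(to, start + recordsPerChunk);
            parts.add(new ChunkedMapper(start, end));
        }
        return parts;
    }
}
```

With 1,000 records and a requested size of 30,000 bytes, this sketch yields three chunks of 300 records and a final chunk of 100.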

@Emitter

The Emitter method must not have any parameter and must return a producer. It uses the partition mapper configuration to instantiate and configure the producer.

For example:

@Emitter
public MyProducer create() {
    return ....;
}

Defining the producer method

The Producer method defines the source logic of an input component. It handles the interaction with a physical source and produces input data for the processing flow.

A producer must have a @Producer method without any parameter. It is triggered by the @Emitter of the partition mapper and can return any data. It is defined in the <component_name>Source.java file:

@Producer
public MyData produces() {
    return ...;
}
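
As an illustration, the following plain-Java sketch shows a producer that reads from an in-memory list, together with an engine-style loop that drains it. It assumes the convention that the producer returns `null` once no data is left. The `ListSource` class and the `drain` helper are hypothetical and stand in for the annotated classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical source: in a real component, produces() would carry @Producer.
class ListSource {
    private final Iterator<String> rows;

    ListSource(List<String> rows) {
        this.rows = rows.iterator();
    }

    // Returns the next record, or null when the source is exhausted (assumed convention).
    String produces() {
        return rows.hasNext() ? rows.next() : null;
    }
}

class SourceDriver {
    // Calls the producer repeatedly until it signals the end of the data.
    static List<String> drain(ListSource source) {
        List<String> out = new ArrayList<>();
        String record;
        while ((record = source.produces()) != null) {
            out.add(record);
        }
        return out;
    }
}
```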