Beam testing

Learn how to test components in Beam. If you want to make sure that your component works in Beam and don't want to use Spark, you can try with the Direct Runner.
Check beam.apache.org/contribute/testing/ for more details.
Built-in services

The framework provides built-in services that you can inject by type in components and actions. They are listed below:
org.talend.sdk.component.api.service.cache.LocalCache: Provides a small abstraction to cache data that does not need to be recomputed very often. Commonly used by actions for UI interactions. You can also use the local cache as an interceptor with @Cached.

org.talend.sdk.component.api.service.dependency.Resolver: Allows you to resolve a dependency from its Maven coordinates. It can either resolve a local file or (better) create a preinitialized classloader for you. It can also resolve files from Maven coordinates (like dependencies.txt for a component); note that this assumes the files are available in the component Maven repository.

javax.json.bind.Jsonb: A JSON-B instance. If your model is static and you don't want to handle the serialization manually using JSON-P, you can inject this instance.

javax.json.spi.JsonProvider: A JSON-P instance. Prefer the factories below unless you know exactly why you need this one.

javax.json.JsonBuilderFactory, javax.json.JsonWriterFactory, javax.json.JsonReaderFactory, javax.json.stream.JsonParserFactory, javax.json.stream.JsonGeneratorFactory: JSON-P factories. It is recommended to use these instances instead of custom ones to optimize memory usage and speed.

org.talend.sdk.component.api.service.injector.Injector: Utility to inject services into fields marked with @Service.

org.talend.sdk.component.api.service.factory.ObjectFactory: Allows you to instantiate an object from its class name and properties.

org.talend.sdk.component.api.service.record.RecordBuilderFactory: Allows you to instantiate a record.

org.talend.sdk.component.api.service.record.RecordPointerFactory: Allows you to instantiate a RecordPointer, which enables you to extract data from a Record based on the JSON Pointer specification.

org.talend.sdk.component.api.service.record.RecordService: Utilities to create a record from another one. This is typically used when you want to add an entry to a record and pass the other entries through. It also provides a RecordVisitor API for advanced cases.

org.talend.sdk.component.api.service.configuration.LocalConfiguration: Represents the local configuration that can be used during the design. It is not recommended to use it at runtime because the local configuration is usually different and the instances are distinct.

Any interface that extends HttpClient and contains methods annotated with @Request: Lets you define an HTTP client in a declarative manner using an annotated interface. See the HttpClient section below for more details.
All these injected services are serializable, which is important for Big Data environments. If you create the instances yourself, you cannot benefit from these features, nor from the memory optimization done by the runtime. Prefer reusing the framework instances over custom ones.
The local configuration uses system properties and the environment (replacing dots with underscores) to look up values. You can also provide a TALEND-INF/local-configuration.properties file with default values. This allows you to use the local_configuration: syntax in @Ui annotations. Here is an example that reads the default value of a property from the configuration:
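A minimal sketch; the family prefix and key are illustrative:

[source,properties]
----
# TALEND-INF/local-configuration.properties (illustrative key)
myfamily.myparam = paramDefaultValue
----

[source,java]
----
// The option default is read from the local configuration at design time.
@Option
@DefaultValue("local_configuration:myfamily.myparam")
private String myparam;
----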
Ensure your key is unique across all components to avoid global overrides on the JVM. In practice, it is strongly recommended to always use the family as a prefix. Also note that you can use @Configuration("prefix") to inject a mapping of the LocalConfiguration in a component. It uses the same rules as for any configuration object. If you prefer to inject your configuration in a service, ensure to wrap it in a Supplier to always have an up-to-date version.
If you want to ignore the local-configuration.properties, you can set the system property: talend.component.configuration.${componentPluginId}.ignoreLocalConfiguration=true.
Here is a sample @Configuration model:
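A minimal sketch; the class and field names are illustrative:

[source,java]
----
// Hypothetical model mapped onto keys starting with the chosen prefix,
// e.g. prefix.url and prefix.username in local-configuration.properties.
public class MyConfigModel implements Serializable {

    @Option
    private String url;

    @Option
    private String username;

    // getters omitted for brevity
}
----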
Here is how to use it from a service:
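A sketch, assuming the Supplier wrapping recommended above (names are illustrative):

[source,java]
----
// Wrapping the mapping in a Supplier always yields an up-to-date version.
@Service
public class MyConfigHolder {

    @Configuration("prefix")
    private Supplier<MyConfigModel> config;

    public MyConfigModel current() {
        return config.get();
    }
}
----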
And finally, here is how to use it in a component:
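A sketch using constructor injection (component and names are illustrative):

[source,java]
----
// Hypothetical input receiving the @Configuration mapping in its constructor.
@Emitter(name = "MyInput")
public class MyInput implements Serializable {

    private final MyConfigModel config;

    public MyInput(@Configuration("prefix") final MyConfigModel config) {
        this.config = config;
    }
}
----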
It is recommended to convert this configuration into a runtime model in components, to avoid transporting more than needed during the job distribution.
You can access the API reference in the Javadocs.
This section describes the HttpClient usage through the sample REST API below, which is assumed to require a basic authentication header:

GET /api/records/{id}: reads a record (no request payload).

POST /api/records: creates a record from a JSON payload such as {"id":"some id", "data":"some data"}.
To create an HTTP client that is able to consume the REST API above, you need to define an interface that extends HttpClient.
The HttpClient interface lets you set the base for the HTTP address that the client will hit.
The base is the part of the address that needs to be added to the request path to hit the API. It is now possible, and recommended, to use the @Base annotation.
Every method annotated with @Request in the interface defines an HTTP request. Every request can have a @Codec parameter that allows to encode or decode the request/response payloads.
You can ignore the encoding/decoding for String and Void payloads.
The interface should extend HttpClient.
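Putting this together for the API above, a minimal sketch (the interface and method names are illustrative):

[source,java]
----
import javax.json.JsonObject;

import org.talend.sdk.component.api.service.http.Header;
import org.talend.sdk.component.api.service.http.HttpClient;
import org.talend.sdk.component.api.service.http.Path;
import org.talend.sdk.component.api.service.http.Request;

// Declarative client for the sample REST API.
public interface RecordApi extends HttpClient {

    // GET /api/records/{id} (GET is the default method)
    @Request(path = "api/records/{id}")
    JsonObject getRecord(@Header("Authorization") String basicAuth, @Path("id") String id);

    // POST /api/records with a JSON payload
    @Request(path = "api/records", method = "POST")
    String createRecord(@Header("Authorization") String basicAuth, JsonObject record);
}
----

Before any call, set the base of the address, for instance with client.base("https://server.com"), or rely on the @Base annotation mentioned above.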
In the codec classes (that implement Encoder/Decoder), you can inject any of your services annotated with @Service or @Internationalized into the constructor. Internationalization services can be useful to have internationalized messages for error handling.
The interface can be injected into component classes or services to consume the defined API.
By default, */*+json content types are mapped to JSON-P and */*+xml to JAX-B if the model has an @XmlRootElement annotation.
For advanced cases, you can customize the Connection by directly using @UseConfigurer on the method. It calls your custom instance of Configurer. Note that you can use @ConfigurerOption in the method signature to pass some Configurer configurations.
For example, if you have the following Configurer:
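A sketch of such a Configurer, adding a basic authentication header (the class name is illustrative):

[source,java]
----
import java.nio.charset.StandardCharsets;
import java.util.Base64;

import org.talend.sdk.component.api.service.http.Configurer;

// Adds a basic authentication header built from the passed @ConfigurerOption values.
public class BasicConfigurer implements Configurer {

    @Override
    public void configure(final Connection connection, final ConfigurerConfiguration configuration) {
        final String username = configuration.param("username", String.class);
        final String password = configuration.param("password", String.class);
        connection.withHeader(
                "Authorization",
                "Basic " + Base64.getEncoder()
                        .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8)));
    }
}
----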
You can then set it on a method to automatically add the basic header with this kind of API usage:
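A sketch of the corresponding method declaration:

[source,java]
----
// The @ConfigurerOption parameters are handed to the BasicConfigurer above.
@Request(path = "api/records/{id}")
@UseConfigurer(BasicConfigurer.class)
JsonObject getRecord(@ConfigurerOption("username") String username,
                     @ConfigurerOption("password") String password,
                     @Path("id") String id);
----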
The framework provides in the component-api an OAuth1.Configurer which can be used as an example of configurer implementation. It expects a single OAuth1.Configuration parameter to be passed to the request as a @ConfigurerOption.
Here is a sample showing how it can be used:
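A sketch only, assuming the OAuth1 API shape described above:

[source,java]
----
// Passes an OAuth1.Configuration to the framework-provided OAuth1.Configurer;
// the option name "oauth1" is illustrative.
@Request(path = "api/records/{id}")
@UseConfigurer(OAuth1.Configurer.class)
JsonObject getRecord(@ConfigurerOption("oauth1") OAuth1.Configuration configuration,
                     @Path("id") String id);
----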
By default, the client loads the payload in memory. For big payloads, this can consume too much memory. For these cases, you can get the payload as an InputStream:
You can use the Response wrapper, or not.
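A sketch of both variants:

[source,java]
----
// Streams the payload instead of loading it fully in memory.
@Request(path = "api/records/{id}")
InputStream getRecord(@Path("id") String id);

// Same, wrapped in a Response to also access the status and headers.
@Request(path = "api/records/{id}")
Response<InputStream> getRecordWithMeta(@Path("id") String id);
----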
Talend Input component for Hazelcast

This tutorial walks you through the creation, from scratch, of a complete Talend input component for Hazelcast using the Talend Component Kit (TCK) framework.
Hazelcast is an in-memory distributed system that can store data, which makes it a good example of an input component for distributed systems. This is enough for you to get started with this tutorial, but you can find more information about it here: hazelcast.org/.
A TCK project is a simple Java project with specific configurations and dependencies. You can choose your preferred build tool from Maven or Gradle as TCK supports both. In this tutorial, Maven is used.
The first step consists in generating the project structure using the Talend Starter Toolkit.
Go to starter-toolkit.talend.io/ and fill in the project information as shown in the screenshots below, then click Finish and Download as ZIP.
(Screenshots tutorial_hazelcast_generateproject_1.png and tutorial_hazelcast_generateproject_2.png show the project information filled in the Starter Toolkit.)
Extract the ZIP file into your workspace and import it to your preferred IDE. This tutorial uses IntelliJ IDEA, but you can use Eclipse or any other IDE that you are comfortable with.
You can use the Starter Toolkit to define the full configuration of the component, but in this tutorial some parts are configured manually to explain key concepts of TCK.
The generated pom.xml file of the project looks as follows:
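A trimmed-down sketch of what the generated file typically contains; the versions are illustrative:

[source,xml]
----
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.talend.components</groupId>
  <artifactId>hazelcast-component</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>Component Hazelcast</name>

  <dependencies>
    <!-- Component development API -->
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>component-api</artifactId>
      <version>1.1.12</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Build and validation tooling for components -->
      <plugin>
        <groupId>org.talend.sdk.component</groupId>
        <artifactId>talend-component-maven-plugin</artifactId>
        <version>1.1.12</version>
        <extensions>true</extensions>
      </plugin>
      <!-- -parameters keeps parameter names for the TCK introspection -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
          <parameters>true</parameters>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
----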
Change the name tag to a more relevant value, for example: Component Hazelcast.
The component-api dependency provides the necessary API to develop the components.
talend-component-maven-plugin provides build and validation tools for the component development.
The Java compiler also needs a Talend specific configuration for the components to work correctly. The most important is the -parameters option that preserves the parameter names needed for introspection features that TCK relies on.
Download the mvn dependencies declared in the pom.xml file:
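For example:

[source,bash]
----
$ mvn clean compile
----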
You should get a BUILD SUCCESS at this point.
Create the project structure:
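[source,bash]
----
$ mkdir -p src/main/java
$ mkdir -p src/main/resources
----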
Create the component Java packages.
Packages are mandatory in the component model and you cannot use the default one (no package). It is recommended to create a unique package per component to be able to reuse it as a dependency in other components, for example to guarantee isolation while writing unit tests. For example:
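[source,bash]
----
$ mkdir -p src/main/java/org/talend/components/hazelcast
----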
The project is now correctly set up. The next steps consist in registering the component family and setting up some properties.
Registering every component family allows the component server to properly load the components and to ensure they are available in Talend Studio.
The family registration happens via a package-info.java file that you have to create.
Move to the src/main/java/org/talend/components/hazelcast package and create a package-info.java file:
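A sketch of the file; the category value is illustrative:

[source,java]
----
// Registers the Hazelcast component family and its icon.
@Components(family = "Hazelcast", categories = "Databases")
@Icon(value = Icon.IconType.CUSTOM, custom = "Hazelcast")
package org.talend.components.hazelcast;

import org.talend.sdk.component.api.component.Components;
import org.talend.sdk.component.api.component.Icon;
----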
@Components: Declares the family name and the categories to which the component belongs.
@Icon: Defines the component family icon. This icon is visible in the Studio metadata tree.
Talend Component Kit supports internationalization (i18n) via Java properties files. Using these files, you can customize and translate the display name of properties such as the name of a component family or, as shown later in this tutorial, labels displayed in the component configuration.
Go to src/main/resources/org/talend/components/hazelcast and create an i18n Messages.properties file as below:
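A sketch; the key follows the family display-name convention, as an assumption:

[source,properties]
----
# Display name of the component family
Hazelcast._displayName=Hazelcast
----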
You can define the component family icon in the package-info.java file. The icon image must exist in the resources/icons folder.
TCK supports both SVG and PNG formats for the icons.
Create the icons folder and add an icon image for the Hazelcast family.
This tutorial uses the Hazelcast icon from the official GitHub repository that you can get from: avatars3.githubusercontent.com/u/1453152?s=200&v=4
Download the image and rename it to Hazelcast_icon32.png. The name syntax is important and should match the <icon name>_icon32.png pattern.
The component registration is now complete. The next step consists in defining the component configuration.
All Input and Output (I/O) components follow a predefined model of configuration. The configuration requires two parts:
Datastore: Defines all properties that let the component connect to the targeted system.
Dataset: Defines the data to be read or written from/to the targeted system.
Connecting to the Hazelcast cluster requires the IP address, group name and password of the targeted cluster.
In the component, the Datastore is represented by a simple POJO.
Create a HazelcastDatastore.java class file in the src/main/java/org/talend/components/hazelcast folder.
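A sketch of the class; the field names follow the connection properties listed above:

[source,java]
----
package org.talend.components.hazelcast;

import java.io.Serializable;

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.configuration.type.DataStore;
import org.talend.sdk.component.api.configuration.ui.widget.Credential;
import org.talend.sdk.component.api.meta.Documentation;

@DataStore("HazelcastDatastore")
@Documentation("Hazelcast datastore: the connection information of the cluster.")
public class HazelcastDatastore implements Serializable {

    @Option
    @Documentation("The IP address of the cluster")
    private String clusterIpAddress;

    @Option
    @Documentation("The group name of the cluster")
    private String groupName;

    @Option
    @Credential
    @Documentation("The password of the cluster group")
    private String password;

    // getters and setters omitted for brevity
}
----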
Define the i18n properties of the Datastore by adding the following lines to the Messages.properties file:
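A sketch; the keys follow the <ClassSimpleName>.<property> convention, as an assumption:

[source,properties]
----
HazelcastDatastore.clusterIpAddress._displayName=Cluster IP address
HazelcastDatastore.groupName._displayName=Group name
HazelcastDatastore.password._displayName=Password
----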
The Hazelcast Datastore is now defined.
Hazelcast includes different types of data stores. You can manipulate maps, lists, sets, caches, locks, queues, topics and so on.
This tutorial focuses on maps but still applies to the other data structures.
Reading/writing from a map requires the map name.
Create the Dataset class by creating a HazelcastDataset.java file in src/main/java/org/talend/components/hazelcast.
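A sketch of the class; the field names are illustrative:

[source,java]
----
package org.talend.components.hazelcast;

import java.io.Serializable;

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.configuration.type.DataSet;
import org.talend.sdk.component.api.meta.Documentation;

@DataSet("HazelcastDataset")
@Documentation("Hazelcast dataset: the map to read from.")
public class HazelcastDataset implements Serializable {

    @Option
    @Documentation("The connection to the Hazelcast cluster")
    private HazelcastDatastore connection;

    @Option
    @Documentation("The name of the map to read")
    private String mapName;

    // getters and setters omitted for brevity
}
----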
The @Dataset annotation marks the class as a Dataset. Note that it also references a Datastore, as required by the components model.
Just as was done for the Datastore, define the i18n properties of the Dataset. To do that, add the following lines to the Messages.properties file.
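A sketch, using the same key convention as above:

[source,properties]
----
HazelcastDataset.connection._displayName=Hazelcast connection
HazelcastDataset.mapName._displayName=Map name
----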
The component configuration is now ready. The next step consists in creating the Source that will read the data from the Hazelcast map.
The Source is the class responsible for reading the data from the configured Dataset.
A source gets the configuration instance injected by TCK at runtime and uses it to connect to the targeted system and read the data.
Create a new class as follows.
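A sketch of the initial class; the component name "Input" and the method bodies are filled in the next steps:

[source,java]
----
package org.talend.components.hazelcast;

import java.io.Serializable;

import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;

import org.talend.sdk.component.api.component.Icon;
import org.talend.sdk.component.api.component.Version;
import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.input.Emitter;
import org.talend.sdk.component.api.input.Producer;
import org.talend.sdk.component.api.meta.Documentation;
import org.talend.sdk.component.api.record.Record;
import org.talend.sdk.component.api.service.record.RecordBuilderFactory;

@Version(1)
@Icon(value = Icon.IconType.CUSTOM, custom = "Hazelcast")
@Emitter(name = "Input")
@Documentation("Source that reads the key/value pairs of a Hazelcast map.")
public class HazelcastSource implements Serializable {

    private final HazelcastDataset dataset;

    private final RecordBuilderFactory recordBuilderFactory;

    public HazelcastSource(@Option("configuration") final HazelcastDataset dataset,
            final RecordBuilderFactory recordBuilderFactory) {
        this.dataset = dataset;
        this.recordBuilderFactory = recordBuilderFactory;
    }

    @PostConstruct
    public void init() {
        // connection setup is added in the next steps
    }

    @Producer
    public Record next() {
        // reading logic is added in the next steps
        return null;
    }

    @PreDestroy
    public void release() {
        // resource cleanup is added in the next steps
    }
}
----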
The source also needs i18n properties to provide a readable display name. Add the following line to the Messages.properties file.
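A sketch, assuming the <family>.<component> key convention:

[source,properties]
----
Hazelcast.Input._displayName=Input
----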
At this point, it is already possible to see the result in the Talend Component Web Tester, to check what the configuration looks like and to validate the layout visually. To do that, execute the following command in the project folder.
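Assuming the talend-component Maven plugin set up above, a command along these lines starts it:

[source,bash]
----
$ mvn clean install talend-component:web
----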
This command starts the Component Web Tester and deploys the component there.
Access localhost:8080/.
The source is set up. It is now time to start creating some Hazelcast specific code to connect to a cluster and read values from a map.
Add the hazelcast-client Maven dependency to the pom.xml of the project, in the dependencies node.
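A sketch; the version is illustrative and the Hazelcast 3.x client is assumed throughout this tutorial:

[source,xml]
----
<!-- Hazelcast Java client -->
<dependency>
  <groupId>com.hazelcast</groupId>
  <artifactId>hazelcast-client</artifactId>
  <version>3.12.6</version>
</dependency>
----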
Add a Hazelcast instance to the @PostConstruct method.
Declare a HazelcastInstance attribute in the source class.
Any non-serializable attribute needs to be marked as transient to avoid serialization issues.
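[source,java]
----
// The Hazelcast client instance is not serializable, hence the transient keyword.
private transient HazelcastInstance instance;
----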
Implement the post construct method.
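A sketch, assuming the Hazelcast 3.x client API and the getters of the configuration classes above:

[source,java]
----
@PostConstruct
public void init() {
    // Map the component configuration to the Hazelcast client configuration.
    final ClientConfig clientConfig = new ClientConfig();
    clientConfig.getNetworkConfig().addAddress(dataset.getConnection().getClusterIpAddress());
    clientConfig.getGroupConfig()
            .setName(dataset.getConnection().getGroupName())
            .setPassword(dataset.getConnection().getPassword());
    instance = HazelcastClient.newHazelcastClient(clientConfig);
}
----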
The component configuration is mapped to the Hazelcast client configuration to create a Hazelcast instance. This instance will be used later to get the map from its name and read the map data. Only the required configuration in the component is exposed to keep the code as simple as possible.
Implement the code responsible for reading the data from the Hazelcast map through the Producer method:
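[source,java]
----
private transient IMap<String, String> map;

private transient Iterator<String> keysIterator;

@Producer
public Record next() {
    if (keysIterator == null) {
        // Lazy initialization: the map is fetched only on the first call to next().
        map = instance.getMap(dataset.getMapName());
        keysIterator = map.keySet().iterator();
    }
    if (!keysIterator.hasNext()) {
        return null; // no more data in the map
    }
    final String key = keysIterator.next();
    return recordBuilderFactory.newRecordBuilder()
            .withString("key", key)
            .withString("value", String.valueOf(map.get(key)))
            .build();
}
----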
The Producer implements the following logic:
Check if the map iterator is already initialized. If not, get the map from its name and initialize the map iterator. This is done in the @Producer method to ensure the map is initialized only if the next() method is called (lazy initialization). It also avoids the map initialization in the PostConstruct method as the Hazelcast map is not serializable.
All the objects initialized in the PostConstruct method need to be serializable as the source can be serialized and sent to another worker in a distributed cluster.
From the map, create an iterator on the map keys that will read from the map.
Transform every key/value pair into a Talend Record with a "key, value" object on every call to next().
The RecordBuilderFactory class used above is a built-in service in TCK injected via the Source constructor. This service is a factory to create Talend Records.
Now, the next() method will produce a Record every time it is called. The method will return null if there is no more data in the map.
Implement the @PreDestroy annotated method, responsible for releasing all resources used by the Source. The method needs to shut the Hazelcast client instance down to release any connection between the component and the Hazelcast cluster.
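[source,java]
----
@PreDestroy
public void release() {
    // Closes the connection between the component and the Hazelcast cluster.
    if (instance != null) {
        instance.shutdown();
    }
}
----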
The Hazelcast Source is completed. The next section shows how to write a simple unit test to check that it works properly.
TCK provides a set of APIs and tools that make testing straightforward.
The test of the Hazelcast Source consists in creating an embedded Hazelcast instance with only one member and initializing it with some data, and then in creating a test Job to read the data from it using the implemented Source.
Add the required Maven test dependencies to the project.
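A sketch; the versions are illustrative:

[source,xml]
----
<!-- TCK JUnit 5 testing support -->
<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-junit</artifactId>
  <version>1.1.12</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>5.6.2</version>
  <scope>test</scope>
</dependency>
----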
Initialize a Hazelcast test instance and create a map with some test data. To do that, create the HazelcastSourceTest.java test class in the src/test/java folder. Create the folder if it does not exist.
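A sketch; the map name matches the explanation below and the test values are illustrative:

[source,java]
----
package org.talend.components.hazelcast;

import static org.junit.jupiter.api.Assertions.assertEquals;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

class HazelcastSourceTest {

    private static final String MAP_NAME = "MY-DISTRIBUTED-MAP";

    private static HazelcastInstance instance;

    @BeforeAll
    static void init() {
        // Embedded single-member Hazelcast instance initialized with test data.
        instance = Hazelcast.newHazelcastInstance();
        final IMap<String, String> map = instance.getMap(MAP_NAME); // created if absent
        map.put("key1", "value1");
        map.put("key2", "value2");
        map.put("key3", "value3");
    }

    @Test
    void mapIsInitialized() {
        assertEquals(3, instance.getMap(MAP_NAME).size());
    }

    @AfterAll
    static void shutdown() {
        instance.shutdown();
    }
}
----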
The above example creates a Hazelcast instance for the test and creates the MY-DISTRIBUTED-MAP map. The getMap call creates the map if it does not already exist. Some keys and values used in the test are added. Then, a simple test checks that the data is correctly initialized. Finally, the Hazelcast test instance is shut down.
Run the test and check in the logs that a Hazelcast cluster of one member has been created and that the test has passed.
To be able to test components, TCK provides the @WithComponents annotation which enables component testing. Add this annotation to the test. The annotation takes the component Java package as a value parameter.
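[source,java]
----
// Enables TCK component testing for the given package.
@WithComponents("org.talend.components.hazelcast")
class HazelcastSourceTest {
    // ...
}
----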
Create the test Job that configures the Hazelcast instance and links it to an output that collects the data produced by the Source.
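A sketch, assuming the JUnit 5 integration of component-runtime-junit; configurationByExample comes from org.talend.sdk.component.junit.SimpleFactory:

[source,java]
----
@Injected
private BaseComponentsHandler componentsHandler;

@Test
void sourceReadsTheMap() {
    final HazelcastDataset dataset = new HazelcastDataset();
    // ... set the connection and map name on the dataset (setters omitted earlier) ...
    final String config = configurationByExample().forInstance(dataset).configured().toQueryString();

    // Job: Hazelcast input -> test collector.
    Job.components()
            .component("Input", "Hazelcast://Input?" + config)
            .component("Collector", "test://collector")
            .connections()
            .from("Input").to("Collector")
            .build()
            .run();

    final List<Record> records = componentsHandler.getCollectedData(Record.class);
    assertEquals(3, records.size());
}
----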
Execute the unit test and check that it passes, meaning that the Source is reading the data correctly from Hazelcast.
The Source is now completed and tested. The next section shows how to implement the Partition Mapper for the Source. In this case, the Partition Mapper will split the work (data reading) between the available cluster members to distribute the workload.
The Partition Mapper calculates the number of Sources that can be created and executed in parallel on the available workers of a distributed system. For Hazelcast, it corresponds to the cluster member count.
To fully illustrate this concept, this section also shows how to enhance the test environment to add more Hazelcast cluster members and initialize it with more data.
Instantiate more Hazelcast instances, as every Hazelcast instance corresponds to one member in a cluster. In the test, it is reflected as follows:
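A sketch of the updated test setup:

[source,java]
----
private static HazelcastInstance instance1;

private static HazelcastInstance instance2;

@BeforeAll
static void init() {
    // Two embedded instances form a two-member cluster.
    instance1 = Hazelcast.newHazelcastInstance();
    instance2 = Hazelcast.newHazelcastInstance();
    final IMap<String, String> map = instance1.getMap(MAP_NAME);
    for (int i = 0; i < 100; i++) {
        map.put("key" + i, "value" + i); // more data, distributed across the two members
    }
}

@AfterAll
static void shutdown() {
    instance1.shutdown();
    instance2.shutdown();
}
----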
The above code sample creates two Hazelcast instances, leading to the creation of two Hazelcast members. Having a cluster of two members (nodes) will allow to distribute the data. The above code also adds more data to the test map and updates the shutdown method and the test.
Run the test on the multi-nodes cluster.
The Source is a simple implementation that does not distribute the workload and reads the data in a classic way, without distributing the read action to different cluster members.
Start implementing the Partition Mapper class by creating a HazelcastPartitionMapper.java class file.
When coupling a Partition Mapper with a Source, the Partition Mapper becomes responsible for injecting parameters and creating source instances. This way, all the attribute initialization part moves from the Source to the Partition Mapper class.
The configuration also sets an instance name to make it easy to find the client instance in the logs or while debugging.
The Partition Mapper class is composed of the following (a sketch of the class follows this list):
constructor: Handles configuration and service injections
Assessor: This annotation indicates that the method is responsible for assessing the Dataset size. The underlying runner uses the estimated Dataset size to compute the optimal bundle size to distribute the workload efficiently.
Split: This annotation indicates that the method is responsible for creating Partition Mapper instances based on the bundle size requested by the underlying runner and the size of the Dataset. It creates as many partitions as possible to parallelize and distribute the workload efficiently on the available workers (known as members in the Hazelcast case).
Emitter: This annotation indicates that the method is responsible for creating the Source instance with an adapted configuration, allowing it to handle the amount of records it will produce and the required services. It adapts the configuration to let the Source read only the requested bundle of data.
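A sketch of the class; note that the @Emitter(name = "Input") registration shown earlier on the source now moves to this mapper, and the method bodies are filled in the next steps:

[source,java]
----
package org.talend.components.hazelcast;

import java.io.Serializable;
import java.util.List;

import org.talend.sdk.component.api.component.Icon;
import org.talend.sdk.component.api.component.Version;
import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.input.Assessor;
import org.talend.sdk.component.api.input.Emitter;
import org.talend.sdk.component.api.input.PartitionMapper;
import org.talend.sdk.component.api.input.PartitionSize;
import org.talend.sdk.component.api.input.Split;
import org.talend.sdk.component.api.meta.Documentation;
import org.talend.sdk.component.api.service.record.RecordBuilderFactory;

@Version(1)
@Icon(value = Icon.IconType.CUSTOM, custom = "Hazelcast")
@PartitionMapper(name = "Input")
@Documentation("Partition mapper distributing the read of a Hazelcast map.")
public class HazelcastPartitionMapper implements Serializable {

    private final HazelcastDataset dataset;

    private final RecordBuilderFactory recordBuilderFactory;

    // The members this mapper instance is responsible for (set by the split).
    private List<String> members;

    public HazelcastPartitionMapper(@Option("configuration") final HazelcastDataset dataset,
            final RecordBuilderFactory recordBuilderFactory) {
        this.dataset = dataset;
        this.recordBuilderFactory = recordBuilderFactory;
    }

    @Assessor
    public long estimateSize() {
        return 0; // implemented below
    }

    @Split
    public List<HazelcastPartitionMapper> split(@PartitionSize final long bundleSize) {
        return null; // implemented below
    }

    @Emitter
    public HazelcastSource createSource() {
        return null; // implemented below
    }
}
----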
The Assessor method computes the memory size of every member of the cluster. Implementing it requires submitting a calculation task to the members through a serializable task that is aware of the Hazelcast instance.
Create the serializable task.
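A sketch of such a task; Hazelcast injects the member-local instance before the task runs on the target member:

[source,java]
----
package org.talend.components.hazelcast;

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.function.Function;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;

// Generic, serializable task that can be submitted to any member of the cluster.
public class SerializableTask<T> implements Callable<T>, Serializable, HazelcastInstanceAware {

    // A Function that is also Serializable, so lambdas can be shipped to members.
    public interface SerializableFunction<T> extends Function<HazelcastInstance, T>, Serializable {
    }

    private final SerializableFunction<T> delegate;

    private transient HazelcastInstance localInstance;

    public SerializableTask(final SerializableFunction<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void setHazelcastInstance(final HazelcastInstance hazelcastInstance) {
        this.localInstance = hazelcastInstance;
    }

    @Override
    public T call() {
        // Runs on the target member against its local instance.
        return delegate.apply(localInstance);
    }
}
----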
The purpose of this class is to submit any task to the Hazelcast cluster.
Use the created task to estimate the Dataset size in the Assessor method.
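A sketch, where getOrCreateClient() is an illustrative helper returning the Hazelcast client created from the datastore:

[source,java]
----
@Assessor
public long estimateSize() {
    final HazelcastInstance client = getOrCreateClient(); // illustrative helper
    final IExecutorService executor = client.getExecutorService("talend-source");
    final String mapName = dataset.getMapName();
    long total = 0;
    // Ask every member for the heap cost of its local part of the map, then sum.
    for (final Member member : client.getCluster().getMembers()) {
        try {
            total += executor.submitToMember(
                    new SerializableTask<Long>(hz -> hz.getMap(mapName).getLocalMapStats().getHeapCost()),
                    member).get();
        } catch (final InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
    return total;
}
----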
The Assessor method calculates the memory size that the map occupies for all members. In Hazelcast, distributing a task to all members can be achieved using an execution service initialized in the getExecutorService() method. The size of the map is requested on every available member. By summing up the results, the total size of the map in the distributed cluster is computed.
The Split method calculates the heap size of the map on every member of the cluster. Then, it calculates how many members a source can handle.
If a member contains less data than the requested bundle size, the method tries to combine it with another member. That combination can only happen if the combined data size is still less than or equal to the requested bundle size.
The following code illustrates the logic described above.
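A sketch of that logic; getOrCreateClient() and cloneFor() are illustrative helpers:

[source,java]
----
@Split
public List<HazelcastPartitionMapper> split(@PartitionSize final long bundleSize) {
    final HazelcastInstance client = getOrCreateClient(); // illustrative helper
    final List<HazelcastPartitionMapper> mappers = new ArrayList<>();
    List<String> currentMembers = new ArrayList<>();
    long currentSize = 0;
    for (final Member member : client.getCluster().getMembers()) {
        final long memberSize = localHeapCost(client, member);
        // Combine members as long as the combined data still fits in the requested bundle.
        if (!currentMembers.isEmpty() && currentSize + memberSize > bundleSize) {
            mappers.add(cloneFor(currentMembers));
            currentMembers = new ArrayList<>();
            currentSize = 0;
        }
        currentMembers.add(member.getUuid());
        currentSize += memberSize;
    }
    if (!currentMembers.isEmpty()) {
        mappers.add(cloneFor(currentMembers));
    }
    return mappers;
}

// Heap cost of the map entries owned by one member (same task as in the Assessor).
private long localHeapCost(final HazelcastInstance client, final Member member) {
    final String mapName = dataset.getMapName();
    try {
        return client.getExecutorService("talend-source")
                .submitToMember(new SerializableTask<Long>(
                        hz -> hz.getMap(mapName).getLocalMapStats().getHeapCost()), member)
                .get();
    } catch (final InterruptedException | ExecutionException e) {
        throw new IllegalStateException(e);
    }
}

// New mapper instance responsible for the given members.
private HazelcastPartitionMapper cloneFor(final List<String> memberUuids) {
    final HazelcastPartitionMapper copy = new HazelcastPartitionMapper(dataset, recordBuilderFactory);
    copy.members = memberUuids;
    return copy;
}
----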
The next step consists in adapting the source to take the Split into account.
The following sample shows how to adapt the Source to the Split carried out previously.
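A sketch only: the source now receives the member UUIDs selected by the partition mapper, and fetchLocalEntries() is an illustrative helper that submits a SerializableTask returning the member-local entries of the map:

[source,java]
----
private final List<String> members;

private transient Iterator<Map.Entry<String, String>> currentEntries;

private transient int memberIndex;

public HazelcastSource(final HazelcastDataset dataset, final List<String> members,
        final RecordBuilderFactory recordBuilderFactory) {
    this.dataset = dataset;
    this.members = members;
    this.recordBuilderFactory = recordBuilderFactory;
}

@Producer
public Record next() {
    // Data is pulled lazily, one assigned member at a time, when next() is called.
    while ((currentEntries == null || !currentEntries.hasNext()) && memberIndex < members.size()) {
        currentEntries = fetchLocalEntries(members.get(memberIndex++)).iterator();
    }
    if (currentEntries == null || !currentEntries.hasNext()) {
        return null; // all assigned members have been read
    }
    final Map.Entry<String, String> entry = currentEntries.next();
    return recordBuilderFactory.newRecordBuilder()
            .withString("key", entry.getKey())
            .withString("value", entry.getValue())
            .build();
}
----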
The next method reads the data from the members received from the Partition Mapper.
A Big Data runner like Spark will get multiple Source instances. Every source instance will be responsible for reading data from a specific set of members already calculated by the Partition Mapper.
The data is fetched only when the next method is called. This logic allows streaming the data from members without loading all of it into memory.
Implement the method annotated with @Emitter in the HazelcastPartitionMapper class.
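[source,java]
----
@Emitter
public HazelcastSource createSource() {
    // Hands the configuration, the required services and the selected members over to the source.
    return new HazelcastSource(dataset, members, recordBuilderFactory);
}
----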
The createSource() method creates the source instance and passes the required services and the selected Hazelcast members to the source instance.
Run the test and check that it works as intended.
The component implementation is now done. It is able to read data and to distribute the workload to available members in a Big Data execution environment.
Refactor the component by introducing a service to make some pieces of code reusable and avoid code duplication.
Refactor the Hazelcast instance creation into a service.
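A sketch of such a service; the class and method names are illustrative:

[source,java]
----
package org.talend.components.hazelcast;

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

import org.talend.sdk.component.api.service.Service;

// Centralizes the client creation so the partition mapper and the source no longer duplicate it.
@Service
public class HazelcastService {

    public HazelcastInstance newInstance(final HazelcastDatastore connection) {
        final ClientConfig clientConfig = new ClientConfig();
        clientConfig.getNetworkConfig().addAddress(connection.getClusterIpAddress());
        clientConfig.getGroupConfig()
                .setName(connection.getGroupName())
                .setPassword(connection.getPassword());
        return HazelcastClient.newHazelcastClient(clientConfig);
    }
}
----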
Inject this service to the Partition Mapper to reuse it.
Adapt the Source class to reuse the service.
Run the test one last time to ensure everything still works as expected.
Thank you for following this tutorial. Use the logic and approach presented here to create any input component for any system.
From Javajet to Talend Component Kit

The Javajet framework is being replaced by the new Talend Component Kit. Learn the main differences and the new approach introduced with this framework.

Since version 7.0 of Talend Studio, Talend Component Kit is the recommended framework to use to develop components.
This framework is being introduced to ensure that newly developed components can be deployed and executed both in on-premise/local and cloud/Big Data environments.
From that new approach comes the need to provide a complete yet unique and compatible way of developing components.
With the Component Kit, custom components are entirely implemented in Java. To help you get started with a new custom component development project, a Starter is available. Using it, you will be able to generate the skeleton of your project. By importing this skeleton in a development tool, you can then implement the components layout and execution logic in Java.
With the previous Javajet framework, metadata, widgets and configurable parts of a custom component were specified in XML. With the Component Kit, they are now defined in the Configuration (for example, LoggerProcessorConfiguration) Java class of your development project.
Note that most of this configuration is transparent if you specified the Configuration Model of your components right before generating the project from the Starter.
Any undocumented feature or option is considered not supported by the Component Kit framework.
You can find examples of output in Studio or Cloud environments in the Gallery.
(The original page shows, for each migration topic, side-by-side Javajet and Component Kit code snippets; those snippets are not reproduced here.)
Return variables (available since 1.51) deprecate After Variables. The AVAILABILITY of a return variable can be: AFTER (set once the component has finished) or FLOW (changed at row level).
Return variables can be nested as below:
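A sketch, assuming the 1.51+ return variables API (org.talend.sdk.component.api.component.ReturnVariables.ReturnVariable); the variable names, types and nested key syntax are illustrative:

[source,java]
----
@ReturnVariable(value = "NB_LINE", availability = AVAILABILITY.AFTER, type = Integer.class)
@ReturnVariable(value = "ERROR.MESSAGE", availability = AVAILABILITY.FLOW, type = String.class)
@Processor(name = "MyProcessor")
public class MyProcessor implements Serializable {
    // ...
}
----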
Previously, the execution of a custom component was described through several Javajet files:
_begin.javajet, containing the code required to initialize the component.
_main.javajet, containing the code required to process each line of the incoming data.
_end.javajet, containing the code required to end the processing and go to the following step of the execution.
With the Component Kit, the entire execution flow of a component is described through its main Java class (for example, LoggerProcessor) and through services for reusable parts.
Each type of component has its own execution logic. The same basic logic is applied to all components of the same type, and is then extended to implement each component specificities. The project generated from the starter already contains the basic logic for each component.
Talend Component Kit framework relies on several primitive components.
All components can use @PostConstruct and @PreDestroy annotations to initialize or release some underlying resource at the beginning and the end of a processing.
In distributed environments, class constructors are called on cluster manager nodes. Methods annotated with @PostConstruct and @PreDestroy are called on worker nodes. Thus, partition plan computation and pipeline tasks are performed on different nodes.
All the methods managed by the framework must be public. Private methods are ignored.
The framework is designed to be as declarative as possible but also to stay extensible by not using fixed interfaces or method signatures. This allows new features of the underlying implementations to be added incrementally.
To ensure that the Cloud-compatible approach of the Component Kit framework is respected, some changes were introduced on the implementation side, including:
The File mode is no longer supported. You can still work with URIs and remote storage systems to use files. The file collection must be handled at the component implementation level.
The input and output connections between two components can only be of the Flow or Reject types. Other types of connections are not supported.
Every Output component must have a corresponding Input component and use a Dataset. All Datasets must use a Datastore.
To get started with the Component Kit framework, you can go through the following documents:
Learn the basics about Talend Component Kit
Create and deploy your first Component Kit component
Learn about the Starter
Start implementing components
Integrate a component to Talend Studio
Check some examples of components built with Talend Component Kit
Wall of Fame

Heroes of Talend Component Kit.

Apache addict. I'm involved in several projects (OpenWebBeans, Johnzon, Geronimo, Meecrowave, ... - http://home.apache.org/committer-index.html#rmannibucau). Blog: https://rmannibucau.metawerx.net
Software engineer @Talend. Components team member. Blog: undx.github.io
Technical Writer
Frontend Architect
R&D
Frontend Architect. This is my Talend account. You can check out @jsomsanith for my personal account
Senior principal software engineer at @Talend, security contributor at @apache. Blog: http://coheigea.blogspot.com/
QA Automation @ Talend Nantes
Committer and PMC member of Apache Beam and Apache Avro. Free education and Open Source enthusiast. Distributed Systems practitioner (victim?) Blog: https://ismaelmejia.com/
Templar
I'm a serial tooler.
Blog: timeline.antoinenicolas.com
Blog: https://www.linkedin.com/in/zoltantakacsdev/
Focused on Big Data technologies. I contribute to open source projects. I'm an Apache Beam committer and PMC member and an Apache Software Foundation member Blog: https://echauchot.blogspot.com/
Senior Cloud Software Architect
Product Manager, Information Architect
Senior Application Security Engineer
Blog: www.talend.com
573PH4N3 D3M15 K3RM480N
Senior Software engineer at @komodohealth. Ph.D. in large-scale storage systems. Previous competitive programmer.
Defining a partition mapper

A Partition Mapper (PartitionMapper) is a component able to split itself to make the execution more efficient.
This concept is borrowed from Big Data and is useful in this context only (Beam executions). The idea is to divide the work before executing it in order to reduce the overall execution time.
The process is the following:
The size of the data you work on is estimated. This part can be heuristic and not very precise.
From that size, the execution engine (runner for Beam) requests the mapper to split itself in N mappers with a subset of the overall work.
The leaf (final) mapper is used as a Producer (actual reader) factory.
This kind of component must be Serializable to be distributable.
A partition mapper requires three methods marked with specific annotations:
@Assessor for the evaluating method
@Split for the dividing method
@Emitter for the Producer factory
The Assessor method returns the estimated size of the data related to the component (depending on its configuration). It must return a Number and must not take any parameter.
For example:
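A minimal sketch; the helper names are illustrative:

[source,java]
----
@Assessor
public long estimateDatasetSize() {
    // The estimation can be heuristic; it only guides the splitting.
    return countItems() * averageItemSizeInBytes(); // illustrative helpers
}
----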
The Split method returns a collection of partition mappers and can optionally take a @PartitionSize long value as a parameter, which is the requested size of the dataset per sub partition mapper.
For example:
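A minimal sketch; MyMapper and configuration are illustrative names:

[source,java]
----
@Split
public List<MyMapper> split(@PartitionSize final long bundleSize) {
    final int mapperCount = (int) Math.max(1, estimateDatasetSize() / Math.max(1, bundleSize));
    return IntStream.range(0, mapperCount)
            .mapToObj(i -> new MyMapper(configuration, i, mapperCount)) // each instance gets a subset
            .collect(Collectors.toList());
}
----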
The Emitter method must not have any parameter and must return a producer. It uses the partition mapper configuration to instantiate and configure the producer.
For example:
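A minimal sketch; MySource is an illustrative name:

[source,java]
----
@Emitter
public MySource createSource() {
    // The mapper configuration is used to instantiate and configure the producer.
    return new MySource(configuration);
}
----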
Defining an input component logic

Input components are the components generally placed at the beginning of a Talend job. They are in charge of retrieving the data that will later be processed in the job.
An input component is primarily made of three distinct logics:
The execution logic of the component itself, defined through a partition mapper.
The configurable part of the component, defined through the mapper configuration.
The source logic defined through a producer.
Before implementing the component logic and defining its layout and configurable fields, make sure you have specified its basic metadata, as detailed in this document.
The Partition Mapper concept, its required methods (@Assessor, @Split and @Emitter) and examples are the same as described in the previous section, Defining a partition mapper.
The Producer defines the source logic of an input component. It handles the interaction with a physical source and produces input data for the processing flow.
A producer must have a @Producer method without any parameter. It is triggered by the @Emitter method of the partition mapper and can return any data. It is defined in the Source.java file:
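A minimal sketch; MySource, MyConfiguration and openSource() are illustrative names:

[source,java]
----
public class MySource implements Serializable {

    private final MyConfiguration configuration;

    private transient Iterator<String> items;

    public MySource(final MyConfiguration configuration) {
        this.configuration = configuration;
    }

    @Producer
    public String next() {
        // Return the next element, or null once the input is exhausted
        // (null marks the end of the data for batch components).
        if (items == null) {
            items = openSource(configuration); // illustrative helper
        }
        return items.hasNext() ? items.next() : null;
    }
}
----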
Talend Component Kit best practices

Some recommendations apply to the way component packages are organized:
Make sure to create a package-info.java file with the component family/categories at the root of your component package:
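A sketch; the family, category and package names are illustrative:

[source,java]
----
@Components(family = "MyFamily", categories = "Misc")
@Icon(value = Icon.IconType.CUSTOM, custom = "myfamily")
package org.example.components.myfamily;

import org.talend.sdk.component.api.component.Components;
import org.talend.sdk.component.api.component.Icon;
----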
Create a package for the configuration.
Create a package for the actions.
Create a package for the component and one sub-package by type of component (input, output, processors, and so on).
It is recommended to make your configuration serializable in order to be able to pass it between components.
When building a new component, the first step is to identify the way it must be configured.
The two main concepts are:
The DataStore, which is the way you can access the backend.
The DataSet, which is the way you interact with the backend.
For example:

Accessing a relational database like MySQL: the DataStore holds the JDBC driver, URL, username and password; the DataSet holds the query to execute, the row mapper, and so on.

Accessing a file system: the DataStore holds the file pattern (or directory + file extension/prefix/...); the DataSet holds the file format, buffer size, and so on.
It is common to have the Dataset include the Datastore, because both are required to work. However, it is recommended to replace this pattern by defining both the Dataset and the Datastore in a higher-level configuration model. For example:
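A sketch; the names are illustrative:

[source,java]
----
// Dataset and datastore defined side by side in a higher-level model
// rather than nesting the datastore inside the dataset.
public class MyComponentConfiguration implements Serializable {

    @Option
    private MyDatastore datastore;

    @Option
    private MyDataset dataset;
}
----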
Input and output components are particular because they can be linked to a set of actions. It is recommended to wire all the actions you can apply to ensure the consumers of your component can provide a rich experience to their users.
The most common actions are the following ones:
This action exposes a way to ensure the Datastore/connection works.
Configuration example:
Action example:
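A sketch of both sides; the action name and classes are illustrative:

[source,java]
----
// Configuration side: the datastore is checkable, referencing the action below.
@Checkable("connectionHealthCheck")
@DataStore("MyDatastore")
public class MyDatastore implements Serializable {

    @Option
    private String url;
}
----

[source,java]
----
// Action side: verifies that the connection works.
@Service
public class MyActions {

    @HealthCheck("connectionHealthCheck")
    public HealthCheckStatus validateConnection(@Option final MyDatastore datastore) {
        try {
            // ... try to connect using the datastore values ...
            return new HealthCheckStatus(HealthCheckStatus.Status.OK, "Connection OK");
        } catch (final Exception e) {
            return new HealthCheckStatus(HealthCheckStatus.Status.KO, e.getMessage());
        }
    }
}
----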
Until the studio integration is complete, it is recommended to limit processors to one input.
Configuring processor components is simpler than configuring input and output components because it is specific for each component. For example, a mapper takes the mapping between the input and output models:
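A sketch of such a mapping configuration; the names are illustrative:

[source,java]
----
public class MappingConfiguration implements Serializable {

    @Option
    private List<Mapping> mappings;

    public static class Mapping implements Serializable {

        @Option
        private String from; // input column

        @Option
        private String to; // output column
    }
}
----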
It is recommended to provide as much information as possible to let the UI work with the data while it is being edited.
Light validations are all the validations you can execute on the client side. They are listed in the UI hint section.
Use light validations first before going with custom validations because they are more efficient.
Custom validations enforce custom code to be executed, but are heavier to execute.
Prefer using light validations when possible.
Define an action with the parameters needed for the validation and link the option you want to validate to this action. For example, to validate a Dataset for a JDBC driver:
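A sketch of both sides; the action name, helper and classes are illustrative:

[source,java]
----
// Configuration side: the option is validated by the action below.
public class JdbcDataset implements Serializable {

    @Option
    @Validable("driver")
    private String driver;
}
----

[source,java]
----
// Action side: checks that the driver can be resolved.
@AsyncValidation("driver")
public ValidationResult validateDriver(@Option("value") final String driver) {
    if (driverCanBeLoaded(driver)) { // illustrative helper
        return new ValidationResult(ValidationResult.Status.OK, "Driver found");
    }
    return new ValidationResult(ValidationResult.Status.KO, "Unknown driver: " + driver);
}
----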
You can also define a Validable class and use it to validate a form by setting it on your whole configuration:
The parameter binding of the validation method uses the same logic as the component configuration injection. Therefore, the @Option annotation specifies the prefix to use to reference a parameter. It is recommended to use @Option("value") until you know exactly why you don't. This way, the consumer can match the configuration model and just prefix it with value. to send the instance to validate.
Validations are triggered based on "events". If you mark part of a configuration as @Validable but this configuration is translated to a widget without any interaction, then no validation will happen. The rule of thumb is to mark only primitives and simple types (lists of primitives) as @Validable.
It can be handy and user-friendly to provide completion on some fields. For example, to define completion for available drivers:
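A sketch of both sides; the action name and values are illustrative:

[source,java]
----
// Action side: returns the available drivers.
@Suggestions("availableDrivers")
public SuggestionValues listDrivers() {
    return new SuggestionValues(true, asList(
            new SuggestionValues.Item("com.mysql.cj.jdbc.Driver", "MySQL"),
            new SuggestionValues.Item("org.postgresql.Driver", "PostgreSQL")));
}
----

[source,java]
----
// Configuration side: the option is completed by the action above.
@Option
@Suggestable("availableDrivers")
private String driver;
----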
Each component must have its own icon:
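A sketch; with a custom icon, the image icons/myicon_icon32.png is assumed to exist in the resources:

[source,java]
----
@Icon(value = Icon.IconType.CUSTOM, custom = "myicon")
@PartitionMapper(name = "MyInput")
public class MyInput implements Serializable {
    // ...
}
----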
You can use talend.surge.sh/icons/ to find the icon you want to use.
It is recommended to enforce the version of your component, even though it is not mandatory for the first version.
If you break a configuration entry in a later version, make sure to complete both steps below (a sketch follows the list):
Upgrade the version.
Support a migration of the configuration.
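A sketch; the configuration keys are illustrative:

[source,java]
----
// Version 2 of the component with a migration of older configurations.
@Version(value = 2, migrationHandler = MyMigrationHandler.class)
@Processor(name = "MyProcessor")
public class MyProcessor implements Serializable {
    // ...
}

// Migrates persisted configurations from previous versions.
class MyMigrationHandler implements MigrationHandler {

    @Override
    public Map<String, String> migrate(final int incomingVersion, final Map<String, String> incomingData) {
        if (incomingVersion < 2) {
            // e.g. a renamed configuration entry
            final String old = incomingData.remove("configuration.oldName");
            if (old != null) {
                incomingData.put("configuration.newName", old);
            }
        }
        return incomingData;
    }
}
----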
Testing your components is critical. You can use unit and simple standalone JUnit tests, but it is also highly recommended to have Beam tests in order to make sure that your component works in Big Data.