Defining datasets and datastores

Datasets and datastores are configuration types that define how and where to pull the data from. They are used at design time to create shared configurations that can be stored and used at runtime.

All connectors (input and output components) created using Talend Component Kit must reference a valid dataset. Each dataset must reference a datastore.

  • Datastore: The data you need to connect to the backend.

  • Dataset: A datastore coupled with the data you need to execute an action.

Dataset validation

Make sure that:

  • a datastore is used in each dataset.

  • each dataset has a corresponding input component (mapper or emitter).

  • This input component must be able to work with only the dataset part filled by final users. Any other property implemented for that component must be optional.

These rules are enforced by the validateDataSet validation. If the conditions are not met, the component builds will fail.

Defining a datastore

A datastore defines the information required to connect to a data source. For example, it can be made of:

  • a URL

  • a username

  • a password.

You can specify a datastore and its context of use (in which dataset, etc.) from the Component Kit Starter.

Make sure to modelize the data your components are designed to handle before defining datasets and datastores in the Component Kit Starter.

Once you generate and import the project into an IDE, you can find datastores under a specific datastore node.

Datastore view

Example of datastore:

package com.mycomponent.components.datastore;

@DataStore("DatastoreA") (1)
@GridLayout({ (2)
    // The generated component layout will display one configuration entry per line.
    // Customize it as much as needed.
    @GridLayout.Row({ "apiurl" }),
    @GridLayout.Row({ "username" }),
    @GridLayout.Row({ "password" })
})
@Documentation("A Datastore made of an API URL, a username, and a password. The password is marked as Credential.") (3)
public class DatastoreA implements Serializable {
    @Option
    @Documentation("")
    private String apiurl;

    @Option
    @Documentation("")
    private String username;

    @Option
    @Credential
    @Documentation("")
    private String password;

    public String getApiurl() {
        return apiurl;
    }

    public DatastoreA setApiurl(String apiurl) {
        this.apiurl = apiurl;
        return this;
    }

    public String getUsername() {
        return Username;
    }

    public DatastoreA setuUsername(String username) {
        this.username = username;
        return this;
    }

    public String getPassword() {
        return password;
    }

    public DatastoreA setPassword(String password) {
        this.password = password;
        return this;
    }
}
1 Identifying the class as a datastore and naming it.
2 Defining the layout of the datastore configuration.
3 Defining each element of the configuration: a URL, a username, and a password. Note that the password is also marked as a credential.

Defining a dataset

A dataset represents the inbound data. It is generally made of:

  • A datastore that defines the connection information needed to access the data.

  • A query.

You can specify a dataset and its context of use (in which input and output component it is used) from the Component Kit Starter.

Make sure to modelize the data your components are designed to handle before defining datasets and datastores in the Component Kit Starter.

Once you generate and import the project into an IDE, you can find datasets under a specific dataset node.

Dataset view

Example of dataset referencing the datastore shown above:

package com.datastorevalidation.components.dataset;

@DataSet("DatasetA") (1)
@GridLayout({
    // The generated component layout will display one configuration entry per line.
    // Customize it as much as needed.
    @GridLayout.Row({ "datastore" })
})
@Documentation("A Dataset configuration containing a simple datastore") (2)
public class DatasetA implements Serializable {
    @Option
    @Documentation("Datastore")
    private DatastoreA datastore;

    public DatastoreA getDatastore() {
        return datastore;
    }

    public DatasetA setDatastore(DatastoreA datastore) {
        this.datastore = datastore;
        return this;
    }
}
1 Identifying the class as a dataset and naming it.
2 Implementing the dataset and referencing DatastoreA as the datastore to use.

Internationalizing datasets and datastores

The display name of each dataset and datastore must be referenced in the message.properties file of the family package.

The key for dataset and datastore display names follows a defined pattern: ${family}.${configurationType}.${name}._displayName. For example:

ComponentFamily.dataset.DatasetA._displayName=Dataset A
ComponentFamily.datastore.DatastoreA._displayName=Datastore A

These keys are automatically added for datasets and datastores defined from the Component Kit Starter.

Reusing datasets and datastores in Talend Studio

When deploying a component or set of components that include datasets and datastores to Talend Studio, a new node is created under Metadata. This node has the name of the component family that was deployed.

It allows users to create reusable configurations for datastores and datasets.

Dataset in Studio

With predefined datasets and datastores, users can then quickly fill the component configuration in their jobs. They can do so by selecting Repository as Property Type and by browsing to the predefined dataset or datastore.

Dataset in Component

How to create a reusable connection in Studio

Studio will generate connection and close components auto for reusing connection function in input and output components, just need to do like this example:

@Service
public class SomeService {

    @CreateConnection
    public Object createConn(@Option("configuration") SomeDataStore dataStore) throws ComponentException {
        Object connection = null;
        //get conn object by dataStore
        return conn;
    }

    @CloseConnection
    public CloseConnectionObject closeConn() {
        return new CloseConnectionObject() {

            public boolean close() throws ComponentException {
                Object connection = this.getConnection();
                //do close action
                return true;
            }

        };
    }
}

Then the runtime mapper and processor only need to use @Connection to get the connection like this:

@Version(1)
@Icon(value = Icon.IconType.CUSTOM, custom = "SomeInput")
@PartitionMapper(name = "SomeInput")
@Documentation("the doc")
public class SomeInputMapper implements Serializable {

    @Connection
    SomeConnection conn;

}

How does the component server interact with datasets and datastores

The component server scans all configuration types and returns a configuration type index. This index can be used for the integration into the targeted platforms (Studio, web applications, and so on).

Dataset

Mark a model (complex object) as being a dataset.

  • API: @org.talend.sdk.component.api.configuration.type.DataSet

  • Sample:

{
  "tcomp::configurationtype::name":"test",
  "tcomp::configurationtype::type":"dataset"
}

Datastore

Mark a model (complex object) as being a datastore (connection to a backend).

  • API: @org.talend.sdk.component.api.configuration.type.DataStore

  • Sample:

{
  "tcomp::configurationtype::name":"test",
  "tcomp::configurationtype::type":"datastore"
}

DatasetDiscovery

Mark a model (complex object) as being a dataset discovery configuration.

  • API: @org.talend.sdk.component.api.configuration.type.DatasetDiscovery

  • Sample:

{
  "tcomp::configurationtype::name":"test",
  "tcomp::configurationtype::type":"datasetDiscovery"
}
The component family associated with a configuration type (datastore/dataset) is always the one related to the component using that configuration.

The configuration type index is represented as a flat tree that contains all the configuration types, which themselves are represented as nodes and indexed by ID.

Every node can point to other nodes. This relation is represented as an array of edges that provides the child IDs.

As an illustration, a configuration type index for the example above can be defined as follows:

{nodes: {
             "idForDstore": { datastore:"datastore data", edges:[id:"idForDset"] },
             "idForDset":   { dataset:"dataset data" }
    }
}
Scroll to top