Datasets and datastores are configuration types that define how and where to pull the data from. They are used at design time to create shared configurations that can be stored and used at runtime.
All connectors (input and output components) created using Talend Component Kit must reference a valid dataset. Each dataset must reference a datastore.
- Datastore: The data you need to connect to the backend.
- Dataset: A datastore coupled with the data you need to execute an action.
Make sure that:
These rules are enforced by the |
Defining a datastore
A datastore defines the information required to connect to a data source. For example, it can be made of:
- a URL
- a username
- a password.
You can specify a datastore and its context of use (in which dataset, etc.) from the Component Kit Starter.
Make sure to model the data your components are designed to handle before defining datasets and datastores in the Component Kit Starter.
Once you generate and import the project into an IDE, you can find datastores under a specific datastore node.
Example of datastore:
package com.mycomponent.components.datastore;

import java.io.Serializable;

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.configuration.type.DataStore;
import org.talend.sdk.component.api.configuration.ui.layout.GridLayout;
import org.talend.sdk.component.api.configuration.ui.widget.Credential;
import org.talend.sdk.component.api.meta.Documentation;

@DataStore("DatastoreA") // (1)
@GridLayout({ // (2)
    // The generated component layout will display one configuration entry per line.
    // Customize it as much as needed.
    @GridLayout.Row({ "apiurl" }),
    @GridLayout.Row({ "username" }),
    @GridLayout.Row({ "password" })
})
@Documentation("A Datastore made of an API URL, a username, and a password. The password is marked as Credential.") // (3)
public class DatastoreA implements Serializable {

    @Option
    @Documentation("")
    private String apiurl;

    @Option
    @Documentation("")
    private String username;

    @Option
    @Credential
    @Documentation("")
    private String password;

    public String getApiurl() {
        return apiurl;
    }

    public DatastoreA setApiurl(String apiurl) {
        this.apiurl = apiurl;
        return this;
    }

    public String getUsername() {
        return username;
    }

    public DatastoreA setUsername(String username) {
        this.username = username;
        return this;
    }

    public String getPassword() {
        return password;
    }

    public DatastoreA setPassword(String password) {
        this.password = password;
        return this;
    }
}
1 | Identifying the class as a datastore and naming it. |
2 | Defining the layout of the datastore configuration. |
3 | Defining each element of the configuration: a URL, a username, and a password. Note that the password is also marked as a credential. |
Defining a dataset
A dataset represents the inbound data. It is generally made of:
- A datastore that defines the connection information needed to access the data.
- A query.
You can specify a dataset and its context of use (in which input and output components it is used) from the Component Kit Starter.
Make sure to model the data your components are designed to handle before defining datasets and datastores in the Component Kit Starter.
Once you generate and import the project into an IDE, you can find datasets under a specific dataset node.
Example of dataset referencing the datastore shown above:
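package com.datastorevalidation.components.dataset;

import java.io.Serializable;

import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.configuration.type.DataSet;
import org.talend.sdk.component.api.configuration.ui.layout.GridLayout;
import org.talend.sdk.component.api.meta.Documentation;

// The datastore shown above lives in a different package, so it must be imported.
import com.mycomponent.components.datastore.DatastoreA;

@DataSet("DatasetA") // (1)
@GridLayout({
    // The generated component layout will display one configuration entry per line.
    // Customize it as much as needed.
    @GridLayout.Row({ "datastore" })
})
@Documentation("A Dataset configuration containing a simple datastore") // (2)
public class DatasetA implements Serializable {

    @Option
    @Documentation("Datastore")
    private DatastoreA datastore;

    public DatastoreA getDatastore() {
        return datastore;
    }

    public DatasetA setDatastore(DatastoreA datastore) {
        this.datastore = datastore;
        return this;
    }
}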
1 | Identifying the class as a dataset and naming it. |
2 | Implementing the dataset and referencing DatastoreA as the datastore to use. |
Internationalizing datasets and datastores
The display name of each dataset and datastore must be referenced in the Messages.properties file of the family package.
The key for dataset and datastore display names follows a defined pattern: ${family}.${configurationType}.${name}._displayName. For example:
ComponentFamily.dataset.DatasetA._displayName=Dataset A
ComponentFamily.datastore.DatastoreA._displayName=Datastore A
These keys are automatically added for datasets and datastores defined from the Component Kit Starter.
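The key pattern can also be written as a small helper, which is handy when checking that a properties file contains the expected entries. This is a minimal sketch in plain Java: the `DisplayNameKeys` class and its `displayNameKey` method are hypothetical names used for illustration only and are not part of the Component Kit API.

```java
// Hypothetical helper, not part of Talend Component Kit.
final class DisplayNameKeys {

    // Builds a key following ${family}.${configurationType}.${name}._displayName
    static String displayNameKey(String family, String configurationType, String name) {
        return family + "." + configurationType + "." + name + "._displayName";
    }

    public static void main(String[] args) {
        // Prints the two properties lines used as examples in this section.
        System.out.println(displayNameKey("ComponentFamily", "dataset", "DatasetA") + "=Dataset A");
        System.out.println(displayNameKey("ComponentFamily", "datastore", "DatastoreA") + "=Datastore A");
    }
}
```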
Reusing datasets and datastores in Talend Studio
When deploying a component or set of components that include datasets and datastores to Talend Studio, a new node is created under Metadata. This node has the name of the component family that was deployed.
It allows users to create reusable configurations for datastores and datasets.
With predefined datasets and datastores, users can then quickly fill the component configuration in their jobs. They can do so by selecting Repository as Property Type and by browsing to the predefined dataset or datastore.
How to create a reusable connection in Studio
Studio automatically generates connection and close components so that input and output components can reuse a connection. You only need to declare the corresponding service methods, as in this example:
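import org.talend.sdk.component.api.configuration.Option;
import org.talend.sdk.component.api.exception.ComponentException;
import org.talend.sdk.component.api.service.Service;
import org.talend.sdk.component.api.service.connection.CloseConnection;
import org.talend.sdk.component.api.service.connection.CloseConnectionObject;
import org.talend.sdk.component.api.service.connection.CreateConnection;

@Service
public class SomeService {

    @CreateConnection
    public Object createConn(@Option("configuration") SomeDataStore dataStore) throws ComponentException {
        Object connection = null;
        // Build the connection object from the datastore settings here.
        return connection;
    }

    @CloseConnection
    public CloseConnectionObject closeConn() {
        return new CloseConnectionObject() {

            @Override
            public boolean close() throws ComponentException {
                Object connection = this.getConnection();
                // Release the connection here.
                return true;
            }
        };
    }
}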
Then the runtime mapper and processor only need to use @Connection to get the connection like this:
@Version(1)
@Icon(value = Icon.IconType.CUSTOM, custom = "SomeInput")
@PartitionMapper(name = "SomeInput")
@Documentation("the doc")
public class SomeInputMapper implements Serializable {

    @Connection
    SomeConnection conn;
}
How does the component server interact with datasets and datastores
The component server scans all configuration types and returns a configuration type index. This index can be used for the integration into the targeted platforms (Studio, web applications, and so on).
Dataset
Mark a model (complex object) as being a dataset.
- API: @org.talend.sdk.component.api.configuration.type.DataSet
- Sample:
{
"tcomp::configurationtype::name":"test",
"tcomp::configurationtype::type":"dataset"
}
Datastore
Mark a model (complex object) as being a datastore (connection to a backend).
- API: @org.talend.sdk.component.api.configuration.type.DataStore
- Sample:
{
"tcomp::configurationtype::name":"test",
"tcomp::configurationtype::type":"datastore"
}
DatasetDiscovery
Mark a model (complex object) as being a dataset discovery configuration.
- API: @org.talend.sdk.component.api.configuration.type.DatasetDiscovery
- Sample:
{
"tcomp::configurationtype::name":"test",
"tcomp::configurationtype::type":"datasetDiscovery"
}
The component family associated with a configuration type (datastore/dataset) is always the one related to the component using that configuration.
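An integration can use the two metadata keys shown in the samples above to sort configuration types by kind. The following is a minimal sketch in plain Java, assuming the metadata of each configuration type is already available as a `Map<String, String>`. The `ConfigTypeMetadataSketch` class and its methods are hypothetical and are not part of the component server API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not part of the component server API.
final class ConfigTypeMetadataSketch {

    static final String NAME_KEY = "tcomp::configurationtype::name";
    static final String TYPE_KEY = "tcomp::configurationtype::type";

    // Builds a metadata map shaped like the samples above.
    static Map<String, String> metadata(String name, String type) {
        Map<String, String> entries = new HashMap<>();
        entries.put(NAME_KEY, name);
        entries.put(TYPE_KEY, type);
        return entries;
    }

    // Groups configuration names by type; entries missing either key are skipped.
    static Map<String, List<String>> namesByType(List<Map<String, String>> allMetadata) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map<String, String> entries : allMetadata) {
            String type = entries.get(TYPE_KEY);
            String name = entries.get(NAME_KEY);
            if (type == null || name == null) {
                continue;
            }
            grouped.computeIfAbsent(type, key -> new ArrayList<>()).add(name);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map<String, String>> allMetadata = Arrays.asList(
                metadata("DatastoreA", "datastore"),
                metadata("DatasetA", "dataset"));
        System.out.println(namesByType(allMetadata));
    }
}
```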
The configuration type index is represented as a flat tree that contains all the configuration types, which themselves are represented as nodes and indexed by ID.
Every node can point to other nodes. This relation is represented as an array of edges that provides the child IDs.
As an illustration, a configuration type index for the example above can be defined as follows:
{
"nodes": {
"idForDstore": { "datastore": "datastore data", "edges": ["idForDset"] },
"idForDset": { "dataset": "dataset data" }
}
}
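To show how a client might walk such an index, here is a minimal sketch in plain Java. It models the flat tree as a map of nodes keyed by ID and resolves the children of a node through its edges. The `ConfigTypeIndexSketch` class, its `Node` type, and the field names are assumptions made for illustration and do not reflect the actual model classes returned by the component server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the component server's actual model.
final class ConfigTypeIndexSketch {

    // A configuration type plus the IDs of its child nodes.
    static final class Node {
        final String type;        // for example "datastore" or "dataset"
        final String name;
        final List<String> edges; // child node IDs

        Node(String type, String name, List<String> edges) {
            this.type = type;
            this.name = name;
            this.edges = edges;
        }
    }

    // Flat index: every node sits at the top level, keyed by its ID.
    static Map<String, Node> sampleIndex() {
        Map<String, Node> nodes = new LinkedHashMap<>();
        nodes.put("idForDstore", new Node("datastore", "DatastoreA", Arrays.asList("idForDset")));
        nodes.put("idForDset", new Node("dataset", "DatasetA", Collections.<String>emptyList()));
        return nodes;
    }

    // Resolves the children of a node by looking up each edge ID in the flat index.
    static List<Node> childrenOf(Map<String, Node> nodes, String id) {
        List<Node> children = new ArrayList<>();
        Node parent = nodes.get(id);
        if (parent == null) {
            return children;
        }
        for (String childId : parent.edges) {
            Node child = nodes.get(childId);
            if (child != null) {
                children.add(child);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        Map<String, Node> index = sampleIndex();
        for (Node child : childrenOf(index, "idForDstore")) {
            System.out.println(child.type + ": " + child.name);
        }
    }
}
```

Keeping the nodes flat and linking them only by ID means a client can look up any configuration type directly, then follow edges when it needs the datasets that belong to a given datastore.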