Reject mode
Continue
Output
Fail
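The three reject modes above can be sketched in plain Python. This is only an illustration of the behaviour, not DataStage code: `read_rows`, `parse_id_name` and `RejectedRowError` are hypothetical names, and a "bad" row here simply means one the parser cannot convert.

```python
from typing import Callable, Iterable, List, Tuple


class RejectedRowError(Exception):
    """Raised in 'fail' mode when a row cannot be parsed."""


def parse_id_name(line: str) -> Tuple[int, str]:
    """Parse 'id,name'; raises ValueError if the row does not match."""
    raw_id, name = line.split(",")
    return int(raw_id), name


def read_rows(
    lines: Iterable[str],
    parse: Callable[[str], tuple],
    reject_mode: str = "continue",
) -> Tuple[List[tuple], List[str]]:
    """Return (good_rows, reject_rows) according to the reject mode."""
    if reject_mode not in ("continue", "output", "fail"):
        raise ValueError(f"unknown reject mode: {reject_mode}")
    good, rejects = [], []
    for line in lines:
        try:
            good.append(parse(line))
        except ValueError:
            if reject_mode == "fail":
                # Fail: stop processing at the first bad row.
                raise RejectedRowError(f"bad row: {line!r}")
            if reject_mode == "output":
                # Output: route the bad row to the reject list.
                rejects.append(line)
            # Continue: drop the bad row and carry on.
    return good, rejects
```

With the input `["1,ann", "x,bob", "3,cy"]`, "continue" drops the second row, "output" returns it in the reject list, and "fail" raises on it.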
NULL Handling
Null - Special value outside the range of any existing or legitimate value
If a NULL value is written to a non-nullable column, the job aborts
Column should be specified as nullable to accept NULL value
Sequential file stage
1 I/P, 1 O/P, 1 reject link
Rows with non-matching column metadata are rejected
Used to read and write data
Sequential mode (read by 1 node)
Record format and column format must be specified
Parallel mode
Reading multiple files
Specifying the no. of readers per node parameter
Better I/O performance on SMP systems
A single file can be read by multiple nodes
By setting "Read from multiple nodes" property
Best for cluster system
Important Properties
Read method
Specific file(s)
File pattern
Options
Report Progress
No. of readers per node
Read from multiple nodes
Schema files
RCP works only with this
Can override default column declarations
As Target
Mode
Append
Reject
Null Handling
You can specify the value you want DataStage to write in place of NULL
You must handle NULLs written to a nullable column
You need to tell DS what value to write to the file
Unhandled rows are rejected
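The null-handling rule for a sequential file target can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the stage's implementation: `write_rows` and the `null_values` mapping (column name to substitute text) are hypothetical, and the comma delimiter is chosen only for the example.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[object]]


def write_rows(rows: List[Row], null_values: Dict[str, str]) -> Tuple[List[str], List[Row]]:
    """Format rows as delimited text; reject rows with unhandled NULLs.

    null_values maps a column name to the text written in place of NULL.
    A NULL in a column without an entry is "unhandled", so the row is rejected.
    """
    written, rejected = [], []
    for row in rows:
        fields = []
        handled = True
        for column, value in row.items():
            if value is None:
                if column not in null_values:
                    handled = False
                    break
                value = null_values[column]
            fields.append(str(value))
        if handled:
            written.append(",".join(fields))
        else:
            rejected.append(row)
    return written, rejected
```

For example, with `null_values={"city": "UNKNOWN"}`, a row whose `city` is NULL is written with `UNKNOWN`, while a row whose `id` is NULL goes to the reject list because no substitute was given for `id`.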
Dataset stage
ADVANTAGES
Operating system (FRAMEWORK) file
Stored in DataStage internal format
DataStage processes this file type most efficiently
Preserve Partitioning
Good performance
No export/Import operator needed to convert it into internal format
No partitioning needed
Represents persistent data
Related to file format
Ends with .ds
Contains 2 parts
Descriptor file
Contains metadata & data location
Data file (s)
Multiple Unix data files (one per node)
Parallel accessibility
As target
Update Policy
Append any new data to the existing data.
Create (Error if exists). WebSphere DataStage reports an error if the data set already exists.
Overwrite. Overwrites any existing data with new data. (default)
Use existing (Discard records). Keeps the existing data and discards any new data.
Use existing (Discard records and schema). Keeps the existing data and discards any new data and its associated schema.
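The update policies listed above can be compared side by side in a small sketch. It models a data set as a named list in an in-memory dictionary, so it is an analogy only: `write_dataset` and the policy strings are hypothetical names, the schema variant of "Use existing" is left out because the model has no schema, and the behaviour of "use_existing" when nothing exists yet is an assumption.

```python
from typing import Dict, List


def write_dataset(
    store: Dict[str, List[object]],
    name: str,
    records: List[object],
    policy: str = "overwrite",
) -> None:
    """Apply an update policy to the named data set held in store."""
    exists = name in store
    if policy == "append":
        # Append: add the new records after the existing ones.
        store.setdefault(name, []).extend(records)
    elif policy == "create":
        # Create (Error if exists): refuse to touch an existing data set.
        if exists:
            raise FileExistsError(name)
        store[name] = list(records)
    elif policy == "overwrite":
        # Overwrite: replace whatever is there.
        store[name] = list(records)
    elif policy == "use_existing":
        # Use existing (Discard records): keep existing data, drop new data.
        # Assumption: if nothing exists yet, the new records are written.
        if not exists:
            store[name] = list(records)
    else:
        raise ValueError(f"unknown update policy: {policy}")
```

Starting from `{"a.ds": [1]}`, appending `[2]` gives `[1, 2]`, overwriting with `[3]` gives `[3]`, and "use_existing" with `[4]` leaves `[3]` untouched.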
Managing Datasets
GUI (Manager, Designer, Director) - Tools > Dataset management
System Commandline
Orchadmin
List records
eg: $orchadmin ll sample.ds
Remove datasets
Cannot delete just the descriptor or just the data files (the dataset would be inconsistent); must use orchadmin
eg: orchadmin delete sample.ds
dsrecords
eg: $dsrecords myds.ds
Output: 16889 records
Fileset stage
Can read & Write
ADVANTAGE
Can be processed in parallel
Some operating systems impose a 2GB file size limit; this stage distributes the data across nodes to prevent overruns caused by that limit
It is not an internal format file
So the files are more accessible to external applications
Format
Extension .fs
Parts
Descriptor file
Location of raw datafile + schema
Data file(s)
Individual Raw data files
Number of raw data files depends on the configuration file
No. of files depends on
The number of processing nodes in the default node pool
The size of the partitions of the dataset
The no. of disks in the export or default disk pool connected to each processing node in the default node pool
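The two-part layout of a file set (a descriptor plus raw data files, one or more per node) can be sketched as below. Everything concrete here is an assumption made for illustration: the JSON descriptor, the `.partN.txt` file names, the round-robin distribution, and the function names `write_file_set` and `read_file_set`. The real `.fs` descriptor format is different.

```python
import json
import os
import tempfile
from typing import List


def write_file_set(records: List[str], directory: str, name: str, num_nodes: int) -> str:
    """Spread records over num_nodes raw data files and write a descriptor."""
    data_paths = [os.path.join(directory, f"{name}.part{n}.txt") for n in range(num_nodes)]
    partitions: List[List[str]] = [[] for _ in range(num_nodes)]
    for index, record in enumerate(records):
        # Round-robin: record i goes to partition i mod num_nodes.
        partitions[index % num_nodes].append(record)
    for path, partition in zip(data_paths, partitions):
        with open(path, "w", encoding="utf-8") as handle:
            for record in partition:
                handle.write(record + "\n")
    descriptor = os.path.join(directory, f"{name}.fs")
    with open(descriptor, "w", encoding="utf-8") as handle:
        # Hypothetical descriptor: a schema plus the data file locations.
        json.dump({"schema": ["line:string"], "data_files": data_paths}, handle)
    return descriptor


def read_file_set(descriptor: str) -> List[str]:
    """Read every raw data file listed in the descriptor, partition by partition."""
    with open(descriptor, encoding="utf-8") as handle:
        meta = json.load(handle)
    records: List[str] = []
    for path in meta["data_files"]:
        with open(path, encoding="utf-8") as handle:
            records.extend(line.rstrip("\n") for line in handle)
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        fs_path = write_file_set(["a", "b", "c", "d", "e"], workdir, "demo", 2)
        print(read_file_set(fs_path))
```

Because the data files are ordinary text, any external program can read them directly, which is the accessibility advantage noted above. Reading partition by partition returns the records in partition order rather than the original order.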
Complex file format
Can be used for read/write
Multiple output link with single reject link
COBOL copybook, MVS dataset with QSAM/VSAM
OCCURS (ARRAY)
OCCURS DEPENDING ON (ARRAY-DYNAMIC)
Can't have reject link
REDEFINES (UNION)
GROUP
Copybook imported to the repository through Designer (Manager in older releases)
Complex file Load Option
Flatten selective arrays
Flatten all arrays
As is
Output link
sends normalized records
Eg: OCCURS 5 means 5 records are sent out
Record ID constraints
Can be used when the I/P contains multiple record types
Column meta data
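The difference between normalizing an OCCURS array on the output link and flattening it at load time can be sketched in Python. This is a conceptual illustration only: `normalize_occurs`, `flatten_occurs`, and the `field_1 ... field_n` column naming are hypothetical and not taken from the stage.

```python
from typing import Dict, List


def normalize_occurs(record: Dict[str, object], array_field: str) -> List[Dict[str, object]]:
    """Emit one output record per array element (OCCURS 5 gives 5 records)."""
    base = {key: value for key, value in record.items() if key != array_field}
    return [{**base, array_field: element} for element in record[array_field]]


def flatten_occurs(record: Dict[str, object], array_field: str) -> Dict[str, object]:
    """Keep one record and turn each array element into its own column."""
    flat = {key: value for key, value in record.items() if key != array_field}
    for position, element in enumerate(record[array_field], start=1):
        # Hypothetical naming scheme for the flattened columns.
        flat[f"{array_field}_{position}"] = element
    return flat
```

For a record `{"cust": "C1", "amount": [10, 20, 30, 40, 50]}`, normalizing yields five records that each repeat `cust` with one `amount`, while flattening yields a single record with columns `amount_1` through `amount_5`.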
Lookup fileset stage
.fs extension
File descriptor
1 data file per partition
Can be used for range lookup
Good performance for lookup stage
Single Input link
Single output link which must be a reference link
External source
Allows you to read data from one or more source programs
1 O/P link, 1 reject link
External target
Allows you to write data to one or more target programs
1 I/P link, 1 reject link
Types
Fixed length data
Variable length data