Accessing Sequential Data

Reject mode
1. Continue
2. Output
3. Fail
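The three reject modes above can be pictured with a small Python sketch. This is an illustration of the semantics only, not DataStage code: the function name `apply_reject_mode`, the mode strings, and the use of `ValueError` to signal a bad row are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def apply_reject_mode(
    rows: Iterable[str],
    parse: Callable[[str], object],
    mode: str = "continue",
) -> Tuple[List[object], List[str]]:
    """Return (good_rows, rejected_rows) according to the reject mode."""
    if mode not in ("continue", "output", "fail"):
        raise ValueError(f"unknown reject mode: {mode}")
    good: List[object] = []
    rejected: List[str] = []
    for row in rows:
        try:
            good.append(parse(row))
        except ValueError:
            if mode == "fail":
                # Fail: stop the whole job on the first bad row.
                raise RuntimeError(f"job aborted on bad row: {row!r}")
            if mode == "output":
                # Output: route the bad row to the reject link.
                rejected.append(row)
            # Continue: drop the bad row and keep processing.
    return good, rejected
```

For example, `apply_reject_mode(["1", "x", "3"], int, "output")` keeps the two parsable rows and places `"x"` in the reject list, while the same call with `"continue"` simply discards it.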
NULL Handling
1. NULL: a special value outside the range of any existing or legitimate value
2. If a NULL value is written to a non-nullable column, the job aborts
  1. The column must be defined as nullable to accept NULL values
Sequential file stage
1. 1 input link or 1 output link, plus 1 reject link
  1. Records that do not match the column metadata are rejected
2. Used to read and write data
3. Sequential mode (read by 1 node)
4. Record format and column format must be specified
5. Parallel mode
  1. Reading multiple files
  2. Setting the "Number of readers per node" property
    1. Better I/O performance on SMP systems
  3. A single file can be read by multiple nodes
    1. By setting the "Read from multiple nodes" property
    2. Best for cluster systems
6. Important Properties
  1. Read method
    1. Specific file(s)
    2. File pattern
  2. Options
    1. Report Progress
    2. Number of readers per node
    3. Read from multiple nodes
    4. Schema files
      1. Runtime column propagation (RCP) works only with this
      2. Can override default column declarations
7. As Target
  1. Mode
    1. Append
    2. Reject
8. Null Handling
  1. You can specify the values you want DataStage to convert to NULL
  2. You must handle NULLs written to a nullable column
    1. You need to tell DataStage what value to write to the file
    2. Unhandled rows are rejected
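The null handling rule for a target file (give each nullable column a value to write in place of NULL, otherwise the row is rejected) can be sketched in Python as follows. This is a minimal illustration, not DataStage code: `export_rows`, the `null_field_value` mapping, and the use of Python `None` to stand for NULL are assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[str]]


def export_rows(
    rows: Sequence[Row],
    columns: Sequence[str],
    null_field_value: Dict[str, str],
    delimiter: str = ",",
) -> Tuple[List[str], List[Row]]:
    """Render rows as delimited lines; reject rows with unhandled NULLs."""
    lines: List[str] = []
    rejects: List[Row] = []
    for row in rows:
        fields: List[str] = []
        for col in columns:
            value = row.get(col)
            if value is None:
                if col not in null_field_value:
                    # No replacement configured: the NULL is unhandled.
                    break
                value = null_field_value[col]
            fields.append(value)
        else:
            # Every column produced a value, so the row can be written.
            lines.append(delimiter.join(fields))
            continue
        rejects.append(row)
    return lines, rejects
```

With `null_field_value={"city": "UNKNOWN"}`, a row whose `city` is NULL is written with the text `UNKNOWN`, while a row whose `id` is NULL has no replacement and ends up in the reject list.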
Dataset stage
1. ADVANTAGES
  1. Operating system (framework) file
    1. Stored in the internal DataStage format
    2. DataStage processes this file type most efficiently
  2. Preserves partitioning
  3. Good performance
    1. No import/export operator is needed to convert the data to the internal format
    2. No repartitioning is needed
2. Represents persistent data
3. Related to file format
  1. Ends with .ds
  2. Contains 2 parts
    1. Descriptor file
      1. Contains metadata & data location
    2. Data file (s)
      1. Multiple UNIX data files (one per node)
      2. Parallel accessibility
4. As target
  1. Update Policy
    1. Append any new data to the existing data.
    2. Create (Error if exists). WebSphere DataStage reports an error if the data set already exists.
    3. Overwrite. Overwrites any existing data with new data. (default)
    4. Use existing (Discard records). Keeps the existing data and discards any new data.
    5. Use existing (Discard records and schema). Keeps the existing data and discards any new data and its associated schema.
5. Managing Datasets
  1. GUI (Manager, Designer, Director): Tools > Dataset management
  2. System command line
    1. orchadmin
      1. List records
      2. eg: $orchadmin ll sample.ds
      3. Remove datasets
      4. Do not delete only the descriptor file or only the data files (the dataset becomes inconsistent); use orchadmin instead
      5. eg: $orchadmin delete sample.ds
    2. dsrecords
      1. eg: $dsrecords myds.ds
      2. Output: 16889 records
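The update policies listed for a dataset target can be compared with a small in-memory sketch. This is not how DataStage stores datasets: `write_dataset`, the `Store` dictionary, and the policy strings are hypothetical names for illustration. The sketch does not model schemas, so the two "Use existing" variants collapse into one, and writing the rows when the dataset does not yet exist is an assumption the notes do not confirm.

```python
from typing import Dict, List

Store = Dict[str, List[str]]  # dataset name -> records (toy stand-in)


def write_dataset(
    store: Store,
    name: str,
    new_rows: List[str],
    policy: str = "overwrite",
) -> None:
    """Apply one update policy to a toy in-memory dataset store."""
    exists = name in store
    if policy == "append":
        # Append: add the new data to whatever is already there.
        store.setdefault(name, []).extend(new_rows)
    elif policy == "create":
        # Create (Error if exists): refuse to touch an existing dataset.
        if exists:
            raise FileExistsError(f"dataset already exists: {name}")
        store[name] = list(new_rows)
    elif policy == "overwrite":
        # Overwrite: replace any existing data (the default above).
        store[name] = list(new_rows)
    elif policy == "use_existing":
        # Use existing: keep existing data and discard the new rows.
        # Assumption: if nothing exists yet, the rows are written.
        if not exists:
            store[name] = list(new_rows)
    else:
        raise ValueError(f"unknown update policy: {policy}")
```

Starting from an empty store, `create` followed by `append` accumulates records, `use_existing` leaves them untouched, and a second `create` raises an error.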
Fileset stage
1. Can read & Write
2. ADVANTAGE
  1. Can be processed in parallel
  2. Some operating systems impose a 2 GB file size limit; this stage distributes the data across files on multiple nodes so that no single file exceeds the limit
3. The files are not in the internal format
  1. So the files are more accessible to external applications
4. Format
  1. Extension .fs
  2. Parts
    1. Descriptor file
      1. Location of raw datafile + schema
    2. Data file(s)
      1. Individual Raw data files
      2. The number of raw data files depends on the configuration file
5. The number of files depends on
  1. The number of processing nodes in the default node pool
  2. The size of the partitions of the dataset
  3. The number of disks in the export or default disk pool connected to each processing node in the default node pool
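The descriptor-plus-data-files layout of a file set can be pictured with the sketch below. It is an illustration only: `build_file_set`, the round-robin distribution, the data file naming pattern, and the dictionary layout are assumptions for the example and do not reflect the real .fs format or how DataStage actually assigns records to files.

```python
from typing import Dict, List


def build_file_set(
    records: List[str],
    nodes: List[str],
    schema: str,
    name: str = "sample.fs",
) -> Dict[str, object]:
    """Spread records over one data file per node and describe them."""
    if not nodes:
        raise ValueError("at least one node is required")
    # One raw data file per node (hypothetical naming pattern).
    data_files: Dict[str, List[str]] = {f"{name}.{node}.data": [] for node in nodes}
    paths = list(data_files)
    for i, record in enumerate(records):
        # Round-robin keeps each file small, which is how the 2 GB
        # per-file limit is avoided in this toy model.
        data_files[paths[i % len(paths)]].append(record)
    # The descriptor holds the schema and the data file locations.
    descriptor = {"schema": schema, "files": paths}
    return {"descriptor": descriptor, "data_files": data_files}
```

Calling `build_file_set(["r0", "r1", "r2", "r3", "r4"], ["node0", "node1"], "id:int32")` yields two data files, one holding `r0`, `r2`, `r4` and the other `r1`, `r3`, together with a descriptor that lists both paths.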
Complex flat file stage
1. Can be used to read and write
2. Multiple output links with a single reject link
3. COBOL copybook, MVS data sets with QSAM/VSAM
  1. OCCURS (ARRAY)
  2. OCCURS DEPENDING ON (ARRAY-DYNAMIC)
    1. Cannot have a reject link
  3. REDEFINES (UNION)
  4. GROUP
  5. The copybook is imported into the repository through Director
4. Complex file Load Option
  1. Flatten selective arrays
  2. Flatten all arrays
  3. As is
5. Output link
  1. Sends normalized records
    1. Eg: OCCURS 5 produces 5 output records
6. Record ID constraints
  1. Can be used when the input contains multiple record types
7. Column metadata
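The difference between normalizing an OCCURS array (one output record per element) and flattening it (one column per element) can be sketched in Python. The function names and the `field_1`, `field_2` column naming are hypothetical and chosen only for illustration; a Python list stands in for the COBOL array.

```python
from typing import Any, Dict, List


def normalize_occurs(record: Dict[str, Any], array_field: str) -> List[Dict[str, Any]]:
    """Emit one output record per array element (normalized output)."""
    base = {k: v for k, v in record.items() if k != array_field}
    # Each element is paired with a copy of the non-array columns.
    return [{**base, array_field: element} for element in record[array_field]]


def flatten_occurs(record: Dict[str, Any], array_field: str) -> Dict[str, Any]:
    """Keep one record and turn each array element into its own column."""
    flat = {k: v for k, v in record.items() if k != array_field}
    for position, element in enumerate(record[array_field], start=1):
        flat[f"{array_field}_{position}"] = element
    return flat
```

A record such as `{"id": 7, "amt": [10, 20, 30]}` normalizes into three records that each carry `id` and a single `amt`, and flattens into one record with `amt_1`, `amt_2`, and `amt_3`.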
Lookup fileset stage
1. .fs extension
  1. File descriptor
  2. Data files: 1 file per partition
2. Can be used for range lookup
3. Good performance for lookup stage
4. Single Input link
5. Single output link which must be a reference link
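A range lookup matches a key against low/high bounds rather than an exact value. The sketch below shows the general idea with a binary search; it is not the lookup stage's implementation. `range_lookup` and the sample table are hypothetical, and the sketch assumes the ranges are inclusive, sorted by lower bound, and non-overlapping.

```python
import bisect
from typing import List, Optional, Tuple

Range = Tuple[int, int, str]  # (low, high, value), bounds inclusive


def range_lookup(ranges: List[Range], key: int) -> Optional[str]:
    """Return the value whose [low, high] range contains key, else None."""
    lows = [low for low, _, _ in ranges]
    # Find the last range whose lower bound is <= key.
    i = bisect.bisect_right(lows, key) - 1
    if i >= 0:
        low, high, value = ranges[i]
        if low <= key <= high:
            return value
    return None  # the key falls in a gap or outside every range


# Hypothetical reference data.
BANDS: List[Range] = [(0, 9, "low"), (10, 19, "mid"), (30, 39, "high")]
```

With this table, a key of 10 or 19 resolves to the middle band, while 25 falls in the gap between bands and returns no match.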
External source
1. Allows you to read data from one or more source programs
2. 1 O/P link, 1 reject link
External target
1. Allows you to write data to one or more external programs
2. 1 I/P link, 1 reject link
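Reading from a program and writing to a program both come down to connecting to that program's standard output or standard input. The sketch below shows the idea with Python's `subprocess` module; it is not DataStage code. The helper names are hypothetical, and the Python interpreter itself is used as a stand-in for the external programs so the example is self-contained.

```python
import subprocess
import sys
from typing import List


def read_from_program(command: List[str]) -> List[str]:
    """Run a source program and return the lines it writes to stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def write_to_program(command: List[str], rows: List[str]) -> str:
    """Pipe rows to a target program's stdin and return what it printed."""
    payload = "".join(row + "\n" for row in rows)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Stand-in programs (hypothetical): small one-line Python scripts.
SOURCE = [sys.executable, "-c", "print('1,alice'); print('2,bob')"]
TARGET = [sys.executable, "-c", "import sys; print(sum(1 for _ in sys.stdin))"]
```

Here `read_from_program(SOURCE)` returns the two rows the source script prints, and `write_to_program(TARGET, rows)` returns the row count that the target script reports after consuming its input.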
Types
1. Fixed-length data
2. Variable-length data
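The two data types can be contrasted with a short parsing sketch. It assumes that "fixed length" means every field occupies a known number of characters and that "variable length" means fields are separated by a delimiter; the notes do not spell this out, and the function names and sample widths are hypothetical.

```python
from typing import List, Sequence


def parse_fixed(line: str, widths: Sequence[int]) -> List[str]:
    """Cut a record into fields using known column widths."""
    fields: List[str] = []
    start = 0
    for width in widths:
        # Padding spaces are part of the field, so strip them off.
        fields.append(line[start:start + width].strip())
        start += width
    return fields


def parse_delimited(line: str, delimiter: str = ",") -> List[str]:
    """Cut a record into fields at each delimiter."""
    return line.split(delimiter)
```

The fixed-length record `"001Alice NY"` with widths `[3, 6, 2]` and the delimited record `"001,Alice,NY"` both parse to the same three fields.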