Teradata - Questions & Answers



Dear readers, these Teradata Interview Questions have been designed specially to get you acquainted with the nature of questions you may encounter during your interview for the subject of Teradata. As per my experience good interviewers hardly plan to ask any particular question during your interview, normally questions start with some basic concept of the subject and later they continue based on further discussion and what you answer −

Teradata Architecture consists of three components.

  • Parsing Engine − Parsing Engine receives the query from the user, parses it and prepares the execution plan.

  • BYNET − BYNET receives the execution plan from the Parsing Engine and dispatches to the appropriate AMP.

  • AMP − AMP is responsible for storing and retrieving rows. It stores the data in the virtual disk associated with it. In addition to this, AMP is responsible for lock management, space management, sorting and aggregation.

FastLoad MultiLoad
Target table should be empty Target table need not be empty
Only one table can be loaded using a single script Can load/update up to 5 tables
Supports only CREATE/INSERT statement Supports up to 20 DML statements in single script
Doesn’t support tables with RI, SI and Triggers Supports tables with NUSI

FastLoad provides restart capability through checkpoints. When the script is restarted from the last checkpoint, it is possible that the same rows may be sent again to the AMPs. That’s the reason FastLoad doesn’t support duplicates.

SET table does not allow duplicate records whereas MULTISET allows duplicate records.

For each row inserted, System checks if there is any record with the same row hash. If the table has UPI defined, then it will reject the record as duplicate. Otherwise, it will compare the entire record for duplicate. This will severely affect the system performance.

You can define either Unique Primary Index or Unique Secondary Index to avoid duplicate row checking.

Tables are created using CREATE TABLE statement. Tables can be created using

  • CREATE TABLE statement with column definition.

  • CREATE TABLE from an existing table.

  • CREATE TABLE statement with a SELECT statement.

Duplicate records can be identified using DISTINCT statement or GROUP BY statement.

SELECT DISTINCT column 1, column 2… 
FROM tablename;
  
OR
  
SELECT column 1, column 2,… 
FROM tablename 
GROUP BY column 1, column 2….;
  • Primary keys are not mandatory in Teradata whereas Primary Index is mandatory.

  • Data distribution is based on Primary Index value.

  • Primary keys doesn’t accept NULLs whereas Primary Index accepts NULL values.

  • Primary keys are unique whereas Primary Index can be either unique (UPI) or non unique(NUPI).

  • Primary keys doesn’t change whereas Primary Indexes change.

Data can be accessed in 3 different ways −

  • Through Primary Index
  • Through Secondary Index
  • Full Table Scan

It can be identified using query SELECT HASHAMP() + 1;

The following query can be used for this purpose.

SELECT HASHMAP(HASHBUCKET(HASHROW(primaryindexvalue))), COUNT(*) 
FROM tablename GROUP BY 1; 

Teradata supports two transaction modes.

  • Teradata
  • ANSI

Teradata mode is set using SET SESSION TRANSACTION BTET; ANSI mode is set using SET SESSION TRANSACTION ANSI;

Transactions can be executed using BT and ET statements.

Join Indexes cannot be directly accessed by the user. Only the optimizer can access them.

The duplicate records will be rejected from loading the target tables and will be inserted into the UV table.

FALLBACK is a protection mechanism used by Teradata to handle AMP failures. For each data row, another copy of the row is stored in a different AMP within a cluster. If any AMP fails, then the corresponding rows will be accessed using FALLBACK AMP.

FALLBACK can be mentioned while table creation using CREATE TABLE statement or after the table is created using ALTER TABLE statement.

Spool space error will happen if the intermediate results of the query exceeds per AMP spool space limit set for the user who submitted the query.

SLEEP command specifies the waiting time before Teradata attempts to establish the connection.

TENACITY command specifies the total waiting time for Teradata to establish a new connection.

You can just keep the BEGIN LOADING and END LOADING statement and submit the FASTLOAD script. Other option is to drop the table and create the table again.

Caching in Teradata works with the source and remains in the same order, that is, it does not change frequently. The Cache is usually shared among applications. It is an added advantage of using Teradata.

RAID is a protection mechanism to handle disk failure. It stands for Redundant Array of Independent Disks. RAID 1 is commonly used in Teradata.

Secondary index provides alternate path to access the data. They are used to avoid Full Table Scan. However, secondary indexes require additional physical structure for maintaining sub tables and also require additional I/O since the sub table needs to be updated for each row.

There are four different locks in Teradata − Exclusive, Write, Read, and Access.

Locks can be applied at three different levels − Database, Table, and Row.

Using Multi Value Compression (MVC) you can compress up to 255 values including NULLs.

FastLoad loads the data in 64K blocks. There are 2 phases in FastLoad.

  • In Phase 1, it brings the data in 64K blocks and sends them to the target AMPs. Each AMP will then hash redistribute the rows to their target AMPs.

  • In Phase 2, rows are sorted by their row hash order and written into the target table.

MultiLoad import has five phases.

  • Phase 1 − Preliminary Phase – Performs basic setup activities.

  • Phase 2 − DML Transaction Phase – Verifies the syntax of DML statements and brings them to Teradata system.

  • Phase 3 − Acquisition Phase – Brings the input data into work tables and locks the table.

  • Phase 4 − Application Phase – Applies all DML operations.

  • Phase 5 − Cleanup Phase – Releases the table lock.

MULTILOAD DELETE is faster since it deletes the records in blocks. DELETE FROM will delete row by row.

Stored procedure returns one or more values whereas Macros can return one or more rows. In addition to SQL, stored procedure may contain SPL statements.

Both FastLoad and MultiLoad load the data in 64K blocks whereas BTEQ will process one row at a time.

FastExport exports the data in 64K blocks whereas BTEQ exports one row at a time.

Teradata Parallel Transporter (TPT) is the utility to load/export data. It combines all the functionalities of FastLoad, MultiLoad, BTEQ, TPUMP and FastExport.

Permanent journals keeps track of data before or after applying the changes. This help to roll back or roll forward the table to a particular state. Permanent journals can be enabled at table level or database level.

In Teradata, each AMP is associated with a virtual disk. Only the AMP that owns the virtual disk can access the data within that virtual disk. This is called as Shared Nothing Architecture.

  • If the query uses partitioned columns then it will result in partition elimination, which will greatly improve the performance.

  • Partition eliminates other partitions and accesses only the partitions that contain the data.

  • You can easily drop the old partitions and create new partitions.

Yes. Secondary index requires sub-tables which require permanent space.

Yes. Whenever partitioned primary index is added, then each row occupies additional 2 or 8 bytes for the partition number.

You can use RANK function on the specified column with descending order with Qualify = 2 condition.

You can check the EXPLAIN plan of the query to identify the steps that consume more spool space and try to optimize the query. Filters can be applied to reduce the number of records being processed or you can split the large query into multiple smaller queries.

When the EXPLAIN command is used against the query, it specifies the confidence of the optimizer to retrieve the records.

There are three confidence levels in Teradata: High Confidence, Medium Confidence, and Low Confidence.

Both NUSI and Full Table Scan (FTS) will access all the AMPs but FTS will access all the blocks within the AMP whereas NUSI will access the blocks only if the sub-table contains the qualifying rows.

In BTEQ mode, SKIP command can be used to skip the records.

BYTEINT. It occupies only one byte and can store values up to +127.

  • Through Unique Primary Index – 1 AMP
  • Through Non Unique Primary Index – 1 AMP
  • Through Unique Secondary Index – 2 AMPs
  • Through Non Unique Secondary Index – All AMPs

Clique is a protection mechanism to handle Node failures. It is a group of nodes. When a node within a clique fails, then the vprocs (Parsing Engine and AMP) will migrate to other nodes and continue to perform read/write operations on their virtual disks.

Teradata provides different levels of protection mechanism.

  • Transient Journal − To handle Transaction failure.

  • Fallback − To handle AMP failure.

  • Cliques − To handle Node failure.

  • RAID − To handle Disk failure.

  • Hot standby Node − To handle Node failure without affecting performance and restart.

ACTIVITYCOUNT gives the number of rows affected by the previous SQL query in BTEQ. If the ACTIVITYCOUNT statement follows an insert statement, it returns the number of rows inserted. If the ACTIVITYCOUNT statement follows select statement, it returns the number of rows selected.

What is Next?

Further you can go through your past assignments you have done with the subject and make sure you are able to speak confidently on them. If you are fresher then interviewer does not expect you will answer very complex questions, rather you have to make your basics concepts very strong.

Second it really doesn't matter much if you could not answer few questions but it matters that whatever you answered, you must have answered with confidence. So just feel confident during your interview. We at tutorialspoint wish you best luck to have a good interviewer and all the very best for your future endeavor. Cheers :-)

Advertisements