
Tuesday, 9 December 2014

Recovery Mode In Informatica

If your workflow fails and the Integration Service is still able to access the workflow, then we can easily recover the workflow. When you recover a workflow, it restarts from the point where it was interrupted.

The best way to explain recovery mode is with an example. Suppose a session fails after loading one million records into the target; on recovery, it will resume with the records after the one millionth record. (Note: the target load type should be Normal.)
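To make the idea concrete, here is a minimal, purely illustrative Python sketch of checkpoint-style recovery (this is not Informatica's internal mechanism; the function and checkpoint names are made up): the loader records how many rows were committed, and a recovery run resumes from that point instead of reloading everything.

```python
# Conceptual sketch of checkpoint-based recovery, not Informatica's actual
# implementation: the loader records the number of rows committed, and a
# recovery run resumes after that point instead of reloading everything.

def load_with_recovery(rows, target, checkpoint):
    """Load rows into target, skipping rows committed before the failure."""
    already_loaded = checkpoint.get("rows_committed", 0)
    for position, row in enumerate(rows):
        if position < already_loaded:
            continue                         # these rows were loaded before the failure
        target.append(row)                   # stand-in for an insert into the target table
        checkpoint["rows_committed"] = position + 1

# A first run failed after 3 rows; the recovery run picks up at row 4.
source_rows = [f"record_{i}" for i in range(1, 7)]
target_rows, checkpoint = [], {"rows_committed": 3}
load_with_recovery(source_rows, target_rows, checkpoint)
print(target_rows)                           # ['record_4', 'record_5', 'record_6']
```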

New features of INFORMATICA 9 compared to INFORMATICA 8.6

Informatica 9 empowers line-of-business managers and business analysts to identify bad data and fix it faster. Architecture-wise there are no differences between Informatica 8 and 9, but some new features have been added in PowerCenter 9.

New Client tools


Informatica 9 includes the Informatica Developer and Informatica Analyst client tools.

The Informatica Developer tool is Eclipse-based and supports both data integration and data quality for enhanced productivity. From here you can update or refine those same rules, and create composite data objects, e.g. get customer details from a number of different sources and aggregate them up into a Customer data object.

The Informatica Analyst tool is a browser-based tool for analysts, stewards and line of business managers.  This tool supports data profiling, specifying and validating rules (Scorecards), and monitoring data quality.



Informatica Administrator


The PowerCenter Administration Console has been renamed the Informatica Administrator.
The Informatica Administrator is now a core service in the Informatica Domain that is used to configure and manage all Informatica Services, Security and other domain objects (such as connections) used by the new services.
The Informatica Administrator has a new interface. Some of the properties and configuration tasks from the powercenter Administration Console have been moved to different locations in Informatica Administrator. The Informatica Administrator is expanded to include new services and objects.


Cache Update in Lookup Transformation
You can update the dynamic lookup cache based on the results of an expression: when the expression evaluates to true, the row is added to or updated in the lookup cache.
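As an illustration of the idea (the names below are invented, not Informatica expressions or APIs), a dynamic lookup cache can be modelled as a dictionary where a per-row condition decides between insert, update, or no change:

```python
# Illustrative model of a dynamic lookup cache: an expression evaluated for each
# incoming row decides whether the row is inserted into the cache, updates an
# existing entry, or leaves the cache unchanged.

cache = {101: {"customer_id": 101, "city": "Pune"}}

def update_condition(incoming, cached):
    # Example condition: only update the cache when the city has changed.
    return incoming["city"] != cached["city"]

def apply_row(row):
    cached = cache.get(row["customer_id"])
    if cached is None:
        cache[row["customer_id"]] = row          # new key: insert into the cache
    elif update_condition(row, cached):
        cache[row["customer_id"]] = row          # condition true: update cached row
    # condition false: cache left as-is

apply_row({"customer_id": 101, "city": "Mumbai"})   # updates entry 101
apply_row({"customer_id": 102, "city": "Delhi"})    # inserts entry 102
print(sorted(cache))                                 # [101, 102]
```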



Database deadlock resilience

In previous releases, when the Integration Service encountered a database deadlock during a lookup, the session failed. Effective in 9.0, the session will not fail. When a deadlock occurs, the Integration Service attempts to run the last statement in a lookup. You can configure the number of retry attempts and time period between attempts.
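The retry behaviour can be pictured with a small, hypothetical sketch (the exception class and parameter names are made up; in PowerCenter the equivalent settings are the configurable retry count and the period between attempts):

```python
import time

class DeadlockError(Exception):
    """Stand-in for a deadlock error reported by the database driver."""

def run_with_retries(execute_statement, retry_attempts=3, retry_period=5):
    """Re-run the last statement after a deadlock instead of failing the session."""
    for attempt in range(retry_attempts + 1):
        try:
            return execute_statement()
        except DeadlockError:
            if attempt == retry_attempts:
                raise                        # give up only after the configured attempts
            time.sleep(retry_period)         # wait before trying the statement again
```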



Multiple rows return

Lookups can now be configured as an active transformation that returns multiple rows. We can configure the Lookup transformation to return all rows that match a lookup condition; a Lookup transformation is active when it can return more than one row for any given input row.
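A toy example of the difference (the table, keys, and values are invented): a passive-style lookup returns at most one match per input row, while an active lookup may return every matching row.

```python
orders_by_customer = {
    "C001": [{"order_id": 1, "amount": 250}, {"order_id": 2, "amount": 400}],
    "C002": [{"order_id": 3, "amount": 120}],
}

def lookup_first(customer_id):
    """Traditional behaviour: one input row produces at most one output row."""
    matches = orders_by_customer.get(customer_id, [])
    return matches[0] if matches else None

def lookup_all(customer_id):
    """Active lookup: one input row can produce several output rows."""
    return orders_by_customer.get(customer_id, [])

print(lookup_first("C001"))   # only the first matching order
print(lookup_all("C001"))     # both matching orders
```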



Limit the Session Log 

You can limit the size of session logs for real-time sessions. You can limit the size by time or by file size. You can also limit the number of log files for a session.
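This is a session property rather than something you code, but the idea is the same one behind rotating log handlers; as an analogy only, Python's standard library caps a log by size and by the number of retained files:

```python
import logging
from logging.handlers import RotatingFileHandler

# Analogy only: cap the log at roughly 1 MB and keep at most 5 rolled-over files,
# similar in spirit to limiting a real-time session log by size and file count.
handler = RotatingFileHandler("session.log", maxBytes=1_000_000, backupCount=5)
logger = logging.getLogger("realtime_session")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("row processed")
```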



Auto-commit

We can enable auto-commit for each database connection. Each SQL statement in a query defines a transaction. A commit occurs when the SQL statement completes or the next statement is executed, whichever comes first. 
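The same commit behaviour can be seen with any database driver that exposes an auto-commit switch; here is a small SQLite illustration (not an Informatica API), where opening the connection with isolation_level=None commits each statement as soon as it completes:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)   # auto-commit mode
conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")                    # committed immediately
print(conn.execute("SELECT COUNT(*) FROM t").fetchone())    # (1,)
conn.close()
```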



Passive transformation

We can configure the SQL transformation to run in passive mode instead of active mode. When the SQL transformation runs in passive mode, the SQL transformation returns one output row for each input row.



Connection management

Database connections are centralized in the domain. We can create and view database connections in Informatica Administrator, Informatica Developer, or Informatica Analyst. Create, view, edit, and grant permissions on database connections in Informatica Administrator.



Monitoring

We can monitor profile jobs, scorecard jobs, preview jobs, mapping jobs, and SQL Data Services for each Data Integration Service. View the status of each monitored object on the Monitoring tab of Informatica Administrator.



Deployment

We can deploy, enable, and configure deployment units in the Informatica Administrator. Deploy deployment units to one or more Data Integration Services. Create deployment units in Informatica Developer.



Model Repository Service

Application service that manages the Model repository. The Model repository is a relational database that stores the metadata for projects created in Informatica Analyst and Informatica Developer. The Model repository also stores run-time and configuration information for applications deployed to a Data Integration Service.



Data Integration Service

Application service that processes requests from Informatica Analyst and Informatica Developer to preview or run data profiles and mappings. It also generates data previews for SQL data services and runs SQL queries against the virtual views in an SQL data service. Create and enable a Data Integration Service on the Domain tab of Informatica Administrator.



XML Parser

The XML Parser transformation can validate an XML document against a schema. The XML Parser transformation routes invalid XML to an error port. When the XML is not valid, the XML Parser transformation routes the XML and the error messages to a separate output group that we can connect to a target.
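The validate-and-route behaviour can be sketched outside Informatica with any XML library; the example below uses lxml with a made-up schema, sending invalid documents (and their error messages) to a separate "error" list, analogous to the error output group:

```python
from lxml import etree

schema = etree.XMLSchema(etree.XML("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

valid_rows, error_rows = [], []
for doc in ["<customer><id>42</id></customer>",
            "<customer><id>oops</id></customer>"]:
    tree = etree.fromstring(doc)
    if schema.validate(tree):
        valid_rows.append(doc)                              # default output group
    else:
        error_rows.append((doc, str(schema.error_log)))     # error group, sent to a target

print(len(valid_rows), len(error_rows))                     # 1 1
```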



Enforcement of licensing restrictions

PowerCenter enforces the licensing restrictions based on the number of CPUs and repositories.



Informatica 9 also supports data integration for the cloud as well as on-premise. You can integrate the data in cloud applications, as well as run Informatica 9 on cloud infrastructure.

Saturday, 15 November 2014

Repository Objects


You create repository objects using the Repository Manager, Designer, and Workflow Manager client tools. You can view the following objects in the Navigator window of the Repository Manager:

Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions. Definitions of database objects or files that contain the target data.

Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.

Mappings. A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Power Center Server uses to transform and move data.

Reusable transformations. Transformations that you can use in multiple mappings.

Mapplets. A set of transformations that you can use in multiple mappings.

Sessions and workflows. Sessions and workflows store information about how and when the Power Center Server moves data. 

           A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping. 
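To keep the relationships straight, here is a tiny, purely illustrative model (names invented): a workflow holds an ordered list of tasks, a session is one kind of task, and each session points at exactly one mapping.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Mapping:
    name: str                        # e.g. "m_load_customers"

@dataclass
class Session:
    name: str
    mapping: Mapping                 # each session corresponds to a single mapping

@dataclass
class Workflow:
    name: str
    tasks: List[Session] = field(default_factory=list)   # sessions and other task types

wf = Workflow("wf_daily_load",
              tasks=[Session("s_load_customers", Mapping("m_load_customers"))])
print(wf.tasks[0].mapping.name)      # m_load_customers
```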

Informatica Design Process


                The goal of the design process is to create mappings that depict the flow of data between sources and targets, including changes made to the data before it reaches the targets. However, before you can create a mapping, you must first create or import source and target definitions. You might also want to create reusable objects, such as reusable transformations or mapplets. For a list of objects you create in the Design process, see Repository Objects.

Perform the following design tasks in the Designer:
  • Import source definitions. Use the Source Analyzer to connect to the sources and import the source definitions.
  • Create or import target definitions. Use the Warehouse Designer to define relational, flat file, or XML targets to receive data from sources. You can import target definitions from a relational database or a flat file, or you can manually create a target definition.
  • Create the target tables. If you add a target definition to the repository that does not exist in a relational database, you need to create target tables in your target database. You do this by generating and executing the necessary SQL code within the Warehouse Designer.
  • Design mappings. Once you have source and target definitions in the repository, you can create mappings in the Mapping Designer. A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. A transformation is an object that performs a specific function in a mapping, such as looking up data or performing aggregation.
  • Create mapping objects. Optionally, you can create reusable objects for use in multiple mappings. Use the Transformation Developer to create reusable transformations. Use the Mapplet Designer to create mapplets. A mapplet is a reusable set of transformations that may also contain sources.
  • Debug mappings. Use the Mapping Designer to debug a valid mapping to gain troubleshooting information about data and error conditions.
  • Import and export repository objects. You can import and export repository objects, such as sources, targets, transformations, mapplets, and mappings to archive or share metadata. 

Saturday, 25 October 2014

Properties of Administration Console

Using the Repository Server Administration Console -


                 Use the Repository Server Administration Console to administer your Repository Servers and repositories. A Repository Server can manage multiple repositories. You use the Repository Server Administration Console to create and administer the repository through the Repository Server.

You can use the Administration Console to perform the following tasks:
  • Add, edit, and remove repository configurations.
  • Export and import repository configurations.
  • Create a repository.
  • Promote a local repository to a global repository.
  • Copy a repository.
  • Delete a repository from the database.
  • Back up and restore a repository.
  • Start, stop, enable, and disable repositories.
  • Send repository notification messages.
  • Register and unregister a repository.
  • Propagate domain connection information for a repository.
  • View repository connections and locks.
  • Close repository connections.
  • Register and remove repository plug-ins.
  • Upgrade a repository. 

Friday, 24 October 2014

Power Center Client

                       The PowerCenter Client consists of the following applications that you use to manage the repository, design mappings and mapplets, and create sessions to load the data:

Repository Server Administration Console. Use the Repository Server Administration console to administer the Repository Servers and repositories.

Repository Manager. Use the Repository Manager to administer the metadata repository. You can create repository users and groups, assign privileges and permissions, and manage folders and locks.

Designer. Use the Designer to create mappings that contain transformation instructions for the PowerCenter Server. Before you can create mappings, you must add source and target definitions to the repository. 

The Designer has five tools that you use to analyze sources, design target schemas, and build source-to-target mappings:

  • Source Analyzer. Import or create source definitions.

  • Warehouse Designer. Import or create target definitions.

  • Transformation Developer. Develop reusable transformations to use in mappings.

  • Mapplet Designer. Create sets of transformations to use in mappings.

  • Mapping Designer. Create mappings that the Power Center Server uses to extract, transform, and load data.
Workflow Manager. Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. 

                The Power Center Server runs workflow tasks according to the links connecting the tasks. You can run a task by placing it in a workflow.

Workflow Monitor. Use the Workflow Monitor to monitor scheduled and running workflows for each Power Center Server. You can choose a Gantt Chart or Task view. You can also access details about those workflow runs.

              Install the client tools on a Microsoft Windows machine. For more information about installation requirements, see Minimum System Requirements.

Power Center repository & Repository Server

                The Power Center repository resides on a relational database. The repository database tables contain the instructions required to extract, transform, and load data. Power Center Client applications access the repository database tables through the Repository Server.


            You add metadata to the repository tables when you perform tasks in the PowerCenter Client application, such as creating users, analyzing sources, developing mappings or mapplets, or creating workflows. The PowerCenter Server reads metadata created in the Client application when you run a workflow. The PowerCenter Server also creates metadata, such as start and finish times of a session or session status.

You can develop global and local repositories to share metadata:

Global repository. The global repository is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These objects may include operational or Application source definitions, reusable transformations, mapplets, and mappings.

Local repositories. A local repository is within a domain that is not the global repository. Use local repositories for development. From a local repository, you can create shortcuts to objects in shared folders in the global repository. These objects typically include source definitions, common dimensions and lookups, and enterprise standard transformations. You can also create copies of objects in non-shared folders.

Version control. A versioned repository can store multiple copies, or versions, of an object. Each version is a separate object with unique properties. PowerCenter version control features allow you to efficiently develop, test, and deploy metadata into production.

You can connect to a repository, back up, delete, or restore repositories using pmrep, a command line program. For more information on pmrep, see “Using pmrep”.
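pmrep can also be driven from a script. The sketch below is an assumption-laden example: the connect and backup flags shown (-r, -d, -n, -x, -o) are written from memory and can differ between PowerCenter versions, so verify them against the pmrep command reference before use.

```python
import subprocess

def pmrep(*args):
    """Run one pmrep command and raise if it returns a non-zero exit code."""
    subprocess.run(["pmrep", *args], check=True)

# Connect to the repository, then back it up to a file (flag names are assumptions).
pmrep("connect", "-r", "MyRepository", "-d", "MyDomain", "-n", "Administrator", "-x", "secret")
pmrep("backup", "-o", "MyRepository_backup.rep")
```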

Repository Server -

                   The Repository Server manages repository connection requests from client applications. For each repository database registered with the Repository Server, it configures and manages a Repository Agent process. The Repository Server also monitors the status of running Repository Agents, and sends repository object notification messages to client applications.

                 The Repository Agent is a separate, multi-threaded process that retrieves, inserts, and updates metadata in the repository database tables. The Repository Agent ensures the consistency of metadata in the repository by employing object locking.

Targets


PowerCenter can load data into the following targets:
  • Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL Server, and Teradata
  • File. Fixed and delimited flat file and XML.
  • Application. You can purchase additional PowerConnect products to load data into SAP BW. You can also load data into IBM MQSeries message queues and TIBCO.
  • Other. Microsoft Access.
You can load data into targets using ODBC or native drivers, FTP, or external loaders.
For more information about targets, see “Working with Targets” in the Designer Guide. 

Sources


Power Center accesses the following sources:
  • Relational. Oracle, Sybase, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
  • File. Fixed and delimited flat file, COBOL file, and XML.
  • Application. You can purchase additional PowerConnect products to access business sources, such as PeopleSoft, SAP R/3, Siebel, IBM MQSeries, and TIBCO.
  • Mainframe. You can purchase PowerConnect for Mainframe for faster access to IBM DB2 on MVS.
  • Other. Microsoft Excel and Access.
Note: The Designer imports relational sources, such as Microsoft Excel, Microsoft Access, and Teradata using ODBC and native drivers.

For more information about sources, see “Working with Sources” in the Designer Guide. 

Monday, 20 October 2014

Creation of Mapping

A mapping is created using the Mapping Designer tool in the Designer client component.

PROCEDURE:
  • From tools menu
  • Click on mapping designer                                          
  • Go to mapping menu
  • Select create
  • Enter mapping name (m-simplepass)
  • Click ok
  • From repository navigator window expand the source subfolder, drag the source definition (EMP), and drop on mapping designer work space.
  • From target subfolder drag the target definition (t-employee), drop on mapping designer work space.
  • Without a Source Qualifier, the Integration Service cannot extract data from the source definition; the Source Qualifier generates the SQL queries used to extract the data.
  • From source qualifier connect the columns to the target definition.
  • From repository menu click on ‘save’.
  • A source qualifier is automatically created when we drag and drop the source definition in the mapping designer’s work space.

Creation of Target Definition

The target definition can be created using the Warehouse Designer tool in the Designer client component.


  • From the database, the Warehouse Designer can import the structure of the data.
  • Open the Designer client.
        Note:
        At a time only one folder can be active; the bold representation indicates that the folder is in the active state.
  • The Sources subfolder contains source definitions and the Targets subfolder contains target definitions.
  • These subfolders cannot be created manually; they are created automatically.
  • The right-hand window is called the workspace.
  • The left window is the Repository Navigator.

 PROCEDURE:
  • Open the Designer client
  • From the Repository Navigator window, connect to the repository with the name <Rep_name>
  • Select the folder (dwh)
  • From the Tools menu select Warehouse Designer
  • From the Targets menu
  • Click on Import from Database
  • Connect to the database with the following data
        1. ODBC data source : dwh-source-oracle
        2. Username         : Scott
        3. Owner name       : Scott
        4. Password         : tiger
  • Click on Connect
  • Select the tables and click OK
  • From the Repository menu, click Save.


Creation of Source Definition


  • The source definition can be created using the Source Analyzer tool in the Designer client component.

  • From the database, the Source Analyzer can import the structure of the data.
             Open the Designer client.
             Note:
             At a time only one folder can be active; the bold representation indicates that the folder is in the active state.
  • The Sources subfolder contains source definitions and the Targets subfolder contains target definitions.
  • These subfolders cannot be created manually; they are created automatically.
  • The right-hand window is called the workspace.
  • The left window is the Repository Navigator.
PROCEDURE:
  • Open the Designer client
  • From the Repository Navigator window, connect to the repository with the name <Rep_name>
  • Select the folder (DWH)
  • From the Tools menu select Source Analyzer
  • From the Sources menu
  • Click on Import from Database
  • Connect to the database with the following data (a connection sketch using these details follows this procedure)

        1. ODBC data source : dwh-source-oracle
        2. Username         : Scott
        3. Owner name       : Scott
        4. Password         : tiger
  • Click on Connect
  • Select the tables and click OK
  • From the Repository menu, click Save.
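For reference, the same ODBC data source can be exercised outside the Designer; the sketch below uses pyodbc (not part of Informatica) with the DSN and login from the steps above to list the tables you would pick in the import dialog.

```python
import pyodbc

conn = pyodbc.connect("DSN=dwh-source-oracle;UID=Scott;PWD=tiger")
cursor = conn.cursor()
for table in cursor.tables(tableType="TABLE"):   # the tables offered by the import dialog
    print(table.table_name)
conn.close()
```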



Creation Of Folder in Informatica

Please follow the step-by-step procedure below:
  • Start > Programs
  • Select Informatica PowerCenter X.X
  • Select Client
  • Click on PowerCenter Repository Manager.

POWER CENTER CLIENT

The following are the PowerCenter client components.
         1) PowerCenter Designer.
         2) Workflow Manager.
         3) Workflow Monitor.
         4) Repository Manager.

A client is a GUI where we work with metadata.

REPOSITORY MANAGER


                  The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets. The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version.

Informatica Power Center

                  Informatica PowerCenter provides an environment that allows you to load data into a centralized location, such as a datamart, data warehouse, or operational data store (ODS). 

                  You can extract data from multiple sources, transform the data according to business logic you build in the client application, and load the transformed data into file and relational targets. 

PowerCenter provides the following integrated components:

Load Manager Process and Data Transformation Manager Process


The Load Manager is the primary Informatica Server process. It performs the following tasks:

  • Manages session and batch scheduling. 
  • Locks the session and reads session properties. 

INFORMATICA

Informatica Power Center is a powerful ETL tool from Informatica Corporation.

Informatica Corporation products are:
  • Informatica Power Center
  • Informatica on demand
  • Informatica B2B Data Exchange
  • Informatica Data Quality
  • Informatica Data Explorer
            Informatica Power Center is a single, unified enterprise data integration platform.

Saturday, 18 October 2014

Steps to install Informatica Power Center Server 8.6

Pre-requisites:



1) Windows XP SP2 (you can also install it on any higher-end OS).



2) At least one database (here we assume you have Oracle XE pre-installed).

3) 2 GB RAM recommended.


Informatica 8.6 Server Installation:

1) Select the Server folder from the Informatica installation CD.

2) There you will find the setup file; click on the setup file.


What is the difference between a primary key and a surrogate key?


primary key -

A primary key is a special constraint on a column or set of columns. A primary key constraint ensures that the column(s) so designated have no NULL values, and that every value is unique. Physically, a primary key is implemented by the database system using a unique index, and all the columns in the primary key must have been declared NOT NULL. A table may have only one primary key, but it may be composite (consist of more than one column).
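A surrogate key, by contrast, is a system-generated identifier (typically an integer from a sequence or identity column) that carries no business meaning; the natural/business key is kept as an ordinary, unique attribute. The hypothetical DDL below (SQLite syntax, invented table names) shows the two designs side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Natural primary key: the business identifier itself enforces uniqueness.
CREATE TABLE customer_natural (
    customer_code TEXT NOT NULL PRIMARY KEY,           -- business key
    customer_name TEXT
);

-- Surrogate primary key: a system-generated integer identifies the row,
-- and the business key becomes an ordinary (but still unique) column.
CREATE TABLE customer_dim (
    customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,   -- surrogate key
    customer_code TEXT NOT NULL UNIQUE,                -- original business key
    customer_name TEXT
);
""")
conn.close()
```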

ODS (Operational Data Sources)


                  My understanding of an ODS is that it is a replica of the OLTP system, and the need for it is to reduce the burden on the production system (OLTP) while fetching data for loading the targets. Hence it is a mandatory requirement for every warehouse.

What is the difference between snow flake and star schema



Star Schema vs. Snowflake Schema:
  • The star schema is the simplest data warehouse schema; the snowflake schema is a more complex model than a star schema.
  • In a star schema each dimension is represented in a single table, with no hierarchies between dimension tables; in a snowflake schema at least one hierarchy exists between dimension tables.
  • Both contain a fact table surrounded by dimension tables: if the dimensions are de-normalized it is a star schema design, and if a dimension is normalized it is a snowflake design.
  • In a star schema a single join establishes the relationship between the fact table and any one dimension table; in a snowflake schema the relationships between dimension tables mean many joins are needed to fetch the data (see the example queries below).
  • A star schema optimizes performance by keeping queries simple and providing fast response times; all the information about each level is stored in one row. Snowflake schemas normalize dimensions to eliminate redundancy, which results in more complex queries and reduced query performance.
  • It is called a star schema because the diagram resembles a star, and a snowflake schema because the diagram resembles a snowflake.
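The join difference is easiest to see in SQL. The queries below are hypothetical (table and column names invented): in the star schema the fact table joins directly to each dimension, while in the snowflake schema the normalized dimension forces an extra join to reach category attributes.

```python
# Star schema: one join per dimension, straight from the fact table.
star_query = """
SELECT d.calendar_year, p.product_name, SUM(f.sales_amount)
FROM   sales_fact f
JOIN   date_dim    d ON f.date_key    = d.date_key
JOIN   product_dim p ON f.product_key = p.product_key
GROUP BY d.calendar_year, p.product_name
"""

# Snowflake schema: product_dim is normalized, so reaching category attributes
# requires an additional join through the dimension hierarchy.
snowflake_query = """
SELECT c.category_name, SUM(f.sales_amount)
FROM   sales_fact f
JOIN   product_dim          p ON f.product_key  = p.product_key
JOIN   product_category_dim c ON p.category_key = c.category_key
GROUP BY c.category_name
"""
```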

Difference between data mart and data warehouse



Data Mart vs. Data Warehouse:
  • A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; it is a data warehouse with a focused objective. A data warehouse is a "subject-oriented, integrated, time-variant, nonvolatile collection of data in support of decision making".
  • A data mart is used at the business division/department level, while a data warehouse is used at the enterprise level.
  • A data mart is a subset of data from a data warehouse, built for a specific user group. A data warehouse is an integrated consolidation of data from a variety of sources, specially designed to support strategic and tactical decision making.
  • By providing decision makers with only a subset of the data from the warehouse, privacy, performance, and clarity objectives can be attained; the main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time.

Friday, 17 October 2014

DWH Concepts

DataWareHousing - ETL Project Life Cycle ( Simple to understand )


Warehousing -

Data warehousing projects are categorized into 4 types.

1) Development Projects.
2) Enhancement Projects
3) Migration Projects
4) Production support Projects.
The following are the different phases involved in an ETL project development life cycle.
1) Business Requirement Collection ( BRD )
2) System Requirement Collection ( SRD )
3) Design Phase
a) High Level Design Document ( HLD )
b) Low level Design Document ( LLD )
c) Mapping Design
4) Code Review
5) Peer Review
6) Testing
a) Unit Testing
b) System Integration Testing.
c) USer Acceptance Testing ( UAT )
7) Pre - Production
8) Production ( Go-Live )
Business Requirement Collection :-
  • Business requirement gathering is started by the business analyst, the onsite technical lead, and the client business users.
  • In this phase, a business analyst prepares the Business Requirement Document ( BRD ) (or) Business Requirement Specifications ( BRS ).
  • BR collection takes place at the client location.
  • The outputs from BR analysis are:

            BRS :- The business analyst gathers the business requirements and documents them in the BRS.
            SRS :- Senior technical people (or) the ETL architect prepare the SRS, which contains software and hardware requirements.
The SRS includes -
  • O/S to be used ( windows or unix )
  • RDBMS required to build database ( oracle, Teradata etc )
  • ETL tools required ( Informatica,Datastage )
  • OLAP tools required ( Cognos ,BO )

The SRS is also called the Technical Requirement Specifications ( TRS )
Designing and Planning the solutions :-
The o/p from design and planning phase is
  • HLD ( High Level Design ) Document
  • LLD ( Low Level Design ) Document
HLD ( High Level Design ) Document : -
  • An ETL Architect and DWH Architect participate in designing a solution to build a DWH.
  • An HLD document is prepared based on Business Requirement.
LLD ( Low Level Design ) Document : -
  • Based on the HLD, a senior ETL developer prepares the Low Level Design Document.
  • The LLD contains more technical details of an ETL System.
  • An LLD contains data flow diagram ( DFD ), details of source and targets of each mapping.
  • An LLD also contains information about full and incremental load.
  • After the LLD, the development phase starts.
Development Phase ( Coding ) :- 
  • Based on the LLD, the ETL team creates the mappings ( ETL code ).
  • After designing the mappings, the code ( Mappings ) will be reviewed by developers.
Code Review
  • Code review is done by a developer.
  • In code review, the developer reviews the code and the logic but not the data.
  • The following activities take place in code review:
  • Check the naming standards of transformations, mappings, etc.
  • Check the source-to-target mapping ( whether the correct logic is placed in the mapping or not ).
Peer Review :-
  • The code is reviewed by your team member ( a third-party developer ).
Testing

The following types of testing are carried out in the testing environment.
  • Unit Testing
  • Development Integration Testing
  • System Integration Testing
  • User Acceptance Testing
Unit Testing :-
  • A unit test for the DWH is white-box testing; it checks the ETL procedures and mappings.
  • The following test cases can be executed by an ETL developer.

  1. Verify data loss
  2. No.of records in the source and target
  3. Dataload/Insert
  4. Dataload/Update
  5. Incremental load
  6. Data accuracy
  7. verify Naming standards.
  8. Verify column Mapping
  • The unit test is carried out by the ETL developer in the development phase.
  • The ETL developer also has to do the data validations in this phase.
Development Integration Testing -
  • Run all the mappings in the sequence order.
  • First Run the source to stage mappings.
  • Then run the mappings related to dimensions and facts.
System Integration Testing -
  • After the development phase, we have to move our code to the QA environment.
  • In this environment, we give read-only permissions to the testing people.
  • They will test all the workflows.
  • And they will test our code according to their standards.
User Acceptance Testing ( UAT ) -
  • This test is carried out in the presence of client side technical users to verify the data migration from source to destination.
Production Environment -

  • Migrate the code from the test environment ( QA environment ) into the Go-Live environment.