Data Exports and Legacy CUR

Introduction

This README describes solutions for replicating and consolidating AWS Data Exports and Legacy CUR across multiple accounts. These solutions are part of Cloud Intelligence Dashboards and are recommended by the official AWS Data Exports documentation.

Data Exports

For deployment instructions, please refer to the documentation at: https://catalog.workshops.aws/awscid/data-exports.

Check code here: data-exports-aggregation.yaml

Basic Architecture of Data Exports

[Diagram: Basic Architecture of Data Exports]

  1. AWS Data Exports delivers the Cost & Usage Report (CUR2) and other reports daily to an Amazon S3 Bucket in the Management Account.
  2. Amazon S3 replication rule copies Export data to a dedicated Data Collection Account S3 bucket automatically.
  3. Amazon Athena allows querying data directly from the S3 bucket using an AWS Glue table schema definition.
  4. Amazon QuickSight datasets can read from Amazon Athena. Check Cloud Intelligence Dashboards for more details.
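
Step 3 above can be illustrated with a small query builder. This is a minimal sketch, not part of the CID templates: the database name `cid_data_export`, the table name `cur2`, and the `billing_period` partition column are assumptions about how the Glue table is set up, so adjust them to match your deployment. The function only builds the SQL text; submitting it to Athena is left out.

```python
import re


def monthly_cost_query(database: str, table: str, billing_period: str) -> str:
    """Build an Athena SQL query that sums unblended cost per account for one month."""
    # Identifiers and the period literal are interpolated into SQL, so validate them first.
    for name in (database, table):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if not re.fullmatch(r"\d{4}-\d{2}", billing_period):
        raise ValueError("billing_period must look like YYYY-MM")
    return (
        "SELECT line_item_usage_account_id AS account_id, "
        "SUM(line_item_unblended_cost) AS unblended_cost "
        f'FROM "{database}"."{table}" '
        # Assumption: the table is partitioned by a billing_period column.
        f"WHERE billing_period = '{billing_period}' "
        "GROUP BY line_item_usage_account_id "
        "ORDER BY unblended_cost DESC"
    )


if __name__ == "__main__":
    # Hypothetical database and table names, for illustration only.
    print(monthly_cost_query("cid_data_export", "cur2", "2024-01"))
```

Filtering on the partition column keeps Athena from scanning every month of export data in the bucket.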

Advanced Architecture of Data Exports

For customers with additional requirements, an enhanced architecture is available:

[Diagram: Advanced Architecture of Data Exports]

  1. The AWS Data Exports service delivers the Cost & Usage Report (CUR2) daily to an Amazon S3 Bucket in your AWS Account (either a Management/Payer Account or a regular Linked Account). In the us-east-1 Region, the CloudFormation template creates native resources; in other Regions, it uses an AWS Lambda-backed Custom Resource to provision Data Exports in us-east-1.

  2. Amazon S3 replication rules copy Export data to a dedicated Data Collection Account automatically. This replication filters out all metadata and makes the file structure on the S3 bucket compatible with Amazon Athena and AWS Glue requirements.

  3. A Bucket Policy controls which accounts can replicate data to the destination bucket.

  4. An AWS Glue Crawler runs daily at midnight UTC to update the partitions of the table definition in the AWS Glue Data Catalog.

  5. Amazon QuickSight pulls data from Amazon Athena to its SPICE (Super-fast, Parallel, In-memory Calculation Engine).

  6. When collecting data exports for Linked accounts (not for Management Accounts), you may also want to collect data exports for the Data Collection account itself. In this case, specify the Data Collection account as the first in the list of Source Accounts. Replication is still required to remove metadata.

  7. Write operations can interfere with Athena reads. If replicated objects arrive while a dataset refresh is running, the refresh might fail, especially with high volumes of data. In such cases, consider scheduling a temporary disable and re-enable of the Amazon S3 bucket policy that allows replication. Because exports typically arrive three times daily, this temporary deactivation has minimal side effects.

  8. Some customers might need to store data exports in secondary destinations for archiving, for reporting at a higher organizational level, or for a staging environment. You can specify a secondary bucket to receive the data in these cases.
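
The bucket policy from item 3 can be sketched as a plain policy document. This is an illustrative assumption of what such a policy might look like, based on the permissions S3 cross-account replication generally requires; it is not copied from the CID template. The bucket name and account IDs are hypothetical. Following item 6, the Data Collection account is simply placed first in the list of source accounts.

```python
from __future__ import annotations

import json
import re


def replication_bucket_policy(destination_bucket: str, source_account_ids: list[str]) -> dict:
    """Build a bucket policy that lets the listed source accounts replicate into the bucket."""
    for account_id in source_account_ids:
        if not re.fullmatch(r"\d{12}", account_id):
            raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    # Order is preserved, so the Data Collection account can be listed first.
    principals = [f"arn:aws:iam::{account_id}:root" for account_id in source_account_ids]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level permissions used by S3 replication.
                "Sid": "AllowReplicationFromSourceAccounts",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    "s3:ObjectOwnerOverrideToBucketOwner",
                ],
                "Resource": f"arn:aws:s3:::{destination_bucket}/*",
            },
            {
                # Bucket-level permissions used to verify versioning on the destination.
                "Sid": "AllowVersioningChecksFromSourceAccounts",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": ["s3:GetBucketVersioning", "s3:PutBucketVersioning"],
                "Resource": f"arn:aws:s3:::{destination_bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values: Data Collection account first, then a Management account.
    policy = replication_bucket_policy(
        "example-data-collection-bucket", ["111111111111", "222222222222"]
    )
    print(json.dumps(policy, indent=2))
```

Keeping the policy as generated data makes the temporary pause described in item 7 straightforward to script: a scheduled job could remove and later re-apply the same document.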

Using Secondary Replication Bucket

Customers sometimes need to replicate data exports to multiple destinations. One common scenario involves a Business Unit that requires exports for one or more AWS Organizations while Headquarters also needs access to the same export data for a consolidated view across all business units.

To accomplish this, both the Headquarters and Business Unit can implement separate data export destination stacks. Business Unit administrators, working from their management account, can specify a target bucket located within the Headquarters stack, enabling seamless data replication to both locations.

Another scenario is replicating data to a staging environment.
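
As a rough illustration of replicating to two destinations, the sketch below builds an S3 replication configuration with one rule per destination bucket, in the shape accepted by the S3 `PutBucketReplication` API. The role ARN, prefix, and bucket names are hypothetical, and the CID template may configure replication differently (for example, with additional cross-account ownership settings on each destination).

```python
from __future__ import annotations


def replication_configuration(
    role_arn: str, prefix: str, destination_buckets: list[str]
) -> dict:
    """Build a replication configuration with one rule per destination bucket."""
    if not destination_buckets:
        raise ValueError("at least one destination bucket is required")
    rules = []
    for priority, bucket in enumerate(destination_buckets, start=1):
        rules.append(
            {
                "ID": f"replicate-to-{bucket}",
                # Filter-based rules need a unique priority per rule.
                "Priority": priority,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{bucket}"},
            }
        )
    return {"Role": role_arn, "Rules": rules}


if __name__ == "__main__":
    # Hypothetical names: the Business Unit bucket first, then the Headquarters bucket.
    config = replication_configuration(
        "arn:aws:iam::111111111111:role/example-replication-role",
        "cur2/",
        ["example-bu-data-exports", "example-hq-data-exports"],
    )
    for rule in config["Rules"]:
        print(rule["Priority"], rule["Destination"]["Bucket"])
```

Each destination bucket still needs its own bucket policy that permits replication from the source account.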

[Diagram: Secondary Replication Bucket]

Legacy Cost and Usage Report

Legacy AWS Cost and Usage Reports (Legacy CUR) can still be used for Cloud Intelligence Dashboards and other use cases.

The CID project provides a CloudFormation template for Legacy CUR. Unlike the Data Exports CloudFormation template, it does not provide AWS Glue tables. You can use this template to replicate and aggregate CUR from multiple source accounts (Management or Linked).

[Diagram: Basic Architecture of CUR]

Check code here: cur-aggregation.yaml

FAQ

Why replicate data instead of providing cross-account access?

Cross-account access is possible but can be difficult to maintain, considering the many different roles that require this access, especially when dealing with multiple accounts.

We only have one AWS Organization. Do we still need this?

Yes. Throughout an organization's lifecycle, mergers and acquisitions may occur, so this approach prepares you for potential future scenarios.