Digital Alpha

Customer and Competitor Intelligence from SEC Filings with AWS Data Lake

A US-based telecommunications, networking, and software services company wanted to build a corporate research platform that provides faster and more comprehensive customer and competitor insights to drive sales and strategy, and on which it can build business intelligence (BI) dashboards for decision-making. Data sources such as SEC filings and investor presentations are among the input feeds that power the research platform. Finally, the company required the output data to be published to Snowflake.

The Challenge

Handle Structured and Unstructured Data: SEC filings combine structured formats such as XBRL with semi-structured and unstructured content such as tables and free-form text.

Design for Flexibility, Simplicity, and Scale: Set up flexible infrastructure and data pipeline automation that can handle layers of complexity in the data processing.

Support Complexity and Custom Analytics: Allow analytics and aggregations at different stages of data processing and publish results to distribution channels such as Snowflake, Kafka, and S3 or last-mile tools such as Excel and Dashboards.

Why Data and Analytics on AWS?

AWS provides managed services that cover virtually everything needed to build and manage data analytics solutions quickly and easily. These secure, easy-to-use services enable customers to deploy virtually any big data application on AWS.

The Solution

Digital Alpha built a solution that automates the provisioning of data lake infrastructure resources and deploys ETL jobs interactively. The data stack is automated with AWS CDK and Step Functions workflows. Results are published to the client's Snowflake Data Share, which gives the client many options for data crunching and visualization.

Extracting facts from the XBRL documents:

XBRL, which stands for eXtensible Business Reporting Language, is an XML-based language for tagging financial data so that businesses can process and share their data efficiently and accurately. XBRL instances contain the information that is being exchanged. That information is expressed in the form of facts. Each fact is associated with a concept from an XBRL taxonomy, which names the concept and defines it.

Understanding and processing XBRL filings involves quite a few moving parts, and extracting insights from them for analysis, audit, or investment research purposes takes some effort.
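To make the idea of facts concrete, here is a minimal sketch of pulling facts out of an XBRL instance with Python's standard library. It is not the client solution: the sample instance, the entity identifier, the figures, and the function name extract_facts are all hypothetical, and real filings also need dimensions, footnotes, and inline XBRL handled.

```python
import json
import xml.etree.ElementTree as ET

XBRLI_NS = "http://www.xbrl.org/2003/instance"

# Hypothetical, heavily trimmed XBRL instance used only for illustration.
SAMPLE_INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <xbrli:context id="FY2023">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000000</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2023-01-01</xbrli:startDate>
      <xbrli:endDate>2023-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD">
    <xbrli:measure>iso4217:USD</xbrli:measure>
  </xbrli:unit>
  <us-gaap:Revenues contextRef="FY2023" unitRef="USD" decimals="-6">1250000000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2023" unitRef="USD" decimals="-6">98000000</us-gaap:NetIncomeLoss>
</xbrli:xbrl>
"""


def extract_facts(xml_text):
    """Return one dict per fact: concept, namespace, context, period, unit, value."""
    root = ET.fromstring(xml_text)

    # Map each context id to its period so every fact can carry its dates.
    periods = {}
    for ctx in root.findall(f"{{{XBRLI_NS}}}context"):
        period = ctx.find(f"{{{XBRLI_NS}}}period")
        periods[ctx.get("id")] = {
            child.tag.split("}")[1]: child.text for child in period
        }

    facts = []
    for elem in root:
        context_ref = elem.get("contextRef")
        if context_ref is None:
            continue  # contexts, units, and other non-fact elements
        # ElementTree tags look like "{namespace}LocalName".
        namespace, concept = elem.tag[1:].split("}")
        facts.append({
            "concept": concept,
            "namespace": namespace,
            "context": context_ref,
            "period": periods.get(context_ref),
            "unit": elem.get("unitRef"),
            "value": (elem.text or "").strip(),
        })
    return facts


facts = extract_facts(SAMPLE_INSTANCE)
print(json.dumps(facts, indent=2))
```

Each fact comes out as a flat JSON record keyed by its taxonomy concept, which is the kind of machine-processable shape that downstream analytics can consume.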

Data Lake Infrastructure

The data lake infrastructure provisioning includes Amazon S3 buckets, S3 bucket policies, AWS Key Management Service (KMS) encryption keys, Amazon Virtual Private Cloud (Amazon VPC), subnets, route tables, security groups, VPC endpoints, and secrets in AWS Secrets Manager.

Data Lake ETL Jobs

In the ETL jobs, the SEC filings are processed to produce the output. The following figure displays the ETL process, which involves four ETL jobs within a Step Functions state machine.
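As a rough illustration of four ETL jobs chained in a state machine, the sketch below builds an Amazon States Language definition as a Python dictionary. The job names are hypothetical, and running the jobs on AWS Glue is an assumption; the case study does not say which compute service each job uses.

```python
import json

# Hypothetical job names; the source only says there are four ETL jobs.
ETL_JOBS = [
    "extract_filings",
    "parse_xbrl_facts",
    "parse_text_and_tables",
    "publish_output",
]


def build_state_machine(job_names):
    """Chain the given jobs sequentially in an Amazon States Language definition."""
    states = {}
    for index, job in enumerate(job_names):
        state = {
            "Type": "Task",
            # Assumption: each job is an AWS Glue job. The ".sync" suffix makes
            # Step Functions wait for the job run to finish before moving on.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": job},
        }
        if index + 1 < len(job_names):
            state["Next"] = job_names[index + 1]
        else:
            state["End"] = True
        states[job] = state
    return {
        "Comment": "Sequential ETL pipeline (illustrative sketch)",
        "StartAt": job_names[0],
        "States": states,
    }


definition = build_state_machine(ETL_JOBS)
print(json.dumps(definition, indent=2))
```

The resulting JSON could be passed as the definition of a state machine, for example from a CDK stack. A production workflow would typically add retries, error handling, and possibly parallel branches.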

Centralized Deployment

The deployment model is based on the following design principles:

A dedicated AWS account to run CDK pipelines.
One or more AWS accounts into which the data lake is deployed.
The data lake infrastructure has a dedicated source code repository. Typically, data lake infrastructure is a one-time deployment, but it can evolve over time. Therefore, a dedicated code repository provides a landing zone for your data lake.
Each ETL job has a dedicated source code repository. Each ETL job may have unique AWS service, orchestration, and configuration requirements. Therefore, a dedicated source code repository helps you build, deploy, and maintain ETL jobs more flexibly.

Distribution via Integration Touch Points – Snowflake 

With Secure Data Sharing, no actual data is copied or transferred between accounts. All sharing is accomplished through Snowflake’s unique services layer and metadata store.

Transformed data is stored in Amazon S3 to be used for ingestion to Snowflake.
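The sketch below shows, under stated assumptions, what the Snowflake side of this hand-off might look like: load the transformed JSON from an S3-backed stage, then expose the table through a share. All object names are hypothetical. The statements are only generated as strings here, not executed, and the COPY statement assumes an external stage already points at the S3 location and that the target table can accept JSON (for example, a single VARIANT column).

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _check(name):
    """Reject anything that is not a plain unquoted identifier."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_snowflake_statements(database, schema, table, stage, share, consumer_account):
    """Return SQL statements that load staged JSON and share the resulting table."""
    for name in (database, schema, table, stage, share, consumer_account):
        _check(name)
    qualified_schema = f"{database}.{schema}"
    qualified_table = f"{qualified_schema}.{table}"
    return [
        # Load the transformed JSON files from the S3-backed stage.
        f"COPY INTO {qualified_table} FROM @{stage} FILE_FORMAT = (TYPE = 'JSON')",
        # Share the table; no data is copied to the consumer account.
        f"CREATE SHARE {share}",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO SHARE {share}",
        f"GRANT SELECT ON TABLE {qualified_table} TO SHARE {share}",
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}",
    ]


statements = build_snowflake_statements(
    database="research_db",        # hypothetical names throughout
    schema="sec_filings",
    table="xbrl_facts",
    stage="sec_filings_stage",
    share="sec_insights_share",
    consumer_account="client_account",
)
for statement in statements:
    print(statement + ";")
```

Because the consumer only receives grants on the provider's objects, refreshing the shared table with a new load makes the new data visible without a second copy.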

Results and Benefits

Digital Alpha helped the client consume raw XBRL and non-XBRL data from customers’ and competitors’ SEC reports, helping it make critical business decisions and bolster its competitive insights.

The proprietary automated solution helped the client convert structured and unstructured data to a readable and machine-processable JSON format and eventually publish the results to a Snowflake data share. The client used the fully automated solution to perform various analytics, run quant models, and slice and dice the available data.

Finally, they established a data pipeline that removed human intervention, eliminated manual errors, and achieved deployment at speed and scale.

The following list encompasses some advantages:

The flexibility of Digital Alpha’s solution has also provided much greater speed and agility, opening up the possibility of real-time competitive intelligence monitoring for the client.

To sum up, Digital Alpha expands the utility of data for the investment management industry through AWS and Snowflake. The client team was able to gain valuable insights into its customers and competitors.
