The investment firm wanted to build a custom crawl-scrape-parse-tag solution to help its clients analyze healthcare-related data from non-traditional sources such as discussion forums, social media, and sector-level databases. The parsed data needed to be uploaded to an AI platform to generate actionable insights.
The main challenge lay in analyzing data from multiple, disjointed, diverse sources such as discussion forums, Twitter handles, and sector-level databases. Whenever analysts wanted to add new keywords or Twitter handles, they needed support from the implementation team, and implementing changes to the code was a cumbersome process. The firm needed an effective, easy-to-use, one-stop solution using machine learning techniques to transform the information into actionable insights.
Digital Alpha deployed a solution that includes the following components:
- Leverage Amazon S3 to store and manage the raw and enriched social media data collected from various sources such as Twitter.
- Use Amazon Kinesis Data Firehose to stream the social media data into Amazon S3 in real-time and configure AWS Lambda functions to process the data using Amazon Translate and Amazon Comprehend.
- Use Amazon Translate to translate non-English tweets into English and Amazon Comprehend to perform natural language processing (NLP) on the tweets, including entity extraction and sentiment analysis.
- Use AWS Glue and Amazon Athena to transform and analyze the data stored in Amazon S3, and create tables in the AWS Glue Data Catalog to organize the data for querying.
- Use Amazon QuickSight to visualize the data and insights from the social media analysis, enabling customers to understand customer sentiment and improve brand awareness.
- Deploy the solution using a serverless architecture and Amazon EC2 instances running in an Amazon VPC to ensure reliability and scalability.
- Leverage the solution’s data lake and Amazon QuickSight dashboards to gain insights into customer conversations and deepen brand awareness.
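The enrichment step above (Kinesis Data Firehose invoking Lambda, which calls Amazon Translate and Amazon Comprehend before the records land in S3) can be sketched as a Firehose transformation handler. This is a minimal illustration, not the firm's actual code: the tweet schema (`text`, `lang` fields) is assumed, and the boto3 clients are injectable so the logic can be exercised with stubs.

```python
import base64
import json


def enrich_record(tweet, translate, comprehend):
    """Translate a non-English tweet into English and attach NLP results.

    `translate` and `comprehend` are boto3 clients for Amazon Translate and
    Amazon Comprehend (or stand-in stubs when testing offline).
    The tweet schema here (text/lang keys) is an assumption for illustration.
    """
    text = tweet["text"]
    lang = tweet.get("lang", "en")
    if lang != "en":
        resp = translate.translate_text(
            Text=text, SourceLanguageCode=lang, TargetLanguageCode="en"
        )
        text = resp["TranslatedText"]
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    entities = comprehend.detect_entities(Text=text, LanguageCode="en")
    return {
        **tweet,
        "text_en": text,
        "sentiment": sentiment["Sentiment"],
        "entities": [e["Text"] for e in entities["Entities"]],
    }


def lambda_handler(event, context, translate=None, comprehend=None):
    """Kinesis Data Firehose transformation Lambda: decode each record,
    enrich it, and return it base64-encoded for delivery to Amazon S3."""
    if translate is None or comprehend is None:
        import boto3  # available in the Lambda runtime
        translate = translate or boto3.client("translate")
        comprehend = comprehend or boto3.client("comprehend")
    output = []
    for record in event["records"]:
        tweet = json.loads(base64.b64decode(record["data"]))
        enriched = enrich_record(tweet, translate, comprehend)
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(enriched) + "\n").encode()
            ).decode(),
        })
    return {"records": output}
```

Because the enriched records are written back as newline-delimited JSON, AWS Glue can crawl the S3 prefix and Athena can query the resulting tables directly.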
Digital Alpha’s data platform enabled the firm to achieve the following:
- Data Collection: Data points required for analytics were scraped, parsed, tagged, and uploaded to the AI platform and/or internal data store for analysis
- Self-Service: New keywords and Twitter handles were added with little or no support from the implementation team
- Scalability: The solution was used as a blueprint for onboarding new discussion forums and sector-level databases
- Deployment Automation: Code changes were implemented and pushed to production easily through automated CI/CD pipelines
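The self-service outcome above rests on keeping tracked keywords and Twitter handles in an analyst-editable configuration rather than in code. The sketch below assumes a hypothetical JSON config (field names `keywords` and `twitter_handles` are illustrative); in the deployed solution such a file would live in Amazon S3 so analysts can update it without involving the implementation team.

```python
import json

# Hypothetical analyst-editable config; in production this JSON would be
# fetched from Amazon S3 rather than embedded in the code.
SAMPLE_CONFIG = """
{
  "keywords": ["oncology", "FDA approval"],
  "twitter_handles": ["@US_FDA"]
}
"""


def load_tracking_config(raw):
    """Parse the config into lowercase keyword and handle sets."""
    cfg = json.loads(raw)
    keywords = {k.lower() for k in cfg.get("keywords", [])}
    handles = {h.lower() for h in cfg.get("twitter_handles", [])}
    return keywords, handles


def is_relevant(post, keywords, handles):
    """True if a post mentions a tracked keyword or comes from a tracked handle."""
    text = post.get("text", "").lower()
    author = post.get("author", "").lower()
    return author in handles or any(k in text for k in keywords)
```

Adding a new keyword or handle then means editing the config object, not redeploying the crawler, which is what removed the dependency on the implementation team.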