A company uses Amazon Redshift as its data warehouse. A new table has columns that contain sensitive data. The data in the table will eventually be referenced by several existing queries that run many times a day.
A data analyst needs to load 100 billion rows of data into the new table. Before doing so, the data analyst must ensure that only members of the auditing group can read the columns containing sensitive data.
How can the data analyst meet these requirements with the lowest maintenance overhead?
A. Load all the data into the new table and grant the auditing group permission to read from the table. Load all the data except for the columns containing sensitive data into a second table. Grant the appropriate users read-only permissions to the second table.
B. Load all the data into the new table and grant the auditing group permission to read from the table. Use the GRANT SQL command to allow read-only access to a subset of columns to the appropriate users.
C. Load all the data into the new table and grant all users read-only permissions to non-sensitive columns. Attach an IAM policy to the auditing group with explicit ALLOW access to the sensitive data columns.
D. Load all the data into the new table and grant the auditing group permission to read from the table. Create a view of the new table that contains all the columns, except for those considered sensitive, and grant the appropriate users read-only permissions to the view.
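For context on option B: Amazon Redshift supports column-level access control directly in SQL, so the restriction is a single GRANT statement with no extra tables or views to maintain. A minimal sketch using the Redshift Data API follows; the cluster, schema, table, column, and group names are placeholders.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Column-level grant: only members of the auditing group may read the
# sensitive columns. All names below are hypothetical.
sql = """
GRANT SELECT (customer_id, ssn, account_number)
ON finance.customer_accounts
TO GROUP auditing;
"""

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # assumed cluster identifier
    Database="dev",
    DbUser="admin",
    Sql=sql,
)
```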
A central government organization is collecting events from various internal applications using Amazon Managed Streaming for Apache Kafka (Amazon MSK). The organization has configured a separate Kafka topic for each application to separate the data. For security reasons, the Kafka cluster has been configured to allow only TLS-encrypted data in transit and to encrypt data at rest.
A recent application update showed that one of the applications was configured incorrectly, causing it to write data to a Kafka topic that belongs to another application. This resulted in multiple errors in the analytics pipeline because data from different applications appeared on the same topic. After this incident, the organization wants to prevent each application from writing to any topic other than its own.
Which solution meets these requirements with the least amount of effort?
A. Create a different Amazon EC2 security group for each application. Configure each security group to have access to a specific topic in the Amazon MSK cluster. Attach the security group to each application based on the topic that the applications should read and write to.
B. Install Kafka Connect on each application instance and configure each Kafka Connect instance to write to a specific topic only.
C. Use Kafka ACLs and configure read and write permissions for each topic. Use the distinguished name of the clients' TLS certificates as the principal of the ACL.
D. Create a different Amazon EC2 security group for each application. Create an Amazon MSK cluster and Kafka topic for each application. Configure each security group to have access to the specific cluster.
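For reference on option C: Kafka ACLs keyed on the distinguished name of each client's TLS certificate restrict every application to its own topic without any per-application infrastructure. The sketch below uses the confluent-kafka Python AdminClient as one possible way to create the ACL (the kafka-acls CLI works equally well); the broker endpoint, certificate paths, topic name, and DN are placeholders.

```python
from confluent_kafka.admin import (
    AdminClient, AclBinding, AclOperation, AclPermissionType,
    ResourcePatternType, ResourceType,
)

# TLS client authentication against the MSK brokers (paths are placeholders).
admin = AdminClient({
    "bootstrap.servers": "b-1.example.kafka.eu-west-1.amazonaws.com:9094",
    "security.protocol": "SSL",
    "ssl.ca.location": "/etc/kafka/ca.pem",
    "ssl.certificate.location": "/etc/kafka/admin-cert.pem",
    "ssl.key.location": "/etc/kafka/admin-key.pem",
})

# Allow only the application whose certificate DN matches to write to its topic.
acl = AclBinding(
    ResourceType.TOPIC,
    "payments-events",                       # hypothetical topic name
    ResourcePatternType.LITERAL,
    "User:CN=payments-app,OU=apps,O=gov",    # DN from the client's TLS certificate
    "*",
    AclOperation.WRITE,
    AclPermissionType.ALLOW,
)

futures = admin.create_acls([acl])
for binding, future in futures.items():
    future.result()  # raises if the ACL could not be created
```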
A transport company wants to track vehicular movements by capturing geolocation records. The records are 10 bytes in size, and up to 10,000 records are captured each second. Data transmission delays of a few minutes are acceptable, considering unreliable network conditions. The transport company decided to use Amazon Kinesis Data Streams to ingest the data. The company is looking for a reliable mechanism to send data to Kinesis Data Streams while maximizing the throughput efficiency of the Kinesis shards.
Which solution will meet the company's requirements?
A. Kinesis Agent
B. Kinesis Producer Library (KPL)
C. Kinesis Data Firehose
D. Kinesis SDK
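On option B: the Kinesis Producer Library maximizes shard throughput by batching and aggregating many tiny records (10-byte geolocation records here) into far fewer, larger Kinesis records, with retries built in. The KPL itself is a Java library; the Python sketch below only illustrates the batching idea with a plain boto3 PutRecords call and a hypothetical stream name.

```python
import json

import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "vehicle-geolocation"   # hypothetical stream name

buffer = []

def enqueue(record: dict) -> None:
    """Buffer small records and flush them in batches, as the KPL does."""
    buffer.append({
        "Data": json.dumps(record).encode("utf-8"),
        "PartitionKey": record["vehicle_id"],
    })
    if len(buffer) >= 500:    # PutRecords accepts up to 500 records per call
        flush()

def flush() -> None:
    if buffer:
        kinesis.put_records(StreamName=STREAM_NAME, Records=buffer)
        buffer.clear()
```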
A company has an encrypted Amazon Redshift cluster. The company recently enabled Amazon Redshift audit logs and needs to ensure that the audit logs are also encrypted at rest. The logs are retained for 1 year. The auditor queries the logs once a month.
What is the MOST cost-effective way to meet these requirements?
A. Encrypt the Amazon S3 bucket where the logs are stored by using AWS Key Management Service (AWS KMS). Copy the data into the Amazon Redshift cluster from Amazon S3 on a daily basis. Query the data as required.
B. Disable encryption on the Amazon Redshift cluster, configure audit logging, and encrypt the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query the data as required.
C. Enable default encryption on the Amazon S3 bucket where the logs are stored by using AES-256 encryption. Copy the data into the Amazon Redshift cluster from Amazon S3 on a daily basis. Query the data as required.
D. Enable default encryption on the Amazon S3 bucket where the logs are stored by using AES-256 encryption. Use Amazon Redshift Spectrum to query the data as required.
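On option D: default SSE-S3 (AES-256) encryption on the log bucket encrypts the audit logs at rest at no extra cost, and Redshift Spectrum queries them in place for the monthly audit without loading them into the cluster. A boto3 sketch with placeholder bucket and cluster names follows.

```python
import boto3

s3 = boto3.client("s3")
LOG_BUCKET = "redshift-audit-logs-example"   # hypothetical bucket name

# SSE-S3 (AES-256) default encryption: every new audit log object is
# encrypted at rest without any per-object configuration.
s3.put_bucket_encryption(
    Bucket=LOG_BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Point Redshift audit logging at the encrypted bucket.
redshift = boto3.client("redshift")
redshift.enable_logging(
    ClusterIdentifier="analytics-cluster",   # assumed cluster identifier
    BucketName=LOG_BUCKET,
    S3KeyPrefix="audit/",
)
```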
A data architect is building an Amazon S3 data lake for a bank. The goal is to provide a single data repository for customer data needs, such as personalized recommendations. The bank uses Amazon Kinesis Data Firehose to ingest customers' personal information, bank accounts, and transactions in near-real time from a transactional relational database. The bank requires all personally identifiable information (PII) that is stored in the AWS Cloud to be masked.
Which solution will meet these requirements?
A. Invoke an AWS Lambda function from Kinesis Data Firehose to mask PII before delivering the data into Amazon S3.
B. Use Amazon Macie and configure it to discover and mask PII.
C. Enable server-side encryption (SSE) in Amazon S3.
D. Invoke Amazon Comprehend from Kinesis Data Firehose to detect and mask PII before delivering the data into Amazon S3.
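On option A: Kinesis Data Firehose invokes a transformation Lambda function with base64-encoded records and expects each record back with a result status, so PII can be masked before the data ever lands in Amazon S3. A minimal sketch follows; the specific field names being masked are assumptions.

```python
import base64
import json

# Hypothetical set of PII fields to mask before delivery to Amazon S3.
PII_FIELDS = {"ssn", "account_number", "email"}

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Replace each PII value with a fixed mask.
        for field in PII_FIELDS & payload.keys():
            payload[field] = "****"

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                json.dumps(payload).encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```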
A financial services company is building a data lake solution on Amazon S3. The company plans to use analytics offerings from AWS to meet user needs for ad hoc querying and business intelligence reports. A portion of the columns will contain personally identifiable information (PII). Only authorized users should be able to see plaintext PII data.
What is the MOST operationally efficient solution that meets these requirements?
A. Define a bucket policy for each S3 bucket of the data lake to allow access to users who have authorization to see PII data. Catalog the data by using AWS Glue. Create two IAM roles. Attach a permissions policy with access to PII columns to one role. Attach a policy without these permissions to the other role.
B. Register the S3 locations with AWS Lake Formation. Create two IAM roles. Use Lake Formation data permissions to grant Select permissions to all of the columns for one role. Grant Select permissions to only columns that contain non-PII data for the other role.
C. Register the S3 locations with AWS Lake Formation. Create an AWS Glue job to create an ETL workflow that removes the PII columns from the data and creates a separate copy of the data in another data lake S3 bucket. Register the new S3 locations with Lake Formation. Grant users the permissions to each data lake data based on whether the users are authorized to see PII data.
D. Register the S3 locations with AWS Lake Formation. Create two IAM roles. Attach a permissions policy with access to PII columns to one role. Attach a policy without these permissions to the other role. For each downstream analytics service, use its native security functionality and the IAM roles to secure the PII data.
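On option B: Lake Formation column-level permissions are a single GrantPermissions call per role, and integrated analytics services (Athena, Redshift Spectrum, QuickSight, and so on) enforce them automatically. A boto3 sketch with hypothetical database, table, role, and column names follows.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Hypothetical names.
DATABASE = "datalake"
TABLE = "customers"
PII_COLUMNS = ["ssn", "date_of_birth", "email"]

# Role 1: full access, including the PII columns.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/pii-analyst"},
    Resource={"Table": {"DatabaseName": DATABASE, "Name": TABLE}},
    Permissions=["SELECT"],
)

# Role 2: SELECT on every column except the PII ones.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/general-analyst"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": DATABASE,
            "Name": TABLE,
            "ColumnWildcard": {"ExcludedColumnNames": PII_COLUMNS},
        }
    },
    Permissions=["SELECT"],
)
```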
A machinery company wants to collect data from sensors. A data analytics specialist needs to implement a solution that aggregates the data in near-real time and saves the data to a persistent data store. The data must be stored in nested JSON format and must be queried from the data store with a latency of single-digit milliseconds.
Which solution will meet these requirements?
A. Use Amazon Kinesis Data Streams to receive the data from the sensors. Use Amazon Kinesis Data Analytics to read the stream, aggregate the data, and send the data to an AWS Lambda function. Configure the Lambda function to store the data in Amazon DynamoDB.
B. Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use Amazon Kinesis Data Analytics to aggregate the data. Use an AWS Lambda function to read the data from Kinesis Data Analytics and store the data in Amazon S3.
C. Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use an AWS Lambda function to aggregate the data during capture. Store the data from Kinesis Data Firehose in Amazon DynamoDB.
D. Use Amazon Kinesis Data Firehose to receive the data from the sensors. Use an AWS Lambda function to aggregate the data during capture. Store the data in Amazon S3.
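On option A: the Lambda function at the end of the pipeline writes the aggregated, nested JSON records into DynamoDB, which serves reads at single-digit-millisecond latency. The rough sketch below assumes base64-encoded records in the incoming event and a simple per-record acknowledgement; the exact event and response contract for a Kinesis Data Analytics Lambda destination should be confirmed against the AWS documentation, and the table name is hypothetical.

```python
import base64
import json
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("sensor-aggregates")   # hypothetical table name

def lambda_handler(event, context):
    """Persist aggregated sensor records (nested JSON) into DynamoDB."""
    results = []
    with table.batch_writer() as batch:
        for record in event["records"]:
            # parse_float=Decimal because DynamoDB rejects Python floats.
            item = json.loads(base64.b64decode(record["data"]), parse_float=Decimal)
            # Nested JSON maps directly to DynamoDB map/list attributes.
            batch.put_item(Item=item)
            results.append({"recordId": record["recordId"], "result": "Ok"})
    # Response shape is an assumption; check the destination Lambda contract.
    return {"records": results}
```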
A public sector organization ingests large datasets from various relational databases into an Amazon S3 data lake on a daily basis. Data analysts need a mechanism to profile the data and diagnose data quality issues after the data is ingested into Amazon S3. The solution should allow the data analysts to visualize and explore the data quality metrics through a user interface.
Which set of steps provides a solution that meets these requirements?
A. Create a new AWS Glue DataBrew dataset for each dataset in the S3 data lake. Create a new DataBrew project for each dataset. Create a profile job for each project and schedule it to run daily. Instruct the data analysts to explore the data quality metrics by using the DataBrew console.
B. Create a new AWS Glue ETL job that uses the Deequ Spark library for data validation and schedule the ETL job to run daily. Store the output of the ETL job within an S3 bucket. Instruct the data analysts to query and visualize the data quality metrics by using the Amazon Athena console.
C. Schedule an AWS Lambda function to run daily by using Amazon EventBridge (Amazon CloudWatch Events). Configure the Lambda function to test the data quality of each object and store the results in an S3 bucket. Create an Amazon QuickSight dashboard to query and visualize the results. Instruct the data analysts to explore the data quality metrics using QuickSight.
D. Schedule an AWS Step Functions workflow to run daily by using Amazon EventBridge (Amazon CloudWatch Events). Configure the steps by using AWS Lambda functions to perform the data quality checks and update the catalog tags in the AWS Glue Data Catalog with the results. Instruct the data analysts to explore the data quality metrics using the Data Catalog console.
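On option A: creating the DataBrew dataset, its profile job, and a daily schedule takes a handful of API calls per dataset, after which analysts explore the profile results in the DataBrew console. A boto3 sketch with placeholder names, S3 locations, and role ARN follows.

```python
import boto3

databrew = boto3.client("databrew")

# Hypothetical names and locations.
DATASET_NAME = "sales-daily"

databrew.create_dataset(
    Name=DATASET_NAME,
    Input={"S3InputDefinition": {"Bucket": "datalake-bucket", "Key": "sales/"}},
)

databrew.create_profile_job(
    Name=f"{DATASET_NAME}-profile",
    DatasetName=DATASET_NAME,
    OutputLocation={"Bucket": "databrew-profile-results", "Key": "sales/"},
    RoleArn="arn:aws:iam::111122223333:role/databrew-profile-role",
)

# Run the profile job once a day.
databrew.create_schedule(
    Name=f"{DATASET_NAME}-daily",
    JobNames=[f"{DATASET_NAME}-profile"],
    CronExpression="cron(0 4 * * ? *)",   # every day at 04:00 UTC
)
```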
A marketing company collects clickstream data. The company sends the data to Amazon Kinesis Data Firehose and stores the data in Amazon S3. The company wants to build a series of dashboards that will be used by hundreds of users across different departments. The company will use Amazon QuickSight to develop these dashboards. The company has limited resources and wants a solution that can scale and provide daily updates about clickstream activity.
Which combination of options will provide the MOST cost-effective solution? (Select TWO.)
A. Use Amazon Redshift to store and query the clickstream data
B. Use QuickSight with a direct SQL query
C. Use Amazon Athena to query the clickstream data in Amazon S3
D. Use S3 analytics to query the clickstream data
E. Use the QuickSight SPICE engine with a daily refresh
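On options C and E together: Athena queries the clickstream data in place in Amazon S3, and a daily SPICE refresh means hundreds of QuickSight users hit the in-memory SPICE copy rather than running Athena queries on every dashboard view. A boto3 sketch with placeholder account, database, and dataset identifiers follows.

```python
import uuid
from datetime import datetime

import boto3

# Ad hoc Athena query over the clickstream data stored in S3.
athena = boto3.client("athena")
athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS clicks FROM clickstream GROUP BY page",
    QueryExecutionContext={"Database": "marketing"},             # assumed database
    ResultConfiguration={"OutputLocation": "s3://athena-results-example/"},
)

# Daily SPICE refresh of the QuickSight dataset built on that data.
quicksight = boto3.client("quicksight")
quicksight.create_ingestion(
    AwsAccountId="111122223333",                                 # placeholder account
    DataSetId="clickstream-dashboard-dataset",                   # assumed dataset ID
    IngestionId=f"daily-{datetime.utcnow():%Y%m%d}-{uuid.uuid4().hex[:8]}",
)
```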
A company's data science team is designing a shared dataset repository on a Windows server. The data repository will store a large amount of training data that the data science team commonly uses in its machine learning models. The data scientists create a varying number of new datasets each day.
The company needs a solution that provides persistent, scalable file storage and high levels of throughput and IOPS. The solution also must be highly available and must integrate with Active Directory for access control.
Which solution will meet these requirements with the LEAST development effort?
A. Store datasets as files in an Amazon EMR cluster. Set the Active Directory domain for authentication.
B. Store datasets as files in Amazon FSx for Windows File Server. Set the Active Directory domain for authentication.
C. Store datasets as tables in a multi-node Amazon Redshift cluster. Set the Active Directory domain for authentication.
D. Store datasets as global tables in Amazon DynamoDB. Build an application to integrate authentication with the Active Directory domain.
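On option B: a single Amazon FSx for Windows File Server file system joined to the existing Active Directory provides an SMB share with AD-based access control, high throughput and IOPS, and Multi-AZ availability, with no application code to write. A boto3 sketch with placeholder subnet, security group, and directory IDs follows.

```python
import boto3

fsx = boto3.client("fsx")

# Placeholder subnet, security group, and directory IDs.
fsx.create_file_system(
    FileSystemType="WINDOWS",
    StorageCapacity=2048,                     # GiB
    StorageType="SSD",
    SubnetIds=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    WindowsConfiguration={
        "ActiveDirectoryId": "d-1234567890",  # existing directory for authentication
        "DeploymentType": "MULTI_AZ_1",       # highly available across two AZs
        "PreferredSubnetId": "subnet-0123456789abcdef0",
        "ThroughputCapacity": 512,            # MB/s
    },
)
```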