Effective data lakes using AWS Lake Formation, Part 5: Securing data lakes with row-level access control

Increasingly, customers are looking at data lakes as a core part of their strategy to democratize data access across the organization. Data lakes enable you to handle petabytes and exabytes of data coming from a multitude of sources in varying formats, and give users the ability to access it from their choice of analytics and machine learning tools. Fine-grained access controls are needed to ensure data is protected and access is granted only to those who require it.

AWS Lake Formation is a fully managed service that helps you build, secure, and manage data lakes, and provide access control for data in the data lake. Lake Formation row-level permissions allow you to restrict access to specific rows based on data compliance and governance policies. Lake Formation also provides centralized auditing and compliance reporting by identifying which principals accessed what data, when, and through which services.

This post demonstrates how row-level access controls work in Lake Formation, and how to set them up.

If you have large fact tables storing billions of records, you need a way to enable different users and teams to access only the data they’re allowed to see. Row-level access control is a simple and performant way to protect data, while giving users access to the data they need to perform their job. In the retail industry for instance, you may want individual departments to only see their own transactions, but allow regional managers access to transactions from every department.

Traditionally you can achieve row-level access control in a data lake through two common approaches:

  • Duplicate the data, redact sensitive information, and grant coarse-grained permissions on the redacted dataset
  • Load data into a database or a data warehouse, create a view with a WHERE clause to select only specific records, and grant permission on the resulting view

These solutions work well when dealing with a small number of tables, principals, and permissions. However, they are difficult to audit and maintain because access controls are spread across multiple systems and methods. To make it easier to manage and enforce fine-grained access controls in a data lake, we announced a preview of Lake Formation row-level access controls. With this preview feature, you can create row-level filters and attach them to tables to restrict access to data for AWS Identity and Access Management (IAM) and SAMLv2 federated identities.
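For comparison, the second traditional approach listed above (a view with a WHERE clause) can be sketched with Python's built-in sqlite3 module. This is only an illustration of the pattern, not something used in this post: the table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical miniature of a reviews table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (review_id TEXT, marketplace TEXT, star_rating INTEGER)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [("r1", "US", 5), ("r2", "JP", 4), ("r3", "US", 3), ("r4", "UK", 2)],
)

# The view-based approach: one view per audience, each with its own WHERE clause.
conn.execute(
    "CREATE VIEW reviews_us AS SELECT * FROM reviews WHERE marketplace = 'US'"
)

# A US analyst would be granted access to the view, never to the base table.
us_rows = conn.execute(
    "SELECT review_id, marketplace FROM reviews_us ORDER BY review_id"
).fetchall()
print(us_rows)
```

Every additional audience needs another view and another grant, which is where the maintenance burden described above comes from.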

How data filters work for row-level security

Granting permissions on a table with row-level security (row filtering) restricts access to only specific rows in the table. The filtering is based on the values of one or more columns. For example, a salesperson analyzing sales opportunities should only be allowed to see those opportunities in their assigned territory and not others. We can define row-level filters to restrict access where the value of the territory column matches the assigned territory of the user.

With row-level security, we introduced the concept of data filters. Data filters make it simpler to manage and assign a large number of fine-grained permissions. You can specify the row filter expression using the WHERE clause syntax described in the PartiQL dialect.
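To make the idea concrete, here is a minimal in-memory sketch of what a row filter does, using the salesperson and territory example above. It is not how Lake Formation implements filtering; the user names, column names, and rows are all hypothetical.

```python
# Hypothetical sales opportunities; the column names are illustrative only.
opportunities = [
    {"id": 1, "territory": "northwest", "amount": 1200},
    {"id": 2, "territory": "southeast", "amount": 800},
    {"id": 3, "territory": "northwest", "amount": 450},
]

# One filter per user: (column, required value). This is the in-memory analogue
# of a row filter expression such as territory='northwest'.
row_filters = {
    "alice": ("territory", "northwest"),
    "bob": ("territory", "southeast"),
}


def visible_rows(user, rows):
    """Return only the rows the user's filter allows; unknown users see nothing."""
    if user not in row_filters:
        return []
    column, value = row_filters[user]
    return [row for row in rows if row.get(column) == value]


alice_ids = [row["id"] for row in visible_rows("alice", opportunities)]
print(alice_ids)
```

The point of the design is that the filter lives next to the permission rather than in each query, so the same table can serve every user.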

Example use case

In this post, a fictional ecommerce company sells many different products, like books, videos, and toys. Customers can leave reviews and star ratings for each product, so other customers can make informed decisions about what they should buy. We use the Amazon Customer Reviews Dataset, which includes different products and customer reviews.

To illustrate the different roles and responsibilities of a data owner and a data consumer, we assume two personas: a data lake administrator and a data analyst. The administrator is responsible for setting up the data lake, creating data filters, and granting permissions to data analysts. Data analysts residing in different countries (for our use case, the US and Japan) can only analyze product reviews for customers located in their own country and, for compliance reasons, shouldn’t be able to see data for customers located in other countries. We have two data analysts: one responsible for the US marketplace and another for the Japanese marketplace. Each analyst uses Amazon Athena to analyze customer reviews for their specific marketplace only.

Set up resources with AWS CloudFormation

This post includes an AWS CloudFormation template for a quick setup. You can review and customize it to suit your needs.

The CloudFormation template generates the following resources:

  • An AWS Lambda function (for Lambda-backed AWS CloudFormation custom resources). We use the function to copy sample data files from the public S3 bucket to your Amazon Simple Storage Service (Amazon S3) bucket.
  • An S3 bucket to serve as our data lake.
  • IAM users and policies:
    • DataLakeAdmin
    • DataAnalystUS
    • DataAnalystJP
  • An AWS Glue Data Catalog database, table, and partition.
  • Lake Formation data lake settings and permissions.

When following the steps in this section, use either the us-east-1 or the us-west-2 Region (where the preview functionality is currently available).

Before launching the CloudFormation template, make sure that Use only IAM access control is disabled for new databases and tables by completing the following steps:

  1. Sign in to the Lake Formation console in the us-east-1 or us-west-2 Region.
  2. Under Data catalog, choose Settings.
  3. Deselect Use only IAM access control for new databases and Use only IAM access control for new tables in new databases.
  4. Choose Save.
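The same setting can also be changed through the Lake Formation API. The sketch below assumes the boto3 `get_data_lake_settings` and `put_data_lake_settings` calls, and that clearing the two default-permission lists is equivalent to deselecting the two check boxes; the account ID and the sample settings are hypothetical.

```python
import copy


def disable_iam_only_defaults(settings):
    """Return a copy of a DataLakeSettings dict with the IAM-only defaults cleared."""
    updated = copy.deepcopy(settings)
    # Empty lists mean new databases and tables no longer get the
    # IAM_ALLOWED_PRINCIPALS default grant.
    updated["CreateDatabaseDefaultPermissions"] = []
    updated["CreateTableDefaultPermissions"] = []
    return updated


def apply_settings(lakeformation_client):
    """Read, modify, and write back the settings.

    The client is assumed to come from boto3.client("lakeformation").
    """
    current = lakeformation_client.get_data_lake_settings()["DataLakeSettings"]
    lakeformation_client.put_data_lake_settings(
        DataLakeSettings=disable_iam_only_defaults(current)
    )


# Hypothetical settings, shaped like the get_data_lake_settings response.
iam_only_default = [
    {
        "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
        "Permissions": ["ALL"],
    }
]
example_settings = {
    "DataLakeAdmins": [
        {"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:user/DatalakeAdmin"}
    ],
    "CreateDatabaseDefaultPermissions": iam_only_default,
    "CreateTableDefaultPermissions": iam_only_default,
}
example_updated = disable_iam_only_defaults(example_settings)
print(example_updated["CreateDatabaseDefaultPermissions"])
```

Reading the current settings first matters because the put call replaces the whole settings object, so writing only the two lists could drop the existing data lake administrators.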

To launch the CloudFormation stack, complete the following steps:

  1. Sign in to the CloudFormation console in the same Region.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For DatalakeAdminUserName and DatalakeAdminUserPassword, enter the user name and password you want for the data lake admin IAM user.
  5. For DataAnalystUsUserName and DataAnalystUsUserPassword, enter the user name and password you want for the data analyst user who is responsible for the US marketplace.
  6. For DataAnalystJpUserName and DataAnalystJpUserPassword, enter the user name and password you want for the data analyst user who is responsible for the Japanese marketplace.
  7. For DataLakeBucketName, enter the name of your data lake bucket.
  8. For DatabaseName and TableName, leave as the default.
  9. Choose Next.
  10. On the next page, choose Next.
  11. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  12. Choose Create.

Stack creation can take about 1 minute.
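If you prefer to script the launch, the console steps above map roughly onto the CloudFormation `create_stack` API. The sketch below is an assumption-laden outline: the template URL is not given in this post, so the caller must supply it, the stack name and parameter values are placeholders, and `CAPABILITY_NAMED_IAM` is assumed to be required because the template creates IAM users with names you choose.

```python
def build_stack_parameters(values):
    """Convert a plain dict into the Parameters list that create_stack expects."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


def launch_stack(cloudformation_client, stack_name, template_url, values):
    """Create the stack.

    The client is assumed to come from boto3.client("cloudformation").
    template_url must be supplied by the caller; it is not stated in this post.
    """
    return cloudformation_client.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=build_stack_parameters(values),
        # Assumption: the scripted equivalent of the IAM acknowledgement check box.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )


# Placeholder values; the parameter keys are the ones named in the steps above.
example_parameters = build_stack_parameters(
    {
        "DatalakeAdminUserName": "DatalakeAdmin",
        "DataAnalystUsUserName": "DataAnalystUS",
        "DataAnalystJpUserName": "DataAnalystJP",
        "DataLakeBucketName": "my-example-data-lake-bucket",
    }
)
print(example_parameters[0])
```

The three password parameters are left out of the example on purpose; in practice they would be read from a secret store or prompted for rather than written into a script.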

Query without data filters

After you set up the environment, you can query the product reviews table. Let’s first query the table without row-level access controls to make sure we can see the data. If you’re running queries in Athena for the first time, you need to configure the query result location.

Sign in to the Athena console using the DatalakeAdmin user, and run the following query:

SELECT * FROM lakeformation_tutorial_row_security.amazon_reviews
LIMIT 10

The following screenshot shows the query result. This table has only one partition, product_category=Video, so each record is a review comment for a video product.

Let’s run an aggregation query to retrieve the total number of records per marketplace:

SELECT marketplace, count(*) as total_count
FROM lakeformation_tutorial_row_security.amazon_reviews
GROUP BY marketplace

The following screenshot shows the query result. The marketplace column has five different values. In the subsequent steps, we set up row-based filters using the marketplace column.

Set up data filters

Let’s start by creating two different data filters, one for the analyst responsible for the US marketplace, and another for the one responsible for the Japanese marketplace. Then we grant the users their respective permissions.

Create a filter for the US marketplace data

Let’s first set up a filter for the US marketplace data.

  1. As the DatalakeAdmin user, open the Lake Formation console.
  2. Choose Data filters.
  3. Choose Create new filter.
  4. For Data filter name, enter amazon_reviews_US.
  5. For Target database, choose the database lakeformation_tutorial_row_security.
  6. For Target table, choose the table amazon_reviews.
  7. For Column-level access, leave as the default.
  8. For Row filter expression, enter marketplace='US'.
  9. Choose Create filter.

Create a filter for the Japanese marketplace data

Let’s create another data filter to restrict access to the Japanese marketplace data.

  1. On the Data filters page, choose Create new filter.
  2. For Data filter name, enter amazon_reviews_JP.
  3. For Target database, choose the database lakeformation_tutorial_row_security.
  4. For Target table, choose the table amazon_reviews.
  5. For Column-level access, leave as the default.
  6. For Row filter expression, enter marketplace='JP'.
  7. Choose Create filter.
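The two console procedures above can also be expressed as API requests. The following sketch assumes the boto3 `create_data_cells_filter` call as it exists in the generally available Lake Formation API, which may differ from what the preview exposed. The filter names, database, table, and expressions come from the steps above; the account ID is a placeholder.

```python
DATABASE = "lakeformation_tutorial_row_security"
TABLE = "amazon_reviews"


def build_row_filter(catalog_id, name, expression):
    """Build the TableData argument for one row-level data filter."""
    return {
        "TableCatalogId": catalog_id,  # the AWS account ID that owns the catalog
        "DatabaseName": DATABASE,
        "TableName": TABLE,
        "Name": name,
        "RowFilter": {"FilterExpression": expression},
        # An empty wildcard keeps every column, matching the default
        # column-level access in the console steps.
        "ColumnWildcard": {},
    }


def create_filters(lakeformation_client, catalog_id):
    """Create both filters.

    The client is assumed to come from boto3.client("lakeformation").
    """
    for name, expression in (
        ("amazon_reviews_US", "marketplace='US'"),
        ("amazon_reviews_JP", "marketplace='JP'"),
    ):
        lakeformation_client.create_data_cells_filter(
            TableData=build_row_filter(catalog_id, name, expression)
        )


# "111122223333" is a placeholder account ID.
us_filter = build_row_filter("111122223333", "amazon_reviews_US", "marketplace='US'")
print(us_filter["RowFilter"])
```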

Grant permissions to the US data analyst

Now we have two data filters. Next, we need to grant permissions using these data filters to our analysts. We start by granting permissions to the DataAnalystUS user.

  1. On the Data permissions page, choose Grant.
  2. For Principals, choose IAM users and roles, and choose the user DataAnalystUS.
  3. For Policy tags or catalog resources, choose Named data catalog resources.
  4. For Database, choose the database lakeformation_tutorial_row_security.
  5. For Table, choose the table amazon_reviews.
  6. For Table permissions, select Select.
  7. For Data permissions, select Advanced cell-level filters.
  8. Select the filter amazon_reviews_US.
  9. Choose Grant.

The following screenshot shows the available data filters you can attach to a table when configuring permissions.

Grant permissions to the Japanese data analyst

Next, complete the following steps to configure permissions for the user DataAnalystJP:

  1. On the Data permissions page, choose Grant.
  2. For Principals, choose IAM users and roles, and choose the user DataAnalystJP.
  3. For Policy tags or catalog resources, choose Named data catalog resources.
  4. For Database, choose the database lakeformation_tutorial_row_security.
  5. For Table, choose the table amazon_reviews.
  6. For Table permissions, select Select.
  7. For Data permissions, select Advanced cell-level filters.
  8. Select the filter amazon_reviews_JP.
  9. Choose Grant.
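As a rough API equivalent of the two grant procedures above, the sketch below builds `grant_permissions` requests that reference a data filter as the resource. It assumes the boto3 Lake Formation API in its generally available form. The account ID is a placeholder, and the IAM user names are whatever you entered when launching the stack; DataAnalystUS and DataAnalystJP are used here only as examples.

```python
DATABASE = "lakeformation_tutorial_row_security"
TABLE = "amazon_reviews"


def build_grant(account_id, user_name, filter_name):
    """Build keyword arguments granting SELECT through one data filter."""
    return {
        "Principal": {
            "DataLakePrincipalIdentifier": f"arn:aws:iam::{account_id}:user/{user_name}"
        },
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": account_id,
                "DatabaseName": DATABASE,
                "TableName": TABLE,
                "Name": filter_name,
            }
        },
        "Permissions": ["SELECT"],
    }


def grant_all(lakeformation_client, account_id):
    """Grant each analyst access through their own filter.

    The client is assumed to come from boto3.client("lakeformation").
    """
    for user_name, filter_name in (
        ("DataAnalystUS", "amazon_reviews_US"),
        ("DataAnalystJP", "amazon_reviews_JP"),
    ):
        lakeformation_client.grant_permissions(
            **build_grant(account_id, user_name, filter_name)
        )


# "111122223333" is a placeholder account ID.
us_grant = build_grant("111122223333", "DataAnalystUS", "amazon_reviews_US")
print(us_grant["Principal"]["DataLakePrincipalIdentifier"])
```

Because the grant targets the filter rather than the table, each analyst receives SELECT only on the rows that filter allows.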

Query with data filters

With the data filters attached to the product reviews table, we’re ready to run some queries and see how permissions are enforced by Lake Formation. Because row-level security is in preview as of this writing, we need to create a special Athena workgroup named AmazonAthenaLakeFormationPreview, and switch to using it. For more information, see Managing Workgroups.

Sign in to the Athena console using the DataAnalystUS user and switch to the AmazonAthenaLakeFormationPreview workgroup. Run the following query to retrieve a few records, which are filtered based on the row-level permissions we defined:

SELECT * FROM lakeformation.lakeformation_tutorial_row_security.amazon_reviews
LIMIT 10

Note the prefix of lakeformation. before the database name; this is required for the preview only.

The following screenshot shows the query result.

Similarly, run a query to count the total number of records per marketplace:

SELECT marketplace, count(*) as total_count
FROM lakeformation.lakeformation_tutorial_row_security.amazon_reviews
GROUP BY marketplace 

The following screenshot shows the query result. Only the marketplace US shows in the results. This is because our user is only allowed to see rows where the marketplace column value is equal to US.

Switch to the DataAnalystJP user and run the same query:

SELECT * FROM lakeformation.lakeformation_tutorial_row_security.amazon_reviews
LIMIT 10

The following screenshot shows the query result. All of the records belong to the JP marketplace.

Run the query to count the total number of records per marketplace:

SELECT marketplace, count(*) as total_count
FROM lakeformation.lakeformation_tutorial_row_security.amazon_reviews
GROUP BY marketplace

The following screenshot shows the query result. Again, only the row belonging to the JP marketplace is returned.
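The queries in this section could also be submitted programmatically. The sketch below assumes the boto3 Athena `start_query_execution` call; the workgroup name and the `lakeformation.` prefix come from this post, while the S3 output location is a placeholder you would replace with your own query result location.

```python
PREVIEW_WORKGROUP = "AmazonAthenaLakeFormationPreview"
DATABASE = "lakeformation_tutorial_row_security"
TABLE = "amazon_reviews"


def count_per_marketplace_sql(database=DATABASE, table=TABLE):
    """Build the aggregation query with the preview-only lakeformation. prefix."""
    return (
        "SELECT marketplace, count(*) AS total_count "
        f"FROM lakeformation.{database}.{table} "
        "GROUP BY marketplace"
    )


def build_query_request(sql, output_location):
    """Build keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": sql,
        "WorkGroup": PREVIEW_WORKGROUP,
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena_client, sql, output_location):
    """Submit the query and return its execution ID.

    The client is assumed to come from boto3.client("athena"), created with
    the credentials of the analyst whose view of the data you want to test.
    """
    response = athena_client.start_query_execution(
        **build_query_request(sql, output_location)
    )
    return response["QueryExecutionId"]


# Placeholder bucket; use your own query result location.
example_request = build_query_request(
    count_per_marketplace_sql(), "s3://my-example-athena-results/"
)
print(example_request["QueryString"])
```

The SQL is identical for both analysts; only the credentials behind the client determine which rows come back.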

Clean up

For the final step, clean up the resources.

  1. Delete the CloudFormation stack.
  2. Delete the Athena workgroup AmazonAthenaLakeFormationPreview.
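The two cleanup steps above could be scripted along these lines. This is a sketch assuming the boto3 `delete_stack` and `delete_work_group` calls; the stack name is a placeholder, and `RecursiveDeleteOption=True` is an assumption that you also want any named queries saved in the workgroup removed.

```python
PREVIEW_WORKGROUP = "AmazonAthenaLakeFormationPreview"


def cleanup_plan(stack_name):
    """List the cleanup calls as (service, method, keyword arguments) tuples."""
    return [
        ("cloudformation", "delete_stack", {"StackName": stack_name}),
        (
            "athena",
            "delete_work_group",
            {"WorkGroup": PREVIEW_WORKGROUP, "RecursiveDeleteOption": True},
        ),
    ]


def run_plan(clients, plan):
    """Execute each planned call.

    clients is assumed to map service names to boto3 clients, for example
    {"cloudformation": boto3.client("cloudformation"),
     "athena": boto3.client("athena")}.
    """
    for service, method, kwargs in plan:
        getattr(clients[service], method)(**kwargs)


# "my-example-stack" is a placeholder stack name.
example_plan = cleanup_plan("my-example-stack")
for step in example_plan:
    print(step[0], step[1])
```

Separating the plan from its execution lets you review exactly what will be deleted before anything runs.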

Conclusion

In this post, we covered how row-level security in Lake Formation enables you to control data access without needing to duplicate it or manage complicated alternatives such as views. We demonstrated how Lake Formation data filters can make creating, managing, and enforcing row-level permissions simple and easy.

When you want to grant permissions on specific cells, you can include or exclude columns in the data filters in addition to the row filter expression. You can learn more about cell filters in Part 4: Implementing cell-level and row-level security.

You can get started with Lake Formation today by visiting the AWS Lake Formation product page. If you want to try out row-level security, as well as other new features such as ACID transactions and acceleration that are currently available for preview in the US East (N. Virginia) and US West (Oregon) Regions, sign up for the preview.


About the Authors

Noritaka Sekiyama is a Senior Big Data Architect on the AWS Glue and AWS Lake Formation team. He has 11 years of experience working in the software industry. Based in Tokyo, Japan, he is responsible for implementing software artifacts, building libraries, troubleshooting complex issues, and helping guide customer architectures.

Sanjay Srivastava is a Principal Product Manager for AWS Lake Formation. He is passionate about building products, in particular products that help customers get more out of their data. During his spare time, he loves to spend time with his family and engage in outdoor activities including hiking, running, and gardening.
 

Source: https://aws.amazon.com/blogs/big-data/effective-data-lakes-using-aws-lake-formation-part-5-secure-data-lakes-with-row-level-access-control/
