Adding Sentry to a Kerberized Hadoop Cluster

As part of this module, we will see how to add Sentry to a Kerberized Hadoop cluster to restrict access to Hive databases for different types of users.

  • Add Sentry to the Cluster
  • Enable Sentry for Hive and Hue
  • Block Hive CLI Access
  • Make sure users can connect using beeline
  • Manage permissions to Hive Databases using Sentry

Sentry primarily has two components:

  • Sentry Server (a Thrift-based service that serves clients such as Hive and Hue)
  • Database (to store information related to roles and permissions)

In a production environment, we might have to consider a high-availability (HA) configuration for Sentry. For this module, however, we will use a standalone configuration, with the Sentry Server deployed on one of the master nodes of the cluster.

Create a Database for Sentry Server

Let us create a database for the Sentry Server before adding the service using Cloudera Manager.

  • We already have a MySQL Server running on bigdataserver-1
  • Log in to the MySQL Server as root and create the database and the user for the Sentry Server
CREATE DATABASE sentry;
GRANT ALL ON sentry.* TO sentry IDENTIFIED BY 'itversity';

Add Service using Cloudera Manager

Let us understand how to add Sentry Server using Cloudera Manager on an existing cluster.

  • Go to the cluster and click on the drop-down menu next to its name
  • Click on Add Service
  • Choose Sentry
  • Enter the database details (the sentry database and user created above) and verify the connection
  • Restart all affected services when prompted

Connecting to Hive using beeline

Let us understand how we can connect to Hive using beeline in a Kerberized environment.

  • Once we log in and switch to the relevant user, we need to ensure that there is a valid Kerberos ticket. If there is not, we need to generate one from the keytab.
kinit -k -t /home/retail/keytabs/retail.keytab \
  • Once the ticket is generated, we can launch beeline and connect to the HiveServer2 Thrift server. The connection string must also include the Kerberos service principal of HiveServer2 on the corresponding host.
beeline
!connect jdbc:hive2://bigdataserver-4.c.itversitydiscuss.internal:10000/retail;principal=hive/bigdataserver-4.c.itversitydiscuss.internal@ITVERSITY.COM;auth=kerberos
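The two steps above can be combined into a small shell sketch. It assumes MIT Kerberos tools on the client, HiveServer2 listening on port 10000 and the hive/<fqdn>@<REALM> principal pattern used in the connect string; `ensure_ticket` and `hs2_jdbc_url` are hypothetical helper names, and the retail principal in the usage comment is a placeholder because it is not given in this guide.

```shell
#!/bin/sh
# Hypothetical helpers for connecting to a Kerberized HiveServer2 with beeline.

# Obtain a ticket from the keytab only when the cache has no valid ticket.
# `klist -s` prints nothing and exits non-zero if there is no valid ticket.
ensure_ticket() {
  local keytab="$1" principal="$2"
  if ! klist -s; then
    kinit -k -t "$keytab" "$principal"
  fi
}

# Build the JDBC URL. The principal names the HiveServer2 host
# (hive/<hs2-fqdn>@<REALM>), not the host beeline runs on.
# Port defaults to 10000 (assumption matching the connect string above).
hs2_jdbc_url() {
  local host="$1" db="$2" realm="$3" port="${4:-10000}"
  printf 'jdbc:hive2://%s:%s/%s;principal=hive/%s@%s;auth=kerberos\n' \
    "$host" "$port" "$db" "$host" "$realm"
}

# Example usage ("<retail-principal>" is a placeholder):
# ensure_ticket /home/retail/keytabs/retail.keytab "<retail-principal>"
# beeline -u "$(hs2_jdbc_url bigdataserver-4.c.itversitydiscuss.internal retail ITVERSITY.COM)"
```

Passing the URL with `beeline -u` is equivalent to launching beeline and typing `!connect`, which is convenient in scripts.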

Now we can switch to the appropriate database and run queries.

USE retail;
SELECT count(1) FROM orders;
  • Let us also check which user submitted the MapReduce jobs by opening the corresponding tracking URL.
  • On GCP, we might have to use sshuttle to reach the tracking URL, and /etc/hosts on the local machine should map the cluster hostnames to their private IPs.
  • Typically, the jobs are submitted as the user who runs the Hive queries, which in this case is retail. Running the MapReduce jobs for Hive queries as the end user is called impersonation, and it is enabled by default.
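If reaching the tracking URL is inconvenient, the submitting user can also be read from `yarn application -list` on the gateway node, which prints a User column. The sketch below assumes the columns are tab-separated with User in the fourth position, as in Hadoop 2 style output; please check this against your version. `app_users` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print "<application id> <user>" for each application
# row of `yarn application -list` output read from stdin.
# Assumption: columns are tab-separated and the 4th column is User.
app_users() {
  awk 'BEGIN { FS = "\t" }
    { id = $1; gsub(/ /, "", id) }   # strip the padding around the id
    id ~ /^application_/ {
      user = $4; gsub(/ /, "", user)
      print id, user
    }'
}

# Example usage on the gateway node (running and finished applications):
# yarn application -list -appStates ALL | app_users
```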

Disable impersonation

We need to disable impersonation in HiveServer2 so that Sentry can take care of authorization (Kerberos still handles authentication). If impersonation is enabled, users might be able to access the underlying files directly, bypassing Sentry.

  • Go to the Hive service-> Configuration tab.
  • Select Scope > HiveServer2 & Category > Main.
  • Uncheck the HiveServer2 Enable Impersonation checkbox.
  • Click Save Changes to commit the changes.

Now all queries will run their MapReduce jobs as the hive user. Once Sentry is enabled, only users with the right privileges will be able to run queries against Hive tables.
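In Cloudera Manager, the HiveServer2 Enable Impersonation checkbox corresponds to the `hive.server2.enable.doAs` property in hive-site.xml. A minimal sketch for confirming the deployed value is to read the property from the file on the HiveServer2 host. It assumes `<value>` sits on the same line as `<name>` or on the next one, and `get_xml_prop` is a hypothetical helper. Where Cloudera Manager writes the active hive-site.xml (typically a HiveServer2 process directory under /var/run/cloudera-scm-agent/process) is an assumption to verify on your host.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of a property from a Hadoop-style
# XML configuration file (hive-site.xml, core-site.xml, ...).
# Assumption: <value> is on the same line as <name> or on the line after it.
get_xml_prop() {
  local file="$1" name="$2"
  grep -A1 -F "<name>${name}</name>" "$file" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p' |
    head -n 1
}

# Example usage on the HiveServer2 host (replace the path with the
# hive-site.xml actually used by HiveServer2); expect "false" once
# impersonation is disabled:
# get_xml_prop /path/to/hive-site.xml hive.server2.enable.doAs
```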

Enable ACLs on HDFS

We need to have ACLs enabled to work in an environment where Sentry is configured.

  • Go to HDFS Configuration and search for “Access Control Lists”
  • Enable it and restart HDFS
  • Also validate that ACLs can be applied to HDFS directories by logging into the gateway node of the cluster.
hadoop fs -mkdir /user/itversity/acls_demo
hadoop fs -setfacl -m user:dgadiraju:rwx /user/itversity/acls_demo
hadoop fs -getfacl /user/itversity/acls_demo
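The Access Control Lists setting maps to the HDFS property `dfs.namenode.acls.enabled`; while it is off, the NameNode rejects `-setfacl`. To make the validation above scriptable, the sketch below checks that an exact entry appears in `-getfacl` output, ignoring any `#effective:` annotation that follows a tab. `has_acl_entry` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact ACL entry (for example
# user:dgadiraju:rwx) appears in `hadoop fs -getfacl` output read from stdin.
# Only the text before a tab is compared, so "#effective:" notes are ignored.
has_acl_entry() {
  cut -f1 | grep -qxF -- "$1"
}

# Example usage on the gateway node:
# hadoop fs -getfacl /user/itversity/acls_demo | has_acl_entry user:dgadiraju:rwx \
#   && echo "ACL applied" || echo "ACL missing"
```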

Enable the Hive user to submit YARN jobs

Since the queries now run as hive, let’s make sure that the hive user can submit YARN jobs.

  • Go to the YARN service -> Configuration tab.
  • Select Scope > NodeManager & Category > Security.
  • Ensure the Allowed System Users property includes the hive user. If not, add hive.
  • Click Save Changes to commit the changes.
  • Repeat these steps for every NodeManager role group of the YARN service that is associated with Hive.
  • Restart the YARN service.

As our cluster has only one NodeManager role group, we had to do this only once.
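With the LinuxContainerExecutor used on Kerberized clusters, the Allowed System Users setting ends up as the `allowed.system.users` key in `container-executor.cfg`, which lets the listed system accounts run containers even though their UID is below `min.user.id`. The sketch below checks that key for a given user. Where Cloudera Manager places the file on a NodeManager host is an assumption, so the path is passed in explicitly, and `user_allowed` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if USER is listed in allowed.system.users
# of the given container-executor.cfg.
user_allowed() {
  local cfg="$1" user="$2" list
  # Value after '=', e.g. "nobody,impala,hive"; spaces are removed.
  list=$(sed -n 's/^[[:space:]]*allowed\.system\.users[[:space:]]*=[[:space:]]*//p' "$cfg" |
    tail -n 1 | tr -d ' ')
  # Wrap in commas so that "hive" does not match "hive2".
  case ",${list}," in
    *",${user},"*) return 0 ;;
  esac
  return 1
}

# Example usage on a NodeManager host (path is an assumption):
# user_allowed /path/to/container-executor.cfg hive && echo "hive is allowed"
```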

Block the Hive CLI Access

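Once Sentry is installed, if we want to enforce authorization on Hive databases and tables through Sentry, we need to block Hive CLI access for all regular users. The Hive CLI talks to the Hive Metastore directly instead of going through HiveServer2, so it bypasses the checks Sentry performs there.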

Here are the details related to blocking access to Hive CLI.

  • If we block access to the Hive Metastore, the Hive CLI can no longer be used.
  • The hadoop.proxyuser.hive.groups property can be used to limit access to the Hive Metastore to a specific set of groups.
  • We can set it to the groups of system users such as hive, hue and sentry, so that only users belonging to these groups can access the Hive Metastore.

Here are the steps to restrict access to the Hive Metastore to users belonging to system groups.

  • Go to Hive service -> Configuration tab.
  • Locate the hadoop.proxyuser.hive.groups parameter and click the plus sign.
  • Enter hive into the text box and click the plus sign again.
  • Enter hue into the text box and click the plus sign again.
  • Enter sentry into the text box and continue if you want to add more groups.
  • Click Save Changes.

Users who do not belong to groups such as hive, hue and sentry can now access Hive only through tools like beeline. The Hive CLI will no longer work for them.
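The effect of `hadoop.proxyuser.hive.groups` comes down to group membership: a user is let through only if at least one of their groups appears in the configured list, and a literal `*` allows everyone. The sketch below expresses that rule in shell so you can see which users would be affected; it illustrates the rule rather than how Hadoop evaluates it internally, and `in_allowed_groups` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any group in GROUPS (space-separated, as
# printed by `id -Gn`) appears in ALLOWED (comma-separated, as in
# hadoop.proxyuser.hive.groups).
in_allowed_groups() {
  local allowed groups="$2" g
  allowed=",$(printf '%s' "$1" | tr -d ' '),"
  # "*" in the property means every group is allowed.
  [ "$allowed" = ",*," ] && return 0
  for g in $groups; do
    case "$allowed" in
      *",${g},"*) return 0 ;;
    esac
  done
  return 1
}

# Example usage: would retail still be able to reach the Hive Metastore?
# in_allowed_groups "hive,hue,sentry" "$(id -Gn retail)" || echo "retail must use beeline"
```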

Enable Stored Notifications in Database

We also need to enable stored notifications in the database as part of the Hive configuration.

  • Go to the Hive service -> Configuration tab and search for Stored Notifications.
  • Make sure it is enabled.

Enabling the Sentry Service for Hive

Next, we need to enable the Sentry service for Hive.

  • Go to the Hive service.
  • Click the Configuration tab.
  • Select Scope > Hive (Service-Wide).
  • Select Category > Main.
  • Locate the Sentry Service property and select Sentry.
  • Click Save Changes to commit the changes.
  • Restart the Hive service.

Add user sadmin as Sentry Admin

We need to designate Sentry admins; using those users, one can manage roles, grant permissions, assign roles to groups and so on.

  • Create the user who will act as the Sentry admin and manage roles (sadmin).
  • The user sadmin needs to exist on all nodes in the cluster.
ansible all \
  -i hosts \
  -m user \
  -a "name=sadmin" \
  --become
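  • Generate a keytab file for the sadmin principal, as it is a functional user. Run the following inside kadmin (for example, kadmin.local on the KDC host); the directory /home/sadmin/keytabs must already exist.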
addprinc -randkey sadmin/bigdataserver-1.c.itversitydiscuss.internal
xst -k /home/sadmin/keytabs/sadmin.keytab sadmin/bigdataserver-1.c.itversitydiscuss.internal
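  • In Cloudera Manager, go to the Sentry service -> Configuration tab and add sadmin to the Admin Groups property (sentry.service.admin.group). Sentry grants admin rights to groups, so this relies on sadmin also being the name of the user's OS group.
  • Redeploy the client configuration and restart the affected services.
  • Get a Kerberos ticket for sadmin using its keytab and connect using beeline.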
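Because sadmin authenticates with a keytab, a session can be started without typing the principal by reading it from the keytab with `klist -kt`. The sketch assumes MIT Kerberos output, where each key line starts with the KVNO and ends with the principal; `keytab_principal` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the first principal listed in `klist -kt`
# output read from stdin.
# Assumption (MIT Kerberos): key lines look like "KVNO DATE TIME PRINCIPAL".
keytab_principal() {
  awk '$1 ~ /^[0-9]+$/ { print $NF; exit }'
}

# Example usage as the sadmin OS user:
# kt=/home/sadmin/keytabs/sadmin.keytab
# kinit -k -t "$kt" "$(klist -kt "$kt" | keytab_principal)"
# beeline   # then !connect with the HiveServer2 JDBC URL
```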

Overview of Sentry Admin Commands

Let us go through the commands used to manage roles, grant permissions and assign roles to groups, and validate them by granting all permissions to the user sadmin.

  • Create the role admin
  • Grant all permissions on the server (all databases) to the admin role
  • Assign the admin role to the group sadmin
  • Validate by creating a database and a table and running a query against the table.
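CREATE ROLE admin;
-- server1 is the default Sentry server name (hive.sentry.server)
GRANT ALL ON SERVER server1 TO ROLE admin;
GRANT ROLE admin TO GROUP sadmin;
CREATE DATABASE testing_db;
CREATE TABLE testing_db.t (i INT);
SELECT count(1) FROM testing_db.t;
DROP DATABASE testing_db CASCADE;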

Enabling the Sentry Service for Hue

Every Hue user connecting to Sentry must have an equivalent OS-level user account on all hosts so that Sentry can authenticate Hue users. Each OS-level user should also be part of the OS-level group with the same name as the user.
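A quick way to check the same-name user and group requirement on a host is to compare the account with the groups reported by `id`. This is a sketch: `has_same_name_group` is a hypothetical helper, and it has to be run on every host, for example through the same kind of ansible ad-hoc command used for sadmin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if OS user NAME exists on this host and
# belongs to a group that is also called NAME.
has_same_name_group() {
  local name="$1" g
  # `id -Gn` fails (and prints nothing to stdout) if the user does not exist.
  for g in $(id -Gn "$name" 2>/dev/null); do
    [ "$g" = "$name" ] && return 0
  done
  return 1
}

# Example usage on each host:
# for u in retail sadmin; do
#   has_same_name_group "$u" || echo "fix $u on $(hostname -f)"
# done
```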

Enable the Sentry service for Hue as follows:

  • Enable the Sentry service for Hive (as described above), and for Impala if it is in use.
  • Go to the Hue service.
  • Click the Configuration tab.
  • Select Scope > Hue (Service-Wide).
  • Select Category > Main.
  • Locate the Sentry Service property and select Sentry.
  • Click Save Changes to commit the changes.
  • Restart Hue.

Creating Hue account for User retail

Let us understand the steps involved in creating a Hue account for the user retail.

  • We need to ensure that there is an OS-level user named retail.
  • We also need to ensure that an OS-level group with the same name as the user (retail) exists and that retail belongs to it.
  • Go to Hue and create an account named retail.
  • As no roles are assigned yet, the user retail will not be able to perform any operations.
  • Log in as the sadmin user and perform the following tasks so that the user retail has permissions only on the database retail_db.
    • Create a database named retail_db
    • Create a role named retail_admin
    • Grant all permissions on the retail_db database to retail_admin
    • Grant all permissions on the HDFS URI /user/retail to retail_admin, so that data can be loaded from there
    • Assign retail_admin to the group retail
    • Make sure retail can manage tables in the database retail_db.
CREATE DATABASE retail_db;
CREATE ROLE retail_admin;
GRANT ALL ON DATABASE retail_db TO ROLE retail_admin;
GRANT ALL ON URI 'hdfs://nameservice1/user/retail'
  TO ROLE retail_admin;
GRANT ROLE retail_admin TO GROUP retail;
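  • LOAD DATA INPATH moves the source files into the table location as the hive user, so hive needs access to the directory the data is loaded from. Update the ACLs on that directory (the default entry also covers files and directories created later).
hadoop fs -setfacl -R -m default:user:hive:rwx /user/retail
hadoop fs -setfacl -R -m user:hive:rwx /user/retail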
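  • Upload the data into HDFS using Hue or from the gateway node.
  • Now log in as the retail user, create the orders table, load the data and run queries against the table. The table definition assumes comma-delimited source files.
USE retail_db;
CREATE TABLE orders (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';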

LOAD DATA INPATH '/user/retail/retail_db/orders'
INTO TABLE orders;

SELECT * FROM orders LIMIT 10;
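The role setup for retail follows a pattern that repeats for every team that owns a database: one database, one admin role with ALL on it, and the role granted to the team's OS group. A hypothetical shell helper can print those statements so they can be reviewed and then run as sadmin through beeline; the function name and the approach of generating SQL this way are illustrative, not part of Sentry. URI grants for load directories would still be added by hand.

```shell
#!/bin/sh
# Hypothetical helper: print Hive/Sentry SQL that gives GROUP full control
# of database DB through ROLE.
db_admin_sql() {
  local db="$1" role="$2" group="$3"
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE ROLE ${role};
GRANT ALL ON DATABASE ${db} TO ROLE ${role};
GRANT ROLE ${role} TO GROUP ${group};
EOF
}

# Example usage as sadmin ("<hiveserver2-jdbc-url>" is a placeholder):
# db_admin_sql retail_db retail_admin retail > retail_admin.sql
# beeline -u "<hiveserver2-jdbc-url>" -f retail_admin.sql
```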