Amazon Q Business - Data Source Connectors



A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into a single container index. Amazon Q Business provides multiple data source connectors to help you build smart generative AI solutions with minimal configuration.

This chapter provides an overview of data source connector features, their configuration, and information specific to your data source connector.

Data Source Connector Concepts

To understand how data source connectors are configured, you need to understand some terminology related to them.

  • Source and endpoint metadata: You enter the data source configuration information in the source section of the console. If you use the API, you specify this information in the configuration parameter of the CreateDataSource operation. The required configuration information differs from one data source to another.
  • Authorization: Amazon Q Business connectors index access control list (ACL) information, such as user email addresses, local group names, and federated group names.
  • Authentication: Amazon Q Business uses an AWS Secrets Manager secret that holds the data source access credentials you provide to authenticate access to your data source.
  • Virtual private cloud: If your data source or database runs inside a virtual private cloud, you can connect Amazon Q Business to it using Amazon VPC. You can configure Amazon VPC with either the console or the Amazon Q Business API.
  • Web proxy: For supported data sources, you can connect to your data source instance through a web proxy. To do this, you must provide the proxy host name and port number.
  • IAM role: Data source connectors require an IAM role with the permissions needed for authorization and authentication.
  • Identity crawler: Amazon Q Business has an identity crawling feature that enables it to crawl ACL information at the document level from supported data sources.
  • Sync scope: Lets you customize the content that your data source connector crawls and indexes.
  • Sync mode: Lets you customize what content gets synced with your index when your data source content changes.
  • Sync run schedule: Lets you periodically sync your data source with your index on a custom schedule.
  • Field mappings: Let you map data source document attributes to Amazon Q Business index fields.
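
Several of these concepts (the configuration parameter, the IAM role, the VPC, and the sync run schedule) end up in a single CreateDataSource request. The following Python sketch only assembles that request; it does not call AWS. It assumes the boto3 qbusiness client uses the parameter names applicationId, indexId, displayName, roleArn, configuration, vpcConfiguration, and syncSchedule, and every ID, ARN, cron expression, and the {"type": "S3"} configuration shown are hypothetical placeholders. The real configuration document is connector-specific, so check your connector's schema before sending anything.

```python
from __future__ import annotations

import json
from typing import Any, Optional


def build_create_data_source_request(
    application_id: str,
    index_id: str,
    display_name: str,
    role_arn: str,
    configuration: dict[str, Any],
    subnet_ids: Optional[list[str]] = None,
    security_group_ids: Optional[list[str]] = None,
    sync_schedule: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for a CreateDataSource call (sketch only)."""
    request: dict[str, Any] = {
        "applicationId": application_id,
        "indexId": index_id,
        "displayName": display_name,
        "roleArn": role_arn,  # IAM role the connector uses
        "configuration": configuration,  # source and endpoint metadata (connector-specific)
    }
    # Attach a VPC configuration only when both subnets and security groups are given.
    if subnet_ids and security_group_ids:
        request["vpcConfiguration"] = {
            "subnetIds": list(subnet_ids),
            "securityGroupIds": list(security_group_ids),
        }
    # Sync run schedule; the cron-style string format is an assumption.
    if sync_schedule:
        request["syncSchedule"] = sync_schedule
    return request


if __name__ == "__main__":
    # Hypothetical placeholder values throughout.
    example = build_create_data_source_request(
        application_id="example-app-id",
        index_id="example-index-id",
        display_name="team-docs",
        role_arn="arn:aws:iam::111122223333:role/ExampleQBusinessDataSourceRole",
        configuration={"type": "S3"},  # placeholder, not a complete S3 schema
        subnet_ids=["subnet-0example"],
        security_group_ids=["sg-0example"],
        sync_schedule="cron(0 3 * * ? *)",
    )
    print(json.dumps(example, indent=2))
```

To actually send the request, you would pass the result to boto3.client("qbusiness").create_data_source(**request). Keeping the builder separate from the network call makes the request easy to review and test before any resources are created.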

What is a document?

When you connect Amazon Q Business to a data source, what gets treated as a single 'document' depends on the type of connection you're using.

The following table outlines what each connector crawls as a document. For each data source connector, the first list shows the content it supports crawling and the second list gives its document definition.
Adobe Experience Manager (Cloud and Server)
  • Assets
  • Pages
  • Each Asset is considered a single document.
  • Each Page is considered a single document.
Alfresco (Cloud and Server)
  • Files
  • Comments
  • Each File is considered a single document.
  • Each Comment is considered a single document.
Amazon FSx (Windows)
  • Files
  • Each File is considered a single document.
Amazon S3
  • Objects
  • Each Object is considered a single document. Any object-name.metadata.json file and access control list (ACL) file are considered metadata for the object they are associated with and are not treated as separate documents.
Amazon Q Business Web Crawler
  • Web pages
  • Attachments
  • Each Web page is considered a single document.
  • Each Attachment is considered a single document.
Amazon WorkDocs
  • Files
  • Comments
  • Each File is considered a single document.
  • Each Comment is considered a single document.
Box
  • Files
  • Tasks
  • Comments
  • Weblinks
  • Each File is considered a single document.
  • Each Task is considered a single document.
  • Each Comment is considered a single document.
  • Each Weblink is considered a single document.
Confluence (Cloud and Server)
  • Spaces
  • Pages
  • Blogs
  • Comments
  • Attachments
  • Each Space is considered a single document.
  • Each Page is considered a single document.
  • Each Blog is considered a single document.
  • Each Comment is considered a single document.
  • Each Attachment is considered a single document.
Database data sources
  • Aurora (MySQL)
  • Aurora (PostgreSQL)
  • Amazon RDS (Microsoft SQL Server)
  • Amazon RDS (MySQL)
  • Amazon RDS (Oracle)
  • Amazon RDS (PostgreSQL)
  • IBM DB2
  • PostgreSQL
  • Microsoft SQL Server
  • MySQL
  • Oracle Database
  • Table data in a single database
  • View data in a single database
  • Each row in a table and view is considered a single document.
Dropbox
  • Files
  • Papers
  • Paper templates
  • Shortcuts
  • Each File is considered a single document.
  • Each Paper is considered a single document.
  • Each Paper template is considered a single document.
  • Each Shortcut is considered a single document.
Drupal
  • Articles
  • Basic pages
  • Basic blocks
  • Custom content
  • Custom blocks
  • Comments on articles, basic pages, basic blocks, custom content, and custom blocks
  • Attachments in articles, basic pages, basic blocks, custom content, and custom blocks
  • Each Article is considered a single document.
  • Each Basic page is considered a single document.
  • Each Basic block is considered a single document.
  • Each Custom content is considered a single document.
  • Each Custom block is considered a single document.
  • Each Comment on an article, a basic page, a basic block, any custom content, and a custom block is considered a document.
  • Each Attachment in an article, a basic page, a basic block, any custom content, and a custom block is considered a document.
GitHub (Cloud and Server)
  • Repositories
  • Repository commits
  • Issues
  • Issue attachments
  • Issue comments
  • Pull request documents
  • Pull request comments
  • Pull request attachments
  • Each Repository is considered a single document.
  • Each Repository commit is considered a single document.
  • Each Issue is considered a single document.
  • Each Issue attachment is considered a single document.
  • Each Issue comment is considered a single document.
  • Each Pull request is considered a single document.
  • Each Pull request comment is considered a single document.
  • Each Pull request attachment is considered a single document.
Gmail
  • Emails
  • Email attachments
  • Each Email is considered a single document.
  • Each Email attachment is considered a single document.
Google Drive
  • Files
  • Comments
  • Each File is considered a single document.
  • Each Comment is considered a single document.
Jira
  • Projects
  • Issues
  • Comments
  • Attachments
  • Worklogs
  • Each Project is considered a single document.
  • Each Issue is considered a single document.
  • Each Comment is considered a single document.
  • Each Attachment is considered a single document.
  • Each Worklog is considered a single document.
Microsoft Exchange
  • Emails
  • Attachments
  • Calendar
  • Contacts
  • Notes
  • OneNotes
  • Each Email is considered a single document.
  • Each Attachment is considered a single document.
  • Each Calendar is considered a single document.
  • Each Comment is considered a single document.
  • Each Contact is considered a single document.
  • Each Note is considered a single document.
  • Each page in OneNotes is considered a single document.
Microsoft OneDrive
  • Files
  • OneNotes
  • Each File is considered a single document.
  • Each page in OneNotes is considered a single document.
Microsoft SharePoint (Online and Server)
  • Events
  • Pages
  • Files
  • Links
  • File attachments
  • Comments
  • OneNotes
  • Each Event is considered a single document.
  • Each Page is considered a single document.
  • Each File is considered a single document.
  • Each Link is considered a single document.
  • Each File attachment is considered a single document.
  • Each Comment is considered a single document.
  • Each page in OneNotes is considered a single document.
Microsoft Teams
  • Chat messages
  • Chat attachments
  • Channel posts
  • Channel wikis
  • Channel attachments
  • Meeting chats
  • Meeting files
  • Meeting notes
  • Calendar meetings
  • OneNotes
  • Each Chat message is considered a single document.
  • Each Chat attachment is considered a single document.
  • Each Channel post is considered a single document.
  • Each Channel wiki is considered a single document.
  • Each Channel attachment is considered a single document.
  • Each Meeting chat is considered a single document.
  • Each Meeting file is considered a single document.
  • Each Meeting note is considered a single document.
  • Each Calendar meeting is considered a single document.
  • Each page in OneNotes is considered a single document.
Microsoft Yammer
  • Communities
  • Attachments
  • Messages
  • Users
  • Each Community is considered a single document.
  • Each Attachment is considered a single document.
  • Each Message and community post is considered a single document.
  • Each User is considered a single document.
Quip
  • Files
  • Messages
  • Threads
  • Each File is considered a single document.
  • Each Comment is considered a single document.
  • Each file and message posted in a Thread is considered a single document.
Salesforce
  • Accounts
  • Contacts
  • Campaigns
  • Contracts
  • Cases
  • Partners
  • Opportunities
  • Groups
  • Leads
  • Users
  • Tasks
  • Ideas
  • Profiles
  • Solutions
  • Chatters
  • Documents
  • Custom entities
  • Knowledge articles
  • Each Account is considered a single document.
  • Each Contact is considered a single document.
  • Each Campaign is considered a single document.
  • Each Contract is considered a single document.
  • Each Case is considered a single document.
  • Each Partner is considered a single document.
  • Each Opportunity is considered a single document.
  • Each Group is considered a single document.
  • Each Lead is considered a single document.
  • Each User is considered a single document.
  • Each Task is considered a single document.
  • Each Idea is considered a single document.
  • Each Profile is considered a single document.
  • Each Solution is considered a single document.
  • Each Chatter is considered a single document.
  • Each Document (file) is considered a single document.
  • Each Custom entity (record) is considered a single document.
  • Each Knowledge article is considered a single document.
ServiceNow
  • Incidents
  • Knowledge articles
  • Service catalog
  • Attachments
  • Each Incident is considered a single document.
  • Each Knowledge article is considered a single document.
  • Each Service catalog is considered a single document.
  • Each Attachment is considered a single document.
Slack
  • Messages
  • Message attachments
  • Channel posts
  • Each Message is considered a single document.
  • Each Message attachment is considered a single document.
  • Each Channel post is considered a single document.
Zendesk
  • Tickets
  • Ticket comments
  • Ticket comment attachments
  • Articles
  • Article attachments
  • Article comments
  • Community topics
  • Community posts
  • Community post comments
  • Each Ticket is considered a single document.
  • Each Ticket comment is considered a single document.
  • Each Ticket comment attachment is considered a single document.
  • Each Article is considered a single document.
  • Each Article attachment is considered a single document.
  • Each Article comment is considered a single document.
  • Each Community topic is considered a single document.
  • Each Community post is considered a single document.
  • Each Community post comment is considered a single document.
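
The Amazon S3 row is the one case where some objects in the source are deliberately not documents. The Python sketch below groups a bucket listing into documents and their optional metadata files. It assumes a metadata file is named by appending ".metadata.json" to the object key, as the table suggests; the metadata_prefix option (for keeping metadata files under a separate folder) is hypothetical, and ACL files are not modeled.

```python
from __future__ import annotations

# Assumed naming convention from the table: "<object key>.metadata.json".
METADATA_SUFFIX = ".metadata.json"


def group_s3_documents(keys: list[str], metadata_prefix: str = "") -> dict[str, str | None]:
    """Return {document key: matching metadata key or None}.

    Keys ending in METADATA_SUFFIX are treated as metadata, never as documents.
    metadata_prefix is a hypothetical option for metadata files stored under a
    separate folder that mirrors the document keys.
    """
    key_set = set(keys)
    documents: dict[str, str | None] = {}
    for key in keys:
        if key.endswith(METADATA_SUFFIX):
            continue  # metadata for another object, not a separate document
        candidate = f"{metadata_prefix}{key}{METADATA_SUFFIX}"
        documents[key] = candidate if candidate in key_set else None
    return documents


if __name__ == "__main__":
    listing = ["docs/a.pdf", "docs/a.pdf.metadata.json", "docs/b.txt"]
    for doc, meta in group_s3_documents(listing).items():
        print(doc, "->", meta)
```

Here three objects yield two documents: docs/a.pdf (with its metadata file) and docs/b.txt (without one).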

Configuration Best Practices

The following list describes best practices for setting up and configuring your Amazon Q Business data source connector:

  • Each document in an index must be unique. Ensure there are no duplicate documents within a data source, or across any data sources, that you plan to connect to an Amazon Q Business retriever.
  • When changing the authentication type or credentials, update the IAM role so that it can access the correct AWS Secrets Manager secret ID.
  • For your own security, rotate your credentials and secrets regularly. Grant only the access that is needed, and don't reuse credentials or secrets across different data sources.
  • IAM roles used for data retrievers cannot be used for data sources. If you are unsure about the role's usage, create a new IAM role to prevent errors.
  • When using AWS KMS keys in your application, ensure that the IAM role for your application environment has the necessary permissions to describe, encrypt, and decrypt data using the key.
  • Amazon Q Business enhances security by using Secrets Manager to verify the endpoint information used to access on-premises or server data sources. This prevents the "confused deputy" problem, in which users without direct access might gain access indirectly through a proxy. Changing the endpoint creates a new secret in Secrets Manager that reflects the updated information.
  • Most data sources support regular expression patterns, called filters, that include content in or exclude it from crawling.
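
The inclusion and exclusion filters mentioned above can be reasoned about with a small predicate. The following Python sketch assumes three rules that may differ from a given connector: patterns are matched with re.search anywhere in the path, an exclusion match always wins over an inclusion match, and an empty inclusion list means "include everything not excluded". The function name should_crawl is hypothetical.

```python
from __future__ import annotations

import re


def should_crawl(path: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a path is crawled under inclusion/exclusion filters (sketch)."""
    # Assumed precedence: a path matching any exclusion pattern is never crawled.
    if any(re.search(pattern, path) for pattern in exclude):
        return False
    # Assumed default: with no inclusion patterns, everything not excluded is crawled.
    if not include:
        return True
    return any(re.search(pattern, path) for pattern in include)


if __name__ == "__main__":
    include, exclude = [r"\.pdf$"], [r"/drafts/"]
    for p in ["reports/q1.pdf", "reports/drafts/q2.pdf", "notes/todo.txt"]:
        print(p, should_crawl(p, include, exclude))
```

Testing your patterns against a sample of real paths this way, before a full sync, makes it easier to spot a filter that silently excludes everything.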

Understanding the User Store

The Amazon Q Business User Store lets users see only chat responses generated from documents they have access to within the application. As a result, each user sees only responses that match their permissions and the data they are authorized to view.

How the User Store works

The following steps describe how the Amazon Q Business User Store works:

  • In Amazon Q Business, each document in any data source has access control list (ACL) information inherently attached to it as metadata.
  • The ACLs contain information about which users and groups have access to a document.
  • Connectors crawl and use the ACL information from your data source.
  • Re-sync your data source to capture ACL changes and ensure that users have the correct access.
  • Amazon Q Business crawls user and group information from each data source and maps it internally.
  • User and group information is stored in the User Store, where it is matched against document access details.
  • If you delete a group in the User Store and later re-create it with the same name but different group members, document ACLs that contain this group may be affected.
  • If a new user has the same email address as an existing user, delete the old user from the User Store. Amazon Q Business verifies user attributes and denies access if there are discrepancies.
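
The matching step above can be illustrated with a toy model. The following Python sketch is not how Amazon Q Business stores ACLs; it assumes a simplified, hypothetical data model in which each document ACL lists allowed user emails and allowed group names, the User Store maps each email to its groups, and access is denied unless a user or group match is found. Real ACLs can also carry explicit deny entries, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DocumentAcl:
    """Hypothetical per-document ACL attached as metadata."""
    allowed_users: set[str] = field(default_factory=set)   # user email addresses
    allowed_groups: set[str] = field(default_factory=set)  # local or federated group names


def can_access(email: str, acl: DocumentAcl, user_store: dict[str, set[str]]) -> bool:
    """Allow access on a direct user match or a shared group; deny otherwise."""
    if email in acl.allowed_users:
        return True
    groups = user_store.get(email, set())
    return not groups.isdisjoint(acl.allowed_groups)


def visible_documents(email: str, documents: dict[str, DocumentAcl],
                      user_store: dict[str, set[str]]) -> list[str]:
    """Return the sorted IDs of documents whose content may appear in this user's answers."""
    return sorted(doc_id for doc_id, acl in documents.items()
                  if can_access(email, acl, user_store))


if __name__ == "__main__":
    store = {"ana@example.com": {"engineering"}}
    docs = {"design.md": DocumentAcl(allowed_groups={"engineering"})}
    print(visible_documents("ana@example.com", docs, store))
```

The sketch also shows why re-creating a group with different members matters: any document ACL that names the group now resolves against the new membership.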

Using Amazon VPC

Amazon Q Business can connect to your virtual private cloud (VPC) to index content stored there. To make this possible, you provide Amazon Q Business with the network and security information it needs to access your VPC, such as subnets and security groups. Amazon Q Business can then communicate securely with your data source inside your VPC.
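
The VPC details are typically passed as a list of subnet IDs and a list of security group IDs. The following Python sketch builds and sanity-checks that structure before it is used in a request. It assumes the field names subnetIds and securityGroupIds, assumes both lists are required, and uses the standard "subnet-" and "sg-" ID prefixes as a simple format check; the function name build_vpc_configuration is hypothetical.

```python
from __future__ import annotations


def build_vpc_configuration(subnet_ids: list[str], security_group_ids: list[str]) -> dict[str, list[str]]:
    """Validate and assemble a VPC configuration dict (sketch)."""
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    if not security_group_ids:
        raise ValueError("at least one security group ID is required")
    # Catch swapped or mistyped IDs early, e.g. a VPC ID passed as a subnet.
    bad = [s for s in subnet_ids if not s.startswith("subnet-")]
    bad += [g for g in security_group_ids if not g.startswith("sg-")]
    if bad:
        raise ValueError(f"unexpected ID format: {bad}")
    return {"subnetIds": list(subnet_ids), "securityGroupIds": list(security_group_ids)}


if __name__ == "__main__":
    print(build_vpc_configuration(["subnet-0example"], ["sg-0example"]))
```

The security groups you choose must still allow traffic between Amazon Q Business and your data source; a check like this only catches malformed input, not network rules.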

Troubleshooting Data Source Connectors

This section explains how to fix some common issues with Amazon Q Business data source connectors.

  • My documents were not indexed: Amazon Q Business has a two-step process for indexing data. Errors can occur at either the data source level or at the document level. Data source errors are reported in the console, while document level errors are reported in Amazon CloudWatch Logs. This helps you identify and fix any issues that prevent documents from being indexed.
  • My synchronization job failed: Amazon Q Business synchronization jobs can fail due to configuration errors in the index or the data source. These errors are usually related to insufficient IAM permissions for Amazon Q Business to access the resources it needs. The error message in the Sync run history section of the data source details page provides details about the missing permissions. Following are some of the error messages that you can receive:
    • Failed to create log group for job. Please make sure that the IAM role provided has sufficient permissions.
    • Failed to access Amazon S3 file prefix (bucket name) while trying to crawl your metadata files. Please make sure the IAM role (ARN) provided has sufficient permissions.
    • The provided IAM role (ARN) could not be assumed. Please make sure Amazon Q Business is a trusted entity that is allowed to assume the role.
  • My synchronization job is incomplete: To troubleshoot an incomplete synchronization job, start with your CloudWatch logs.
    • From the details column, choose View details in CloudWatch.
    • Review the error messages to see what caused the document to fail.
  • My synchronization job succeeded but there are no indexed documents: Possible reasons include the following:
    • Check the CloudWatch DocumentsSubmittedForIndexingFailed metric to see whether any documents failed to synchronize, and check your CloudWatch logs for details.
    • For an Amazon S3 data source, you might have given Amazon Q Business the wrong bucket name or prefix. Make sure that the S3 bucket that Amazon Q Business is using is the bucket that contains the documents to index.
    • If a document failed to be indexed in an earlier job, Amazon Q Business won't index it in a later sync unless you've changed the document or its associated metadata file.
  • I am running into file format issues while syncing my data source:
    If you run into file format issues while adding files to your data source or syncing your data source, make sure that your document types are supported by Amazon Q Business.
  • I am getting an AccessDenied error message when using an SSL certificate file:
    If you are getting an "access denied" error when using an SSL certificate with your data source, check if the IAM role has the necessary permissions to access the certificate file. If the certificate is encrypted with an AWS KMS key, ensure that your IAM role also has permissions to decrypt the certificate using the AWS KMS key.
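
The troubleshooting cases above follow a simple decision path. The following Python sketch encodes that path as a triage helper. The input record shape (status, errorMessage, documentsIndexed, documentsFailed) and the status values FAILED, INCOMPLETE, and SUCCEEDED are hypothetical simplifications, not the exact output of any Amazon Q Business API; the matched message fragments come from the error messages quoted above.

```python
from __future__ import annotations


def triage_sync_job(job: dict) -> str:
    """Suggest a next step for a sync job record (hypothetical record shape)."""
    status = job.get("status")
    message = job.get("errorMessage", "")
    failed = int(job.get("documentsFailed", 0))
    indexed = int(job.get("documentsIndexed", 0))

    if status == "FAILED":
        if "could not be assumed" in message:
            return "Make Amazon Q Business a trusted entity in the IAM role's trust policy."
        if "sufficient permissions" in message:
            return "Grant the IAM role the missing permissions named in the error message."
        return "Review the error in the Sync run history section of the data source details page."
    if status == "INCOMPLETE":
        return "Choose View details in CloudWatch and review the per-document errors."
    if status == "SUCCEEDED" and indexed == 0:
        if failed > 0:
            return "Documents failed to index; check CloudWatch Logs for details."
        return ("Nothing was indexed; verify the S3 bucket name and prefix, and note that "
                "previously failed documents are skipped unless they or their metadata changed.")
    return "No action needed."


if __name__ == "__main__":
    print(triage_sync_job({"status": "SUCCEEDED", "documentsIndexed": 0, "documentsFailed": 2}))
```

A helper like this is only a checklist; the error text in the console and the CloudWatch logs remain the authoritative source for what went wrong.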