If your application does not require full-text content search, disabling content indexing comes in handy and improves performance as well.
If you are curious to try it out, follow along.
Pre-requisites
- You have an environment up and running with Alfresco Content Services 7.x and Alfresco Search Services 2.x (Solr6)
- You have administrative privileges
Looking for Alfresco Content Services 7.x with Alfresco Search Services 2.x installation steps? Check out these posts:
Content and metadata are indexed by default; this is the out-of-the-box behavior. There are two ways to control content/metadata indexing in order to fulfil your search and indexing requirements. We will go over both options.
Control indexing behavior with the help of a content model aspect:
To control the indexing behavior, you can make use of a content model aspect named "cm:indexControl", which has two properties. These properties indicate whether content/metadata should be indexed.
The values of these properties are set to true by default.
<aspect name="cm:indexControl">
    <title>Index Control</title>
    <properties>
        <property name="cm:isIndexed">
            <title>Is indexed</title>
            <type>d:boolean</type>
            <default>true</default>
        </property>
        <property name="cm:isContentIndexed">
            <title>Is content indexed</title>
            <type>d:boolean</type>
            <default>true</default>
        </property>
    </properties>
</aspect>
You can apply the cm:indexControl aspect to nodes and control the indexing behavior by setting the appropriate properties. Note that this approach works only for certain types like cm:folder, cm:content and their sub-types. Keep in mind that if you have a large number of nodes that need to be excluded from content/metadata indexing, this option is not the right choice, as you would have to apply the aspect with "cm:isContentIndexed" set to "false" on all those nodes.
In that situation the second option (which we will see next) comes in handy.
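As a rough illustration of the aspect approach, the Python sketch below prepares a node update that applies cm:indexControl with content indexing turned off. It assumes the public REST API's node update call (PUT /nodes/{id}) on a local install at localhost:8080 with admin/admin credentials. The helper names and the node id are hypothetical. Existing aspects are passed in and merged, on the assumption that the aspectNames list sent in an update replaces the node's current list.

```python
import base64
import json
import urllib.request

# Assumed base URL of the public REST API on a local install; adjust as needed.
BASE_URL = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1"


def index_control_body(existing_aspects, content_indexed=False, indexed=True):
    """Build the JSON body for a node update that applies cm:indexControl."""
    aspects = list(existing_aspects)
    if "cm:indexControl" not in aspects:
        aspects.append("cm:indexControl")
    return {
        "aspectNames": aspects,
        "properties": {
            "cm:isIndexed": indexed,
            "cm:isContentIndexed": content_indexed,
        },
    }


def build_update_request(node_id, body, user="admin", password="admin"):
    """Prepare (but do not send) a PUT request for the given node."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{BASE_URL}/nodes/{node_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

Passing the prepared request to urllib.request.urlopen would send it. For a large number of nodes this has to be repeated per node, which is exactly the drawback described above.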
To learn more about content models, aspects and their application, refer to: Content Model Extension Point
If you wish to bulk apply the aspect with updated values, this post may be useful as a reference: Applying the aspects in bulk
Control indexing behavior from Solr:
This approach comes in handy for controlling indexing for all document types across the repository. We can configure the Solr indexing behavior by setting these properties (alfresco.index.transformContent and/or alfresco.ignore.datatype.1) in the solrcore.properties file.
Keep in mind that if you need to exclude only specific nodes, this option is not the right choice. This approach controls the indexing behavior globally: indexing is either enabled or disabled for the whole repository.
If alfresco.index.transformContent is set to 'false' in the solrcore.properties file, content indexing will be disabled. The index tracker will not transform any content and only the metadata will be indexed.
If alfresco.ignore.datatype.1 is set in the solrcore.properties file, metadata indexing will be disabled. The index tracker will not index metadata. This is not an ideal setting in most cases, as you usually want to be able to search your documents by their metadata at least, so be mindful when setting this property.
If both properties are set in the solrcore.properties file, as shown below, both content and metadata indexing will be disabled.
alfresco.index.transformContent=false
alfresco.ignore.datatype.1=d:content
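For a distribution-based install, editing the file by hand is enough. If you prefer to script the change, here is a minimal Python sketch, assuming the property may already be present in solrcore.properties as a commented-out line. The helper name set_property is hypothetical. It uncomments and overwrites an existing entry, or appends the entry if it is missing.

```python
import re


def set_property(text, key, value):
    """Return properties text with key=value set, uncommenting or appending."""
    # Match the key at the start of a line, optionally commented out with '#'.
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    line = f"{key}={value}"
    if pattern.search(text):
        # Replace only the first occurrence; lambda avoids escape processing.
        return pattern.sub(lambda _m: line, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + line + "\n"


if __name__ == "__main__":
    sample = "#alfresco.index.transformContent=false\nsolr.port=8983\n"
    print(set_property(sample, "alfresco.index.transformContent", "false"))
```

To apply it to a real core, read the core's solrcore.properties into a string, call set_property for each property you need, and write the result back before restarting Solr.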
To learn more about how to configure these properties, refer to the installation guides if your environment is set up using the distribution package:
If you wish to set the properties for a Docker-based environment, you will have to use a Dockerfile and docker-compose.yml in combination. You can take a look at this repo and this post to understand how a Dockerfile and docker-compose.yml can be used together to build/update base Docker images.
- Create a folder named configs-to-override in the same folder where you keep your docker-compose.yml file
- Create a folder named solr within the configs-to-override folder
- Create a file named Dockerfile under configs-to-override/solr, which will be used for building the alfresco-search-services image with additional instructions to disable indexing behavior
- Copy the following instructions into the Dockerfile (configs-to-override/solr)
FROM alfresco/alfresco-search-services:2.0.3.5

#To disable content indexing
RUN sed -i '/^bash.*/i sed -i "'"/alfresco.index.transformContent/s/^#//g"'" ${DIST_DIR}/solrhome/templates/rerank/conf/solrcore.properties\n' \
    ${DIST_DIR}/solr/bin/search_config_setup.sh;

#To disable metadata indexing
#RUN sed -i '/^bash.*/i sed -i "'"/alfresco.ignore.datatype.1/s/^#//g"'" ${DIST_DIR}/solrhome/templates/rerank/conf/solrcore.properties\n' \
#${DIST_DIR}/solr/bin/search_config_setup.sh;

#TODO:: Add more steps as needed
- Update the docker-compose.yml for solr6 service as:
solr6:
  build:
    dockerfile: ./Dockerfile
    context: ./configs-to-override/solr
  mem_limit: 2g
  environment:
    # Solr needs to know how to register itself with Alfresco
    SOLR_ALFRESCO_HOST: "alfresco"
    SOLR_ALFRESCO_PORT: "8080"
    # Alfresco needs to know how to call solr
    SOLR_SOLR_HOST: "solr6"
    SOLR_SOLR_PORT: "8983"
    # Create the default alfresco and archive cores
    SOLR_CREATE_ALFRESCO_DEFAULTS: "alfresco,archive"
    # HTTPS or SECRET
    ALFRESCO_SECURE_COMMS: "secret"
    # SHARED SECRET VALUE
    JAVA_TOOL_OPTIONS: " -Dalfresco.secureComms.secret=secret "
  ports:
    - "8083:8983" # Browser port
- Launch the containers again using the following command. This builds the alfresco-search-services image with the updated property:
docker-compose -f ./docker-compose.yml up --build
Note: Disabling/enabling indexing behaviors requires re-indexing the repository.
On a side note, if you want archive or zip files to be unzipped and their files included in the index, set the following property:
transformer.Archive.includeContents=true
The default setting is false.
Validation:
To validate whether indexing is actually being disabled, I first uploaded a text document containing a line of text and made sure that its content and metadata are indexed, i.e. that the document (text file) is returned in search results for both content and metadata queries. At this point I have not applied either of the aforementioned methods to disable indexing.
See the details below:
- Run a content search query like: "indexing in Alfresco"
- Run a metadata search query like (name of the file, i.e. cm:name) "test-indexing.txt"
- Run a metadata search query like (title of the file, i.e. cm:title) "IndexControl"
As per the results above, we can see that search queries for content and metadata both return results. Now, let's try disabling content indexing. We expect the metadata queries to return results, while the content query should return 0 results. We will re-run all the tests that we executed above, this time with content indexing disabled.
- Following the steps given above (Option 2), I disabled content indexing
- Deleted the solr indexes for full re-indexing and restarted the servers/containers
- Verified that property value is set to false for disabling the content indexing
- Run a content search query like: "indexing in Alfresco" (0 results expected)
- Run a query via solr admin to see if content is being indexed (no response expected):
- Run a metadata search query like (name of the file, i.e. cm:name) "test-indexing.txt". Result should be returned based on cm:name metadata
- Run a metadata search query like (title of the file, i.e. cm:title) "IndexControl". Result should be returned based on cm:title metadata
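The same checks can also be scripted rather than run by hand. The Python sketch below is only an illustration. It assumes the public Search REST API at localhost:8080 with admin/admin credentials, AFTS as the query language, and a response that carries its hit count under list.pagination.count. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint of the public Search REST API on a local install.
SEARCH_URL = (
    "http://localhost:8080/alfresco/api/-default-/public/search/versions/1/search"
)

# AFTS queries mirroring the three manual checks above.
CHECKS = {
    "content": 'TEXT:"indexing in Alfresco"',
    "name": 'cm:name:"test-indexing.txt"',
    "title": 'cm:title:"IndexControl"',
}


def search_body(afts_query):
    """Build the JSON body for a single AFTS search."""
    return {"query": {"language": "afts", "query": afts_query}}


def hit_count(response):
    """Extract the number of returned entries from a search response."""
    return response["list"]["pagination"]["count"]


def content_indexing_disabled(counts):
    """True when metadata queries hit but the content query returns nothing."""
    return counts["content"] == 0 and counts["name"] > 0 and counts["title"] > 0


def run_checks(user="admin", password="admin"):
    """Run all checks against a live server and return hit counts per check."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    counts = {}
    for label, query in CHECKS.items():
        request = urllib.request.Request(
            SEARCH_URL,
            data=json.dumps(search_body(query)).encode("utf-8"),
            method="POST",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Basic {token}",
            },
        )
        with urllib.request.urlopen(request) as resp:
            counts[label] = hit_count(json.load(resp))
    return counts
```

Calling content_indexing_disabled(run_checks()) after the re-index should then return True if only content indexing was switched off.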
Thank you Angel Borroy for tips :)
Thanks for sharing. Your articles are very informative, you cover even minor details as well which is really appreciated
Thank you for your comment
Thanks for the nice explanation. Was trying to get understanding on control indexing behavior in alfresco. Very helpful
Thanks Abhinav for informative article. Can we reindex failed documents or a particular document instead of running full reindex?
Yes you can, check out these docs:
https://docs.alfresco.com/search-services/latest/admin/monitor/#unindexed-transactions
https://docs.alfresco.com/search-services/latest/admin/restapi/