The Java and Alfresco World

Wednesday, September 13, 2023

Alfresco repository performance tuning checklist

As Alfresco developers/admins, we often need to tune the Alfresco repository for better performance and stability. I often get messages about this, so I decided to put together a checklist that you should consider whenever you are dealing with Alfresco repository performance.

Part of this comes down to how the system was sized in the first place. Not every system is sized with a pre-defined number of named users, content size, number of concurrent users etc. In some cases, we might have to revisit the sizing and tune the repository to keep up with growing user and content size requirements. This is an incremental process in most cases.

Performance tuning and improvement is not a one-time job; it is done incrementally, based on evaluation, trials and updates.

"Performance tuning is always an open ended process."

Having said that, here are some pointers (to give you ideas to start with) at a very high level that you should consider as a starting point:

Analyze Thread dump, GC Log and Heap dump.

Analyze the thread dump and hot threads. This may help troubleshoot performance problems, high CPU usage and deadlocks. You can use Support Tools for the enterprise version and OOTBee Support Tools for the community version. You can also use FastThread to analyze the thread dump and get detailed info.

To export thread dump, follow these steps:

Find the java process id (pid). Use this command:

pgrep java

Export the thread dump. Use this command with the process id (e.g. 1):

jstack [pid] > filepathtosave

jstack 1 > /usr/local/alfresco-community70/tomcat/threaddump.txt
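The same kind of information can also be read from inside a JVM through the standard java.lang.management API. Here is a minimal sketch, assuming the code runs inside the JVM you want to inspect (jstack, in contrast, attaches from outside). The class name ThreadDumpSketch is hypothetical and not part of Alfresco.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class for illustration; not part of Alfresco.
class ThreadDumpSketch {

    // Builds a plain-text dump of all live threads in the current JVM.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false/false: skip locked monitor and synchronizer details to keep the call cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Returns how many threads are currently deadlocked (0 if none).
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

The output is less detailed than a jstack dump, so treat it as a quick health check rather than a replacement.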

Analyze the GC Logs to get insights on GC events. This may help identify potential problems as well. You can use GCViewer, GCEasy, GCPlot, IBM Garbage Collection and Memory Visualizer, Garbagecat, GC-Log-Analyzer etc. that help analyze GC logs.

To enable GC logging, use the following JVM parameter (Java 9 or later):

-Xlog:gc*:file=/usr/local/alfresco-community70/tomcat/repo_gc.log

Analyze the Heap Dumps to get insights on Heap usage. You can use HeapHero, Eclipse MAT, IBM HeapAnalyzer, Java VisualVM etc.

Capture heap dumps automatically on OOM errors by using the following JVM parameters:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/alfresco-community70/tomcat/repoheapdump.bin

To export heap dump, follow these steps:

Find the java process id (pid). Use this command:

pgrep java

Export the heap dump. Use this command with the process id (e.g. 1). The live option is optional; when given, only live (reachable) objects are dumped:

jmap -dump:[live,]format=b,file=<filepathtosave> <pid>

jmap -dump:live,format=b,file=/usr/local/alfresco-community70/tomcat/repoheapdump.bin 1

It is sometimes also helpful to review the Java system properties, VM flags and VM arguments. You can use the command below to review them:

jinfo <pid>

If you are trying to understand the Java memory allocation and usage while the app is running, then this command will be helpful:

jcmd <pid> GC.heap_info

If you want to know metaspace usage while app is running, then this command will be helpful:

jcmd <pid> VM.metaspace

If you want to know all the VM flags set for your app, then this command will be helpful:

jcmd <pid> VM.flags

You can try jcmd <pid> help to get all other available options pertaining to JVM on your running application.
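If you would rather read heap and metaspace usage from code (for example to expose them in a custom health check), the standard java.lang.management API offers similar numbers. This is only a sketch: the class name is hypothetical, and the pool name "Metaspace" is what HotSpot JVMs report, so other JVMs may name it differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class for illustration; not part of Alfresco.
class MemorySnapshotSketch {

    // Heap bytes currently in use, similar in spirit to "jcmd <pid> GC.heap_info".
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Maximum heap size in bytes, or -1 if the JVM reports it as undefined.
    static long heapMaxBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    // Bytes used by the memory pool named "Metaspace" (assumed HotSpot pool name),
    // or -1 if this JVM exposes no pool with that name.
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + heapUsedBytes());
        System.out.println("heap max (bytes): " + heapMaxBytes());
        System.out.println("metaspace used (bytes): " + metaspaceUsedBytes());
    }
}
```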

Learn more on these terminologies here.

Analyze the required no. of users you want to support concurrently. Some of these inputs can be obtained by doing a load test in your target environment:

How many concurrent users are currently being handled by your system?

What is the targeted number of concurrent users down the line?

You can consider creating a client program (using REST or CMIS API) to verify the system and analyze if your system can support the estimated number of users with the expected load before moving to PROD.

What is the total number of DB connections supported? (This number relates to the total number of concurrent users as well, so to support the concurrent users, the allowed DB connections must be configured to an appropriate value.) Default max_connections limit is 275.

Here is an example of how to increase the thread pool and max connections to support your requirement. In this example we need to support a maximum of 400 connections, and the DB instance type in AWS is db.t4g.medium.

RUN sed -i "s/port=\"8080\"\ protocol=\"HTTP\/1.1\"/port=\"8080\"\ protocol=\"HTTP\/1.1\"\ maxHttpHeaderSize=\"32768\"\ maxThreads=\"325\"/g" $TOMCAT_DIR/conf/server.xml ;

OR (server.xml)

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" URIEncoding="UTF-8"

      redirectPort="8443" maxThreads="325" maxHttpHeaderSize="32768"/>

######### (325 maxThreads +75) in JAVA_OPTS or alfresco-global.properties ###########
-Ddb.pool.max=400
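The arithmetic behind these values can be written down as a small sanity check. This is only a sketch: the class and method names are hypothetical, the headroom of 75 connections is the figure used in the example above, and the multi-node check assumes that every repository node opens its own pool against the same database.

```java
// Hypothetical sizing helper for illustration; not part of Alfresco.
class PoolSizingSketch {

    // db.pool.max = Tomcat maxThreads plus headroom for background work
    // (scheduled jobs, indexing tracking and so on).
    static int dbPoolMax(int tomcatMaxThreads, int backgroundHeadroom) {
        return tomcatMaxThreads + backgroundHeadroom;
    }

    // True when all repository nodes together stay within the
    // connection limit configured on the database side.
    static boolean fitsDatabase(int dbPoolMax, int repositoryNodes, int dbMaxConnections) {
        return (long) dbPoolMax * repositoryNodes <= dbMaxConnections;
    }

    public static void main(String[] args) {
        int poolMax = dbPoolMax(325, 75);
        System.out.println("db.pool.max=" + poolMax);
        System.out.println("fits one node: " + fitsDatabase(poolMax, 1, 400));
    }
}
```

With a second repository node against the same database, the same check fails for a limit of 400, which is a hint to raise the database limit or lower the pool size per node.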

Analyze the sources of content and current load:

What is the current size of repository?

What will be the target size based on estimates? If applicable, collect information about document types to be managed including their size, number of versions, daily volumes, number of properties, distribution ratios, etc. This will yield information for the database and storage requirements that may be required to be upgraded to support the target state.

Consider creating a plan to increase resources over time. As the repository grows, resources like RAM and CPU should also be assessed and increased, not just disk storage.

Revisit the JVM settings based on the above assessment. You can also refer to these docs for an overview:

https://docs.alfresco.com/content-services/latest/config/repository/#tune-the-jvm

Configure your java heap memory such that there is enough room for OS and non-heap processes in your application to avoid system crashes.

Some useful resources around JVM:

java-memory-beyond-heap

Tuning Java Virtual Machines

Re-validate if there is any latency between repository and DB.

Is your DB undergoing regular maintenance and tuning? (Think DB Vacuuming)

Regular maintenance and tuning of the database is necessary. Specifically, all of the database servers that Alfresco supports require, at the very least, that some form of index statistics maintenance be performed at frequent, regular intervals to maintain optimal performance. Index maintenance can have a severe impact on performance while in progress, hence it needs to be discussed with your project team and scheduled appropriately.

Round trip time should be less than 1ms (recommended)

Round trip times greater than 1ms indicate unoptimized network that will adversely impact performance. Highly variable round trip time is also of concern, as that may increase the variability of Alfresco Content Services performance.
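A rough way to check this from the repository host is to time a few TCP connects to the database port and look at the median. This is a sketch under stated assumptions: the class name is hypothetical, the default port 5432 assumes PostgreSQL, and a TCP connect only approximates one network round trip (it does not include any query time).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;

// Hypothetical latency probe for illustration; not part of Alfresco.
class DbLatencyProbe {

    // Times a single TCP connect to host:port, in nanoseconds.
    static long connectNanos(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return System.nanoTime() - start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Median of the samples, converted to milliseconds.
    static double medianMillis(long[] nanos) {
        if (nanos.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        long[] sorted = nanos.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double median = sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
        return median / 1_000_000.0;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5432; // assumed PostgreSQL port
        long[] samples = new long[20];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = connectNanos(host, port, 2000);
        }
        System.out.printf("median connect time: %.3f ms%n", medianMillis(samples));
    }
}
```

The median is used rather than the average so that a single slow sample does not hide an otherwise healthy network. A wide spread between samples is itself worth investigating.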

Revisit the repository cache settings if you have modified them (check your alfresco-global.properties or JMX settings to confirm).

Have you modified the default cache limits? (Modifying the limits without understanding them could lead to out of memory issues, as the caches consume heap memory.)

Check out these posts/docs to deep dive into the repo cache:

https://texter.ai/technical-articles/alfresco-repository-caches-unfolded/

https://docs.alfresco.com/content-services/latest/config/repository/#configure-the-repository-cache

https://github.com/Alfresco/alfresco-community-repo/blob/master/repository/src/main/resources/alfresco/caches.properties

Have you enabled clustering (alfresco.cluster.enabled=true)?

There is a critical connection between the repository instances in a cluster. Within the repository, a cache layer sits just above the object relational mapping layer, which in turn sits above the database tier. This cache is synced across all members of a repository cluster, and that syncing mechanism is sensitive to latency.

If you have a DR environment (active-active setup), the cloud regions are far from each other and clustering is enabled, you will see slower performance, as both environments will try to connect to each other. So consider this implication and a proper remedy before setting "alfresco.cluster.enabled=true" for a DR environment.

If the DR environment is not set up as active, it should be ok to have clustering enabled, as you will bring up the DR environment only when the primary region is down.

Assess how many nodes (files/folders) you are maintaining under one parent node (folder).

As a general rule:

Do not keep more than 1000 nodes (files/folders) in same parent node (folder) if Share UI is your primary interface for users.

Overall do not keep more than 3000 nodes (files/folders) in same parent node in repository.

However, it depends on the resources of the server. I’ve seen systems with 10k nodes per folder working fine. A nice addition would be an organization scheme that avoids uploading “all” the files to the same folder, such as year/month/day.
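A year/month/day scheme is easy to derive from a date. Here is a minimal sketch using java.time; the class name is hypothetical, and creating the folders in the repository (for example with the FileFolderService) is left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of Alfresco.
class DatedFolderPath {

    private static final DateTimeFormatter YEAR_MONTH_DAY =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Relative folder path for the given date, e.g. 2023/09/13.
    static String pathFor(LocalDate date) {
        return date.format(YEAR_MONTH_DAY);
    }

    // The same path split into the folder names to create, top down.
    static String[] segmentsFor(LocalDate date) {
        return pathFor(date).split("/");
    }

    public static void main(String[] args) {
        System.out.println(pathFor(LocalDate.now()));
    }
}
```

Zero-padded months and days keep the folders sorted correctly when listed by name.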

A folder with linked nodes loads slowly in Share UI. It usually takes longer to display because the metadata of all the linked nodes is retrieved in the response as well. If the linked nodes have large custom properties, the JSON response can be huge and can have a considerable impact on the time taken to display the folder in Share. You can set a configuration to lazily load these properties instead by adding the following to JAVA_OPTS:

-Dstrip.linked-node.properties=true

Slower login response? -> We often hear about slow login responses and quickly end up concluding that the issue is with Alfresco. But Alfresco is not always the cause of slower login responses.

Network delays between the repository and the database may sometimes cause slower login response times. As indicated above, the round trip time should be less than 1ms (recommended).

Consider reviewing the authentication chain configured for your repository. We often miss this critical part of the flow, where the problem lies in slow connectivity/network delays with, for example, LDAP or other external authentication systems configured in the authentication chain. Learn more on authentication subsystem here.

Assess and analyze your custom code as well. Many times the problem lies within custom code, which we often don't consider reviewing.

Do you have excessive logging in your code?

Consider using info logs with info enabled checks

Consider using debug logs with debug enabled checks

Never run your production environment with debug enabled. Set an appropriate log level (e.g. error) for production. You can use Support Tools for the enterprise version and OOTBee Support Tools for the community version to enable/disable debug logs as needed.
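The point of the "enabled" checks is that an expensive log message is never built when the level is off. Here is a minimal sketch using java.util.logging so that it runs without extra libraries. This is an assumption made for the example: in Alfresco code the equivalent guard would typically be logger.isDebugEnabled() on the logging API you use. All names below are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class; java.util.logging stands in for the project's logging API.
class GuardedLoggingSketch {

    private static final Logger LOG = Logger.getLogger(GuardedLoggingSketch.class.getName());

    // Counts how often the costly message was built, so the effect of the guard is visible.
    static int expensiveCalls = 0;

    // Switches this logger between debug (FINE) and info level.
    static void setDebugEnabled(boolean enabled) {
        LOG.setLevel(enabled ? Level.FINE : Level.INFO);
    }

    // Stands in for an expensive operation, such as resolving node properties and paths.
    static String describeNode(String nodeId) {
        expensiveCalls++;
        return "node=" + nodeId;
    }

    static void process(String nodeId) {
        // The message is built only when debug logging is actually enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Processing " + describeNode(nodeId));
        }
        // ... the real work would go here ...
    }
}
```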

Are there unclosed Streams/IO/ResultSet/HttpClient/HttpResponse?

Consider using AutoCloseable IO and use try-with-resources where applicable.

Always close the result set (org.alfresco.service.cmr.search.ResultSet) after extracting the search results.

Always handle exceptions properly and close resources in a finally block. Rule of thumb for exception handling: "Services should throw and callers of services should catch. A handled exception must convey WWW (what, where and why)"
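Both closing styles can be sketched with a small stand-in resource. TrackedResults below is hypothetical and only plays the role of something like a search result set. Use try-with-resources when the type implements AutoCloseable, and try/finally when it only offers a close() method.

```java
import java.util.List;

// Hypothetical example; TrackedResults is a stand-in, not an Alfresco class.
class ResourceClosingSketch {

    static class TrackedResults implements AutoCloseable {
        private final List<String> rows;
        private boolean closed;

        TrackedResults(List<String> rows) {
            this.rows = rows;
        }

        // Fails when constructed with null rows, to simulate a broken query.
        List<String> rows() {
            if (rows == null) {
                throw new IllegalStateException("query failed");
            }
            return rows;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // try-with-resources: close() runs on normal return and on exceptions.
    static int countWithTryWithResources(TrackedResults results) {
        try (TrackedResults r = results) {
            return r.rows().size();
        }
    }

    // try/finally: the same guarantee for types that are not AutoCloseable.
    static int countWithFinally(TrackedResults results) {
        try {
            return results.rows().size();
        } finally {
            results.close();
        }
    }
}
```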

Do you have behaviors that are bound too broadly? (Hint: Bind behaviors to the specific CLASS/ASSOCIATION/PROPERTIES/SERVICE that you need)

Class - Most commonly used, fired for a particular TYPE/ASPECT
Association - Fired for a particular association or association of a particular type.
Properties - Fired for properties of a particular type or with a particular name.
Service - Fired every time the service generates an event.

Node property/metadata updates via NodeService API:

Ask yourself these questions and analyze your source code:

Adding new properties to the node via NodeService but using nodeService.setProperties instead of using nodeService.addProperties?

Be careful and understand what you are doing when using nodeService.setProperties before creating versions programmatically. It replaces the property values carried over from previous versions, so you will lose any property values set on the workspace node (current version) that are not in the new map. Use nodeService.addProperties otherwise.

Updating one property of a node but creating new map of properties and applying it via nodeService.setProperties instead of using nodeService.setProperty?

Updating multiple properties on a node but nodeService.setProperty is being used to set these properties in multiple steps instead of using nodeService.addProperties?

Consider analyzing these methods and your use cases before choosing the methods to use.
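The difference between the three methods is easiest to see on plain maps. The sketch below is only a model of the semantics described above, not the NodeService itself. The class name is hypothetical, and the real service may also keep certain system-managed properties that this model ignores.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the update semantics; not the Alfresco NodeService.
class PropertyUpdateModel {

    // Like setProperties: the incoming map replaces what was there before.
    static Map<String, Object> setProperties(Map<String, Object> current,
                                             Map<String, Object> incoming) {
        return new HashMap<>(incoming);
    }

    // Like addProperties: incoming values are merged into the existing ones.
    static Map<String, Object> addProperties(Map<String, Object> current,
                                             Map<String, Object> incoming) {
        Map<String, Object> merged = new HashMap<>(current);
        merged.putAll(incoming);
        return merged;
    }

    // Like setProperty: exactly one value changes, everything else is kept.
    static Map<String, Object> setProperty(Map<String, Object> current,
                                           String name, Object value) {
        Map<String, Object> updated = new HashMap<>(current);
        updated.put(name, value);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Object> current = new HashMap<>();
        current.put("cm:title", "Report");
        current.put("cm:author", "Ana");
        Map<String, Object> incoming = new HashMap<>();
        incoming.put("cm:description", "Q3");

        System.out.println("set: " + setProperties(current, incoming).keySet());
        System.out.println("add: " + addProperties(current, incoming).keySet());
    }
}
```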

Deleting a huge amount of content/folders?

Consider using batch processing and applying the "sys:temporary" aspect before deleting them, after reviewing your organization's content retention policy.

If you manage some sort of Job IDs (content-less nodes) within the repository to track jobs via metadata/properties, make sure you also set the cm:isContentIndexed property to false on those nodes, to indicate that Solr should not try to index their content. This matters especially if you have full text content indexing enabled.

When using the SearchService Java foundation API, if you just need the list of nodeRefs from the result and not the metadata, make sure you DO NOT set setIncludeMetadata to true. This is considered a best practice.

When using the SearchService Java foundation API, fetch the results via pagination (using ResultSetSPI) instead of fetching all results at once. For example, fetch 1000 results in one batch and keep fetching further batches for as long as the result set reports hasMore.
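The batching loop itself can be sketched without the Alfresco API. In the sketch below, PagedSearch and Page are hypothetical stand-ins: fetch(skipCount, maxItems) plays the role of running a query with those paging parameters, and hasMore plays the role of the flag reported by the result set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical paging model for illustration; not the Alfresco SearchService.
class PagedFetchSketch {

    // One batch of results plus a flag telling whether more are available.
    static final class Page {
        final List<String> items;
        final boolean hasMore;

        Page(List<String> items, boolean hasMore) {
            this.items = items;
            this.hasMore = hasMore;
        }
    }

    interface PagedSearch {
        Page fetch(int skipCount, int maxItems);
    }

    // In-memory search used to exercise the loop.
    static PagedSearch inMemory(List<String> all) {
        return (skipCount, maxItems) -> {
            int from = Math.min(skipCount, all.size());
            int to = Math.min(from + maxItems, all.size());
            return new Page(new ArrayList<>(all.subList(from, to)), to < all.size());
        };
    }

    // Processes every result in batches and returns how many were handled.
    static int processAll(PagedSearch search, int batchSize) {
        int processed = 0;
        int skipCount = 0;
        Page page;
        do {
            page = search.fetch(skipCount, batchSize);
            processed += page.items.size(); // the real per-item work would go here
            skipCount += page.items.size();
        } while (page.hasMore && !page.items.isEmpty());
        return processed;
    }
}
```

The extra isEmpty() check protects against an endless loop if a source keeps reporting hasMore without returning items.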

Avoid deep pagination:

Some common examples. These types of search parameters should be avoided:

skipCount=0&maxItems=1000000
skipCount=999000&maxItems=1000

Are you using any action that performs transactional operations such as "creating folders" (think of an organization scheme such as a timestamped folder structure where a file is moved as soon as it gets created), and which is triggered multiple times on an event from a folder rule? It can be a background or a foreground operation. Are you facing challenges handling concurrency, and maybe getting exceptions (e.g. FileExistsException or ConcurrencyFailureException)?

If your answer is yes, did you jump to the quick solution of marking the create/update method(s) with the "synchronized" keyword? -> We are always tempted to use "synchronized" as a quick solution to these problems, but remember that it is considered an anti-pattern in most cases if used without understanding its consequences (think twice before jumping to this solution). It can be slow and lead to contention between threads.

Consider using "RetryingTransactionHelper" and "RetryingTransactionCallback". This mechanism, however, retries only on the following exceptions:

- org.springframework.dao.ConcurrencyFailureException
- org.springframework.dao.DeadlockLoserDataAccessException
- org.springframework.jdbc.JdbcUpdateAffectedIncorrectNumberOfRowsException
- org.springframework.jdbc.UncategorizedSQLException
- java.sql.SQLException
- java.sql.BatchUpdateException
- org.springframework.dao.DataIntegrityViolationException
- org.alfresco.service.license.LicenseIntegrityException
- org.apache.ibatis.exceptions.TooManyResultsException
- org.alfresco.util.LockHelper.LockTryException

ConcurrencyFailureException will be automatically re-tried.

FileExistsException is not covered by this mechanism by default. This exception will not trigger a transaction retry unless it is manually caught and rethrown wrapped in one of the exceptions listed above, which will be retried.

There is a property that can be configured via a Spring bean to include additional exceptions (called extraExceptions, an example can be found here). But do not try to include FileExistsException in the list of retry exceptions; that is also considered an anti-pattern (think of this exception being thrown from poorly written code that does not check whether a node exists before trying to create it).

I would rather use a try-catch block to catch the FileExistsException and rethrow it as a relevant exception from the already configured list (shown above); the most relevant one in this case seems to be "DataIntegrityViolationException". This type of exception is handled automatically by retrying the transaction, provided the implementation uses "RetryingTransactionHelper" and "RetryingTransactionCallback". On the next retry, the operation should then see the concurrently created node/folder in the existence check (nodeService.exists(nodeRef)), skip creating it again and go to the next step.
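The catch, rethrow and retry flow can be sketched with plain Java stand-ins. Everything in this sketch is hypothetical: NodeExistsException plays the role of FileExistsException, RetryableConflictException plays the role of DataIntegrityViolationException, doWithRetry plays the role of RetryingTransactionHelper, and FolderStore simulates a repository that loses a race to another thread once.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical model of the retry flow; none of these are Alfresco or Spring classes.
class RetryOnConflictSketch {

    static class NodeExistsException extends RuntimeException { }

    static class RetryableConflictException extends RuntimeException {
        RetryableConflictException(Throwable cause) {
            super(cause);
        }
    }

    // Re-runs the work while it fails with the retryable exception only.
    static <T> T doWithRetry(Supplier<T> work, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RetryableConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (RetryableConflictException e) {
                last = e; // retry; any other exception propagates immediately
            }
        }
        throw last;
    }

    static class FolderStore {
        private final Set<String> folders = new HashSet<>();
        boolean simulateRaceOnce; // when true, the next create loses to "another thread"

        boolean exists(String name) {
            return folders.contains(name);
        }

        void create(String name) {
            if (simulateRaceOnce) {
                simulateRaceOnce = false;
                folders.add(name); // the concurrent creator wins
                throw new NodeExistsException();
            }
            if (!folders.add(name)) {
                throw new NodeExistsException();
            }
        }
    }

    // Checks for existence first, and turns a lost race into a retry.
    static String getOrCreate(FolderStore store, String name) {
        return doWithRetry(() -> {
            if (store.exists(name)) {
                return "found";
            }
            try {
                store.create(name);
                return "created";
            } catch (NodeExistsException e) {
                throw new RetryableConflictException(e);
            }
        }, 3);
    }
}
```

In the simulated race, the first attempt fails on create, the retry finds the folder in the existence check and the operation completes without creating it twice.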

Assess the archive store content requirements. Based on your organization's retention policy, clean the trashcan or set up the trash-can-cleaner scheduled job so that you keep an appropriate amount of deleted content in the archive store. Also clean up the contentstore.deleted folder often.

Check out the alfresco-trashcan-cleaner-module

Understand Lifecycle of nodes in alfresco: https://www.dbi-services.com/blog/understand-the-lifecycle-of-alfresco-nodes/

Also take a look at this add-on specially the documentation: https://github.com/keensoft/alfresco-deleted-content-store-cleaner#readme

If you are using ACS 5.2 or ACS 6.0/6.1 which uses legacy transformation services, then configure async LibreOffice subsystem. This part is not applicable to ACS 6.2 and later.

Checkout the documentation here:

https://docs.alfresco.com/content-services/5.2/config/libreoffice/

https://docs.alfresco.com/content-services/6.0/config/libreoffice/

Assess which services/subsystems/features are in use and which are unused. Some examples which you can consider disabling if not in use:

cifs.enabled=false

audit.enabled=false
audit.alfresco-access.enabled=false
audit.tagging.enabled=false
imap.server.enabled=false
imap.server.attachments.extraction.enabled=false
googledocs.enabled=false
system.webdav.servlet.enabled=false
ftp.enabled=false
system.usages.enabled=false

activities.feed.notifier.enabled=false
activities.feed.notifier.cronExpression=* * * * * ? 2099  
activities.feed.cleaner.enabled=false
activities.feed.cleaner.cronExpression=* * * * * ? 2099  
activities.feed.generator.enabled=false
activities.feed.generator.cronExpression=* * * * * ? 2099 
activities.post.cleaner.cronExpression=* * * * * ? 2099  
activities.post.cleaner.enabled=false
activities.post.lookup.cronExpression=* * * * * ? 2099   
activities.post.lookup.enabled=false

If you are not using out of process extensions, you can also disable event2. Enable it when you plan to use it. Check more details here:

repo.event2.enabled=false

If you have enabled replication but not really using it, then disable it. It is disabled by default unless you enable it.

replication.enabled=false

transferservice.receiver.enabled=false

Additional Ideas (some of them do not apply in 7.x):

https://www.slideshare.net/LuisCabaceira/alfresco-tuning-part1

https://www.slideshare.net/LuisCabaceira/alfresco-tuning-part1-54221871

Control indexing in Alfresco with Search Enterprise

Following my previous post on controlling indexing behavior, I tested the indexing behavior with Alfresco Content Services 7.2.1.3 enterprise and Search Enterprise 3.1.1.

Pre-requisites

You have an environment up and running with Alfresco Content Services 7.x and Search Enterprise 3.x
You have administrative privileges

Looking for Alfresco Content Services 7.x with Search Enterprise 3.x installation steps? Check out this post:

Setup ACS-7.x with Elasticsearch and Transformation Service Step by Step

Content and metadata are indexed by default; this is the out-of-the-box behavior. Indexing with the Elastic Search connector is driven by events. There are two ways you can control the content/metadata indexing behavior in order to fulfil your search and indexing requirements. We will go over both options.

Control indexing behavior with help of content model aspect:

To control the indexing behavior, you can make use of a content model aspect named "cm:indexControl" which has two properties. These properties indicate whether content/metadata should be indexed.

The values of these properties are set to true by default.

<aspect name="cm:indexControl">
	<title>Index Control</title>
	<properties>
		<property name="cm:isIndexed">
			<title>Is indexed</title>
			<type>d:boolean</type>
			<default>true</default>
		</property>
		<property name="cm:isContentIndexed">
			<title>Is content indexed</title>
			<type>d:boolean</type>
			<default>true</default>
		</property>
	</properties>
</aspect>

You can apply the cm:indexControl aspect on nodes to control the indexing behavior by setting the appropriate properties. Note that this approach works only for certain types like cm:folder, cm:content and their sub-types. Keep in mind that if you have a large number of nodes which need to be excluded from indexing, this option is not the right choice, as you will have to apply the aspect with "cm:isContentIndexed" set to "false" on all those nodes. This approach is exactly the same if you are using Alfresco Search Services (Solr6).

In this situation the second option (which we will see next) comes in handy.

A known issue with UPDATE event:

If you use a folder rule to apply the cm:indexControl aspect
Or if you use any script to update the node and apply the cm:indexControl aspect with cm:isContentIndexed set to false

As we know, indexing with Elasticsearch is driven by events, so when you create/upload content and try to set the cm:indexControl aspect using a folder rule, the CREATE and UPDATE events occur one after the other.

At the moment the Live Indexing app does not remove the document from the index when that UPDATE contains the cm:indexControl setting. It seems to just skip indexing the changes.

This is a known issue and being tracked using this ticket: https://alfresco.atlassian.net/browse/MNT-23347

[EDIT] : As of 01/29/2024 the above known issue is not yet fixed. It seems to be in 2024-Q2 Roadmap. We will have to wait for it.

The following alternatives apply the cm:indexControl aspect at the CREATE event:

Create a behavior using the "org.alfresco.repo.node.NodeServicePolicies" policy, implement the "onCreateNode" method and set the cm:indexControl aspect there. Learn more here on implementing Behavior Policies.

Create a custom aspect that overrides the default values of cm:indexControl as stated above.
For example, if you want to disable content indexing, set the default of cm:isContentIndexed to false. To learn more on content model, aspects and their application, refer to: Content Model Extension Point

<aspect name="demo:customIndexControl">
	<title>Override Index Control to disable content indexing by default</title>
	<parent>cm:indexControl</parent>
	<overrides>
		<property name="cm:isIndexed">
			<default>true</default>
		</property>
		<property name="cm:isContentIndexed">
			<default>false</default>
		</property>
	</overrides>
</aspect>

Thank you Angel Borroy for help on this.

Control indexing behavior via LiveIndexingApp mediation-filter :

Live Indexing App is a component within the Elastic Search connector which is responsible for indexing nodes (content/metadata). There is a component called Mediation (alfresco-elasticsearch-live-indexing-mediation) which subscribes to the alfresco.event.topic (activemq:topic:alfresco.repo.event2) and processes the incoming node events. The configuration of this component allows you to declare four blacklist sets for filtering out nodes or attributes from indexing. These blacklists are specified in the file referenced by the alfresco.mediation.filter-file property. The default file is called mediation-filter.yml and must be on the module classpath.

Keep in mind that if you need to exclude only specific nodes, this option is not the right choice. This approach controls the indexing behavior globally: it enables or disables indexing for everything that matches, not node by node.

mediation-filter.yml out of the box (showing blacklisted aspects and fields):

mediation:
  nodeTypes:
  contentNodeTypes:
  nodeAspects:
    - sys:hidden
  fields:
    - cmis:changeToken
    - alfcmis:nodeRef
    - cmis:isImmutable
    - cmis:isLatestVersion
    - cmis:isMajorVersion
    - cmis:isLatestMajorVersion
    - cmis:isVersionSeriesCheckedOut
    - cmis:versionSeriesCheckedOutBy
    - cmis:versionSeriesCheckedOutId
    - cmis:checkinComment
    - cmis:contentStreamId
    - cmis:isPrivateWorkingCopy
    - cmis:allowedChildObjectTypeIds
    - cmis:sourceId
    - cmis:targetId
    - cmis:policyText
    - trx:password
    - pub:publishingEventPayload

Where:

nodeTypes: if the node wrapped in the incoming event has a type which is included in this set, the node processing is skipped.

contentNodeTypes: if the node wrapped in the incoming event has a content change associated with it and it has a type which is included in this set, then the corresponding content processing won’t be executed. This means nodes belonging to one of the node types in this set, won’t have any content indexed in Elasticsearch.

nodeAspects: if the node wrapped in the incoming event has an aspect which is included in this set, the node processing is skipped.

fields: fields listed in this set are removed from the incoming nodes metadata. This means fields in this set won’t be sent to Elasticsearch for indexing, and therefore they won’t be searchable.
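The four sets can be read as three decisions per incoming node: skip the node, skip its content, or strip some fields. The sketch below only models the semantics described above and is not the connector's code. The class name is hypothetical, and it matches type names exactly, which is an assumption (whether the real filter also considers sub-types is not covered here).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the blacklist semantics; not the Live Indexing implementation.
class MediationFilterModel {

    private final Set<String> nodeTypes;
    private final Set<String> contentNodeTypes;
    private final Set<String> nodeAspects;
    private final Set<String> fields;

    MediationFilterModel(Set<String> nodeTypes, Set<String> contentNodeTypes,
                         Set<String> nodeAspects, Set<String> fields) {
        this.nodeTypes = nodeTypes;
        this.contentNodeTypes = contentNodeTypes;
        this.nodeAspects = nodeAspects;
        this.fields = fields;
    }

    // nodeTypes / nodeAspects: the whole node is skipped.
    boolean skipsNode(String nodeType, Set<String> aspects) {
        if (nodeTypes.contains(nodeType)) {
            return true;
        }
        for (String aspect : aspects) {
            if (nodeAspects.contains(aspect)) {
                return true;
            }
        }
        return false;
    }

    // contentNodeTypes: metadata is still indexed, content is not.
    boolean indexesContent(String nodeType, Set<String> aspects) {
        return !skipsNode(nodeType, aspects) && !contentNodeTypes.contains(nodeType);
    }

    // fields: blacklisted properties are removed before indexing.
    Map<String, Object> indexableFields(Map<String, Object> metadata) {
        Map<String, Object> kept = new HashMap<>(metadata);
        kept.keySet().removeAll(fields);
        return kept;
    }
}
```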

For more details on setting up Elastic Connector and its components visit here.

Disable/Blacklist content indexing:

To disable content indexing we need to blacklist the "cm:content" type under "contentNodeTypes". With this set, node content won't be transformed and content will not be added to Elasticsearch Index. You won't be able to search the document by content.

The updated 'mediation-filter.yml' file looks as follows:

mediation:
  nodeTypes:
  contentNodeTypes:
    - cm:content
  nodeAspects:
    - sys:hidden
  fields:
    - cmis:changeToken
    - alfcmis:nodeRef
    - cmis:isImmutable
    - cmis:isLatestVersion
    - cmis:isMajorVersion
    - cmis:isLatestMajorVersion
    - cmis:isVersionSeriesCheckedOut
    - cmis:versionSeriesCheckedOutBy
    - cmis:versionSeriesCheckedOutId
    - cmis:checkinComment
    - cmis:contentStreamId
    - cmis:isPrivateWorkingCopy
    - cmis:allowedChildObjectTypeIds
    - cmis:sourceId
    - cmis:targetId
    - cmis:policyText
    - trx:password
    - pub:publishingEventPayload

Blacklist specific properties from being indexed:

To disable/blacklist one or more specific metadata properties from being indexed, we need to blacklist the properties e.g. "demo:documentIDInternal" (a content model property) under "fields". These properties won’t be sent to Elasticsearch for indexing, and therefore they won’t be searchable.

The updated 'mediation-filter.yml' file looks as follows:

mediation:
  nodeTypes:
  contentNodeTypes:
  nodeAspects:
    - sys:hidden
  fields:
    - cmis:changeToken
    - alfcmis:nodeRef
    - cmis:isImmutable
    - cmis:isLatestVersion
    - cmis:isMajorVersion
    - cmis:isLatestMajorVersion
    - cmis:isVersionSeriesCheckedOut
    - cmis:versionSeriesCheckedOutBy
    - cmis:versionSeriesCheckedOutId
    - cmis:checkinComment
    - cmis:contentStreamId
    - cmis:isPrivateWorkingCopy
    - cmis:allowedChildObjectTypeIds
    - cmis:sourceId
    - cmis:targetId
    - cmis:policyText
    - trx:password
    - pub:publishingEventPayload
    - demo:documentIDInternal
    - demo:publisherInternal

Blacklist specific types from being indexed:

To disable/blacklist one or more types from being indexed, we need to blacklist the types e.g. "demo:invoice" (a content model type) under "nodeTypes". The nodes having type as "demo:invoice" would be excluded from indexing and can't be searched.

The updated 'mediation-filter.yml' file looks as follows:

mediation:
  nodeTypes:
    - demo:invoice
  contentNodeTypes:
  nodeAspects:
    - sys:hidden
  fields:
    - cmis:changeToken
    - alfcmis:nodeRef
    - cmis:isImmutable
    - cmis:isLatestVersion
    - cmis:isMajorVersion
    - cmis:isLatestMajorVersion
    - cmis:isVersionSeriesCheckedOut
    - cmis:versionSeriesCheckedOutBy
    - cmis:versionSeriesCheckedOutId
    - cmis:checkinComment
    - cmis:contentStreamId
    - cmis:isPrivateWorkingCopy
    - cmis:allowedChildObjectTypeIds
    - cmis:sourceId
    - cmis:targetId
    - cmis:policyText
    - trx:password
    - pub:publishingEventPayload

The following property needs to be set to control the indexing behavior via mediation-filter:

alfresco.mediation.filter-file or ALFRESCO_MEDIATION_FILTER-FILE => The configuration file which contains fields and node types blacklists. The default value is classpath:mediation-filter.yml

Note: Default mediation-filter config resides in 'alfresco-elasticsearch-live-indexing-shared-xxx.jar' which is a dependency of alfresco-elasticsearch-live-indexing-xxx-app.

The mediation filter can be provided in either of the following ways:

When using a docker based environment, you can create a bind mount and provide your custom mediation-filter.yml.

If you are using separate services for each component:

live-indexing-mediation:
        image: quay.io/alfresco/alfresco-elasticsearch-live-indexing-mediation:3.1.1
        depends_on:
            - elasticsearch
            - alfresco
        environment:
            SPRING_ELASTICSEARCH_REST_URIS: http://elasticsearch:9200
            SPRING_ACTIVEMQ_BROKERURL: nio://activemq:61616
            ALFRESCO_MEDIATION_FILTER-FILE: file:/usr/tmp/mediation-filter.yml
        volumes:
            - ./mediation-filter.yml:/usr/tmp/mediation-filter.yml

If you are using the AIO live indexing service (alfresco-elasticsearch-live-indexing):

live-indexing:
        image: quay.io/alfresco/alfresco-elasticsearch-live-indexing:3.1.1
        depends_on:
            - elasticsearch
            - alfresco
        environment:
            SPRING_ELASTICSEARCH_REST_URIS: http://elasticsearch:9200
            SPRING_ACTIVEMQ_BROKERURL: nio://activemq:61616
            ALFRESCO_MEDIATION_FILTER-FILE: file:/usr/tmp/mediation-filter.yml
            ALFRESCO_ACCEPTED_CONTENT_MEDIA_TYPES_CACHE_BASE_URL: http://transform-core-aio:8090/transform/config
            ALFRESCO_SHAREDFILESTORE_BASEURL: http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file/
        volumes:
            - ./mediation-filter.yml:/usr/tmp/mediation-filter.yml

Bind mount syntax -> [SOURCE:]TARGET[:MODE]

- SOURCE can be a named volume or a (relative or absolute) path on the host system.
- TARGET is an absolute path in the container where the volume is mounted.
- MODE is a mount option which can be read-only (ro) or read-write (rw) (default).

For more details read docker volumes documentation here.

Launch the containers again using the following command. This launches the containers with the updated changes:

docker-compose -f ./docker-compose.yml up

If your installation is based on the distribution package, then pass the following parameter to the live indexing boot app and start it:

java -jar C:\alfresco-elastic-search-services\alfresco-elasticsearch-live-indexing-3.1.1-app.jar ^
	--alfresco.mediation.filter-file=file:C:\\alfresco-elastic-search-services\\mediation-filter.yml

java -Dalfresco.mediation.filter-file=file:C:\\alfresco-elastic-search-services\\mediation-filter.yml ^
	-jar C:\alfresco-elastic-search-services\alfresco-elasticsearch-live-indexing-3.1.1-app.jar

Note: Any newly created/uploaded content will be handled by the live indexing app. Existing content requires re-indexing.

More on Elastic Search connector can be found here.

References:

https://docs.alfresco.com/content-services/latest

https://docs.alfresco.com/search-enterprise/latest

Monday, October 10, 2022

Control indexing in Alfresco with Alfresco Search Services

I recently came across a query from a friend about disabling content indexing and allowing only metadata indexing. The last time I tried this, I was using Alfresco 5.0 with Solr4. This time I tried it with Alfresco Content Services 7.2.0.1 and Alfresco Search Services 2.0.3.5, and the good news is that it still works.

If your application does not require full-text content search capability, then disabling content indexing comes in handy and improves performance as well.

If you are curious to try out, then follow along.

Pre-requisites

You have an environment up and running with Alfresco Content Services 7.x and Alfresco Search Services 2.x (Solr6)
You have administrative privileges

Looking for Alfresco Content Services 7.x with Alfresco Search Services 2.x installation steps? Check out these posts:

Setup ACS-7.x, ASS-2.x and Local Transformation Service using distribution package step by step Part-1

Setup ACS-7.x, ASS-2.x and Local Transformation Service using distribution package step by step Part-2

Content and metadata are indexed by default; this is the out-of-the-box behavior. There are two ways to control the content/metadata indexing behavior in order to fulfil your search and indexing requirements. We will go over both options.

Control indexing behavior with help of content model aspect:

The cm:indexControl aspect defines two properties, cm:isIndexed and cm:isContentIndexed. Both are set to true by default.

<aspect name="cm:indexControl">
	<title>Index Control</title>
	<properties>
		<property name="cm:isIndexed">
			<title>Is indexed</title>
			<type>d:boolean</type>
			<default>true</default>
		</property>
		<property name="cm:isContentIndexed">
			<title>Is content indexed</title>
			<type>d:boolean</type>
			<default>true</default>
		</property>
	</properties>
</aspect>

You can apply the cm:indexControl aspect to nodes and control the indexing behavior by setting the appropriate properties. Note that this approach works only for certain types such as cm:folder, cm:content and their sub-types. Keep in mind that if you have a large number of nodes that need to be excluded from content/metadata indexing, this option is not the right choice, because you would have to apply the aspect with "cm:isContentIndexed" set to "false" on every one of those nodes.

In this situation, the second option (which we will see next) comes in handy.

To learn more about content models, aspects and their application, refer to: Content Model Extension Point

If you wish to bulk apply the aspect with updated values, this post may be useful as a reference: Applying the aspects in bulk
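The aspect-based option can also be tried on a single node through the ACS public REST API (a node update on /nodes/{nodeId}). The following is a minimal sketch and not part of the original setup: the base URL assumes a local repository on port 8080, the function names are hypothetical, and basic authentication with an admin-capable user is assumed. Because a node update that sends aspectNames replaces the whole aspect list, the sketch reads the existing aspects first and appends cm:indexControl to them.

```python
import base64
import json
import urllib.request

# Assumption: a local ACS repository reachable on port 8080.
API_BASE = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1"


def build_index_control_update(existing_aspects, is_indexed=True, is_content_indexed=False):
    """Return the JSON body for a node update that applies cm:indexControl."""
    aspects = list(existing_aspects)
    if "cm:indexControl" not in aspects:
        aspects.append("cm:indexControl")
    return {
        "aspectNames": aspects,
        "properties": {
            "cm:isIndexed": is_indexed,
            "cm:isContentIndexed": is_content_indexed,
        },
    }


def _request(method, url, user, password, body=None):
    """Send a JSON request with basic authentication and return the parsed reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def disable_content_indexing(node_id, user, password):
    """Fetch the node's current aspects, then update it with cm:indexControl."""
    url = f"{API_BASE}/nodes/{node_id}"
    current = _request("GET", url, user, password)["entry"].get("aspectNames", [])
    return _request("PUT", url, user, password, build_index_control_update(current))


# Offline demonstration of the request body only; no server is contacted here.
print(json.dumps(build_index_control_update(["cm:auditable"]), indent=2))
```

Calling disable_content_indexing with a real node id keeps metadata indexing on and switches content indexing off for that node only, which also shows why this route does not scale well to a large number of nodes.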

Saturday, June 4, 2022

Setup ACS-7.2.x with Elasticsearch and Transformation Service Step by Step

As part of the step by step series, I outlined the steps to install Alfresco Content Services 7.x and Solr6 (Alfresco Search Services 2.x) using distribution package in my previous posts.

You can take a look at the steps here:

Setup ACS-7.x, ASS-2.x and Local Transformation Service using distribution package step by step Part-1

Setup ACS-7.x, ASS-2.x and Local Transformation Service using distribution package step by step Part-2

I am here with another part of the step by step series on ACS installation. We will set up Alfresco Content Services 7.2.x Enterprise with Elasticsearch (Search Enterprise 3.x) and all other components. This post sets the background for those who are evaluating an upgrade from Solr6 (Alfresco Search Services 2.x) to Elasticsearch (Search Enterprise 3.x).

Let me know in the comments why you would upgrade to Elasticsearch from Solr 6.

*** Updated this post to install ACS 7.2.1.3 that contains the patch for a memory leak issue. You can find more details here ***

What do we need before we start the setup?

ACS-7.2.x package (alfresco-content-services-distribution-7.2.1.3)
ASE-3.x package (alfresco-elasticsearch-connector-distribution-3.1.1)
Alfresco Transform 1.5.x package (alfresco-transform-service-distribution-1.5.3)
Java: Oracle jdk-11.0.13 or later/Open JDK 11.0.13 or later (I will use Oracle jdk-11.0.15 for this post)
Elasticsearch 7.10
Tomcat: Tomcat 9.0.62
ActiveMQ: ActiveMQ v5.16.2.
DB: PostgreSQL 13
ImageMagick: ImageMagick v7.1.0
Libreoffice: LibreOffice v7.0.6

Check out the documentation for additional details on ACS Supported Platforms and Search Enterprise Supported Platforms.

Platform:

Windows 10 x64

Type of deployment:

ACS, Share, Elasticsearch (Search Enterprise-3.x), and Transformation Service on same machine

Let’s download all the required packages that we need for the setup.

Download Enterprise Distribution Packages:

Download Alfresco Content Services Enterprise 7.2.x package from Support Portal. If you have an enterprise license, you can also download the package from alfresco artifacts repository.

https://artifacts.alfresco.com/nexus/service/local/repositories/enterprise-releases/content/org/alfresco/alfresco-content-services-distribution/7.2.1.3/alfresco-content-services-distribution-7.2.1.3.zip

Download Alfresco Transform Service 1.5 package from Support Portal. If you have an enterprise license, you can also download the package from alfresco artifacts repository.

https://artifacts.alfresco.com/nexus/service/local/repositories/enterprise-releases/content/org/alfresco/alfresco-transform-service-distribution/1.5.3/alfresco-transform-service-distribution-1.5.3.zip

Check out this GitHub repo for more info on the transform core all-in-one project.

Download Alfresco Search Enterprise 3.1.1 package from Support Portal. If you have an enterprise license, you can also download the package from alfresco artifacts repository.

https://artifacts.alfresco.com/nexus/service/local/repositories/enterprise-releases/content/org/alfresco/alfresco-elasticsearch-connector-distribution/3.1.1/alfresco-elasticsearch-connector-distribution-3.1.1.zip

Download the Alfresco model namespace map generator add-on. It can be used for setting up the Elasticsearch indexing app.

https://github.com/AlfrescoLabs/model-ns-prefix-mapping/releases/download/1.0.0/model-ns-prefix-mapping-1.0.0.jar

Download and Install Oracle JDK 11.0.15:

https://www.oracle.com/java/technologies/javase/jdk11-archive-downloads.html

https://download.oracle.com/otn/java/jdk/11.0.15+8/c4e1848573124815b77d6f1843afccb5/jdk-11.0.15_windows-x64_bin.exe

Note: Make sure you set the JAVA_HOME environment variable (on windows). It is the installation path of jdk. E.g. JAVA_HOME=C:\Program Files\Java\jdk-11.0.15
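A quick way to confirm that JAVA_HOME points at a usable JDK is to look for the java launcher under its bin folder. The sketch below is only an illustration under that assumption; `find_java_executable` is a hypothetical helper name, not part of any Alfresco tooling.

```python
import os
from pathlib import Path


def find_java_executable(java_home):
    """Return the path to the java launcher under java_home, or None."""
    if not java_home:
        return None
    bin_dir = Path(java_home) / "bin"
    for name in ("java.exe", "java"):  # Windows first, then Unix-like systems
        candidate = bin_dir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME")
    launcher = find_java_executable(java_home)
    if launcher is None:
        print(f"JAVA_HOME is not usable: {java_home!r}")
    else:
        print(f"JAVA_HOME looks fine, launcher found at {launcher}")
```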

Download Elasticsearch Package:

https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.1-windows-x86_64.zip

Alternative options:
https://www.elastic.co/guide/en/elasticsearch/reference/current/zip-windows.html
https://www.elastic.co/downloads/past-releases/elasticsearch-7-10-1

Download Tomcat 9.0.62 binary package:

https://archive.apache.org/dist/tomcat/tomcat-9/v9.0.62/bin/apache-tomcat-9.0.62-windows-x64.zip

Note: Make sure ports 8005, 8080, 8443, AJP port 8009 are open and not in use already. These are default ports used for tomcat. If you have these ports already in use, make sure you change the ports accordingly in <TOMCAT_INSTALLATION>/conf/server.xml.
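Before starting Tomcat, the ports named above can be checked with a small script. This is a minimal sketch that assumes everything runs on the local machine: it simply tries to open a TCP connection to each port on 127.0.0.1 and treats a successful connection as "in use". The names `is_port_in_use` and `report_ports` are hypothetical helpers.

```python
import socket

# Default Tomcat ports named in the note above.
TOMCAT_PORTS = {8005: "shutdown", 8080: "HTTP", 8443: "HTTPS", 8009: "AJP"}


def is_port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def report_ports(ports):
    """Map each port to True (in use) or False (no listener found)."""
    return {port: is_port_in_use(port) for port in ports}


if __name__ == "__main__":
    for port, in_use in report_ports(TOMCAT_PORTS).items():
        state = "IN USE" if in_use else "free"
        print(f"{port} ({TOMCAT_PORTS[port]}): {state}")
```

The same helper can be reused for the PostgreSQL port 5432 mentioned further down, for example report_ports([5432]).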

Download ActiveMQ v5.16.2 binary package (used for transformation service):

https://archive.apache.org/dist/activemq/5.16.2/apache-activemq-5.16.2-bin.zip

Download Alfresco PDF Renderer (v1.1) binary package (used for transformation service):

https://nexus.alfresco.com/nexus/service/local/repositories/releases/content/org/alfresco/alfresco-pdf-renderer/1.1/alfresco-pdf-renderer-1.1-win64.tgz

Download ImageMagick v7.1.0:

https://imagemagick.org/archive/binaries/ImageMagick-7.1.0-39-Q16-HDRI-x64-dll.exe

Alternative options (ImageMagick notoriously removes the specific versions and shows only latest versions):

https://imagemagick.org/script/download.php#windows

https://download.imagemagick.org/ImageMagick/download/binaries/

Download LibreOffice v7.0.6:

https://downloadarchive.documentfoundation.org/libreoffice/old/7.0.6.2/win/x86_64/LibreOffice_7.0.6.2_Win_x64.msi (installer)

All other alternatives: https://downloadarchive.documentfoundation.org/libreoffice/old/

Download Exiftool v12.25:

https://artifacts.alfresco.com/nexus/content/groups/public/org/exiftool/image-exiftool/12.25/image-exiftool-12.25.tgz

Download and Install PostgreSQL 13.x:

https://get.enterprisedb.com/postgresql/postgresql-13.7-1-windows-x64.exe

Alternatively you can also download the binary package and extract it. No installation needed. It is useful if you have trouble doing installation on Windows 10.

Download PostgreSQL 13.x binary

Note: Make sure port 5432 is open and not already in use. Port 5432 is default for postgres to get db connection. If you have this port already in use, make sure you select a different port and use the same while configuring alfresco-global.properties.

Optional Alfresco module packages (amps)-Useful for admins/developers:

https://github.com/abhinavmishra14/js-console/releases/download/0.7.3/javascript-console-platform-0.7.3.amp

https://github.com/abhinavmishra14/js-console/releases/download/0.7.3/javascript-console-share-0.7.3.amp
