Monday, July 14, 2014

MarkLogic and Alfresco Integration


What is MarkLogic?

       MarkLogic is a NoSQL database.
       MarkLogic comes in three flavors:

·  Developer Edition: a free, full-featured version. The APIs it includes are available in all editions of MarkLogic.
·  Essential Enterprise: supports replication, backup, high availability, recovery, fine-grained security, location services and alerting. Semantics and advanced language packs are options.
·  Global Enterprise: designed for large, globally distributed applications. Semantics, tiered storage, geospatial alerting and advanced language packs are options.

What is Alfresco?

      Alfresco is an ECM (Enterprise Content Management) system geared toward friendly, team-based collaboration.
       Alfresco comes in three flavors:

·  Community Edition: has important limitations in terms of scalability and availability, because clustering is not available in this edition.
·  Enterprise Edition: supports clustering; it is geared toward users who require a high degree of modularity and scalable performance.
·  Cloud Edition: a SaaS (Software as a Service) version of Alfresco.

Why Alfresco?

·  Its rich feature set is fully accessible through a REST-based interface and can be extended with simple server-side JavaScript (no compilation required), although Java and Groovy are also common choices.
·  It can manage data of any format, not only document content and images.
·  It provides rich collaboration tools such as wikis, forums and issue logs, as well as functionality for editing and managing image files.
·  It makes web design easy for non-technical users.
·  It provides publishing channels for Google Docs, YouTube, Flickr, SlideShare, Facebook and LinkedIn out of the box.
·  Office documents can be edited within the CMS using Google Docs, or offline using the built-in check-out feature.
·  Rich community add-ons (plug-ins and tools) can be integrated with Alfresco easily.
·  Alfresco runs on the most commonly used operating systems, such as Linux, Mac OS and Windows, and it integrates fully with office suites such as Microsoft Office and OpenOffice.org.
·  It supports workflows through Activiti and jBPM.
·  It supports multiple databases.
·  It provides search over uploaded documents using the Lucene API.

Note: Alfresco uses the Apache PDFBox library (an open-source Java library) to extract text from PDF files and index it (http://pdfbox.apache.org/).

Why MarkLogic?

·  It is a high-performance XML database.
·  It provides Google-style search.
·  It is a document-centric NoSQL solution that uses XML as its content model, so you get a highly scalable repository that adapts easily to changes in content.
·  It unifies structured, semi-structured and unstructured data in a single database, where organizations can store, retrieve, analyze and manipulate very large data sets.
·  With immediate consistency and a search-everything philosophy, you do not need to compromise on ACID guarantees.

Benefits of Combining MarkLogic and Alfresco:

·  Alfresco can be used as an editorial and content production system: you can create, curate, edit, run workflows on and semantically enrich your content, and finally publish the documents to MarkLogic.
·  Once content is published to MarkLogic, you can run fast searches with the powerful, standards-based XQuery language.
·  Document versions can be controlled in MarkLogic. Alfresco also supports document versioning, but you cannot customize it.
·  With MarkLogic as your publishing target, you gain not only a Big Data content store but also a rich, expressive query language that combines search and retrieval.
·  Because MarkLogic is a document-centric content store, you don't need to worry about the type or structure of your content.

Supported Alfresco versions: 4.x and 5.0.a Community
Supported MarkLogic versions: 5.x and above

Steps to integrate Alfresco with MarkLogic:

1.  Download the Alfresco MarkLogic integration plugin from
2.  For Alfresco 4.x, build the jar file from the plugin with Maven.
3.  For Alfresco 5.0.a Community, build the AMPs with Ant (the build-alfresco-amp and build-share-amp targets create them; the deploy-all target deploys them).
4.  Add the jar file to /alfresco/WEB-INF/lib.
5.  Copy the config directories in "/marklogic-integration-alfresco/src/main/config" of the plugin to /tomcat/shared/classes/.
6.  Restart Alfresco.
7.  Download the Alfresco MarkLogic integration REST API from
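If you prefer to script steps 4 and 5 instead of copying files by hand, the Python sketch below shows one way to do it. It assumes Python 3.9+ and that you pass in your own plugin, Tomcat and Alfresco webapp locations; the function names and the jar path are hypothetical, and on the 5.0.a AMP route you would deploy the AMPs instead of copying a jar.

```python
import shutil
from pathlib import Path


def copy_plugin_config(plugin_root: Path, tomcat_root: Path) -> list[Path]:
    """Copy each directory under <plugin>/src/main/config into <tomcat>/shared/classes (step 5)."""
    src = plugin_root / "src" / "main" / "config"
    dest = tomcat_root / "shared" / "classes"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src.iterdir()):
        if entry.is_dir():  # only directories are copied; loose files are ignored
            target = dest / entry.name
            # dirs_exist_ok=True (Python 3.8+) merges into an existing directory
            shutil.copytree(entry, target, dirs_exist_ok=True)
            copied.append(target)
    return copied


def copy_plugin_jar(jar_path: Path, alfresco_webapp: Path) -> Path:
    """Copy the built plugin jar into <alfresco webapp>/WEB-INF/lib (step 4, Alfresco 4.x)."""
    lib_dir = alfresco_webapp / "WEB-INF" / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata and returns the destination path as a string
    return Path(shutil.copy2(jar_path, lib_dir))
```

For example, copy_plugin_config(Path("marklogic-integration-alfresco"), Path("/opt/alfresco/tomcat")) copies the config folders (the Tomcat path is only an example); you still need to restart Alfresco afterwards (step 6).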


Configure MarkLogic:

                  I. Create a database named "alfresco-pub".

                 II. Create a forest named "alfresco-pub-forest".

                III. Attach the "alfresco-pub-forest" forest to the "alfresco-pub" database.

               IV. Create a role named "alfresco-publisher" and assign it the following execute privileges under the 'Execute Privileges' section:

   A- admin-module-read
   B- admin-module-write
   C- any-collection
   D- xdmp:add-response-header
   E- xdmp:eval
   F- xdmp:document-get
   G- any-uri
   H- xdmp:get-session-field
   I- xdmp:http-delete, xdmp:http-get, xdmp:http-head, xdmp:http-post, xdmp:http-put, xdmp:http-options
   J- xdmp:invoke

                V. Save the role.

               VI. Under the 'Default Permissions' section, assign the following permissions to the 'alfresco-publisher' role:

·         Select the role 'alfresco-publisher', select 'update' from the dropdown, and save the role.
·         Select the role 'alfresco-publisher', select 'insert' from the dropdown, and save the role.
·         Select the role 'alfresco-publisher', select 'read' from the dropdown, and save the role.
·         Select the role 'alfresco-publisher', select 'execute' from the dropdown, and save the role.

          VII. Create a user 'alfrescopub-admin', set a password, assign the 'alfresco-publisher' role to it, and save the user.

       Note: You can enable or disable MarkLogic authentication by setting ml.auth.enabled=true or ml.auth.enabled=false in marklogic-integration/alfresco-global.properties. To use authentication, also update the MarkLogic username and password in the same file.

             VIII. Create an HTTP server on MarkLogic Server and name it 'Alfresco-Publishing-HTTP'.

               IX. Set the port number to '9000'.

                X. In the root field, enter the path of the REST API you downloaded.

         For example, if the download location is "d:\alfresco-marklogic-publish-unpublish-webservices", enter that path as the root.

               XI. Set modules to 'file-system'.
              XII. Set the database to 'alfresco-pub' (created in step I).
             XIII. Set authentication to 'digest' if MarkLogic authentication is enabled in marklogic-integration/alfresco-global.properties; otherwise, set it to 'application-level'.

            XIV. Set the default user to 'alfrescopub-admin' (created in step VII).
             XV. Enter /url-rewriter.xqy in the url rewriter field.
            XVI. Click 'OK' to save the HTTP server settings.
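Steps I to IV and VII can also be scripted against the MarkLogic Management REST API, which listens on port 8002 by default. The Python sketch below builds the requests and posts them with digest authentication. The JSON property names reflect my understanding of the Management API, so please check them against the MarkLogic documentation for your version; the function names are hypothetical, and the role's execute privileges and default permissions (steps IV to VI) and the HTTP server (steps VIII to XVI) are assumed to be configured in the Admin UI as described above.

```python
import json
import urllib.request

MANAGE_PORT = 8002  # default port of the MarkLogic Management REST API


def setup_requests(ml_host: str, forest_host: str, user_password: str) -> list:
    """Return (url, payload) pairs for steps I-III, the bare role of step IV, and step VII."""
    base = f"http://{ml_host}:{MANAGE_PORT}/manage/v2"
    return [
        (f"{base}/databases", {"database-name": "alfresco-pub"}),
        # "database" attaches the new forest to alfresco-pub in the same call (step III)
        (f"{base}/forests", {"forest-name": "alfresco-pub-forest",
                             "host": forest_host,
                             "database": "alfresco-pub"}),
        # creates the role only; execute privileges and default permissions
        # (steps IV to VI) are still set in the Admin UI in this sketch
        (f"{base}/roles", {"role-name": "alfresco-publisher"}),
        (f"{base}/users", {"user-name": "alfrescopub-admin",
                           "password": user_password,
                           "role": ["alfresco-publisher"]}),
    ]


def send_all(requests: list, ml_host: str, admin_user: str, admin_password: str) -> list:
    """POST each JSON payload using digest authentication; returns the HTTP status codes."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, f"http://{ml_host}:{MANAGE_PORT}/", admin_user, admin_password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    statuses = []
    for url, payload in requests:
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # raises urllib.error.HTTPError for 4xx/5xx, e.g. if the database already exists
        with opener.open(req) as resp:
            statuses.append(resp.status)
    return statuses
```

The order matters: the database must exist before the forest is attached to it. The forest host must be the host name as MarkLogic knows it, which is not necessarily the address you connect to.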

Your REST services are now ready for use:
Publish URL: http://127.0.0.1:9000/alfrescopub/publish?uri=someuri
Unpublish URL: http://127.0.0.1:9000/alfrescopub/unpublish?uri=someuri
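Because the uri parameter may contain characters such as ':' and '/', it is safer to URL-encode it when calling these services from code. Here is a small Python helper, assuming the URL layout shown above; the function name is hypothetical, and this post does not specify exactly what value the services expect in uri.

```python
from urllib.parse import urlencode


def alfrescopub_url(host: str, port: int, action: str, **params: str) -> str:
    """Build a URL for one of the alfrescopub services (publish, unpublish or get)."""
    if action not in ("publish", "unpublish", "get"):
        raise ValueError(f"unknown alfrescopub action: {action}")
    # urlencode percent-encodes ':' and '/' so a value like a node reference stays one parameter
    return f"http://{host}:{port}/alfrescopub/{action}?{urlencode(params)}"
```

For example, alfrescopub_url("127.0.0.1", 9000, "publish", uri="someuri") reproduces the publish URL above.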


 8.  Configure Alfresco: go to "Admin Console > Channel Publishing > Channel Manager".

 9.  Select "MarkLogic" as the channel; Alfresco will authorize it.

10.  The channel is added to Alfresco.

11.  Click the "MarkLogic" channel icon to configure the channel endpoint.

     Provide the MarkLogic server hostname (e.g. "127.0.0.1") and port (e.g. "9000").

12.  Click "Save". The channel is now ready for use.
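Before saving the channel, it can help to confirm that the host and port you entered actually accept connections. This Python sketch only checks that a TCP connection can be opened; it does not prove that the server is the 'Alfresco-Publishing-HTTP' app server, and the function name is hypothetical.

```python
import socket


def endpoint_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the host name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts (socket.timeout is an OSError) and DNS failures
        return False
```

For example, endpoint_reachable("127.0.0.1", 9000) should return True once the HTTP server created in step XVI is running.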



Test publishing:

1.  Go to "Project Library" > "Documents" > "Agency File" > "Images".
2.  Select an image.
3.  Click the "Publish" link on the right-hand side.
4.  Select "MarkLogic" as the publishing channel and click "Publish".
5.  The image is published to MarkLogic; you can see the status at the bottom of the page.

    You can also verify the published content in the MarkLogic Query Console (qconsole) or with the following service:

     http://127.0.0.1:9000/alfrescopub/get?id=workspace://SpacesStore/e9528c29-dbbc-49c5-ae63-ae35b67bea33

    where the value of id is the URI of the content inside Alfresco; it appears in the browser's address bar while the content is being viewed.

6.  To unpublish, click the "Unpublish" link on the right side of the history section and click "OK". The content will be queued for unpublishing.

7.  You can see the unpublish status in the "Publishing History" section.
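The verification call can also be scripted. The Python sketch below reads ml.auth.enabled from marklogic-integration/alfresco-global.properties with a deliberately simplified parser (key=value lines only) and, when authentication is on, calls the get service with digest authentication. The function names are hypothetical, the credentials are passed in explicitly because the property names the plugin uses for them are not listed in this post, and treating a missing ml.auth.enabled as disabled is an assumption.

```python
import urllib.parse
import urllib.request
from typing import Optional


def read_properties(text: str) -> dict:
    """Minimal .properties reader: 'key=value' lines plus '#' or '!' comments.

    Escapes, line continuations and 'key: value' syntax are NOT handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue  # skip blanks, comments and lines without '='
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def auth_enabled(props: dict) -> bool:
    """True if ml.auth.enabled is 'true' (any case); a missing key counts as disabled (assumption)."""
    return props.get("ml.auth.enabled", "false").strip().lower() == "true"


def get_published(host: str, port: int, node_ref: str,
                  user: Optional[str] = None, password: Optional[str] = None,
                  timeout: float = 10.0) -> bytes:
    """Fetch a published document from the alfrescopub 'get' service."""
    url = f"http://{host}:{port}/alfrescopub/get?" + urllib.parse.urlencode({"id": node_ref})
    handlers = []
    if user is not None:
        # digest auth, matching the 'digest' setting chosen in step XIII
        passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        passwords.add_password(None, url, user, password)
        handlers.append(urllib.request.HTTPDigestAuthHandler(passwords))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

A typical flow: parse the properties file with read_properties, then call get_published("127.0.0.1", 9000, "workspace://SpacesStore/e9528c29-dbbc-49c5-ae63-ae35b67bea33") with user="alfrescopub-admin" and your password if auth_enabled returns True, or without credentials otherwise.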





Leave your comments and suggestions below. Until next time :)

2 comments:

  1. Hi Abhinav,
    Thanks for your nice post on marklogic and alfresco integration.

    I want to integrate MarkLogic 8 Developer Edition with Alfresco 5 Community Edition. As per the tutorial, I have checked out the plugin from Git, and it is an Ant project, but you mention creating the jar with Maven. Please mention the target that needs to be built and the names of the jar files that need to be copied to the Alfresco server.

    Thanks in advance.

    1. This plugin works with Alfresco 4.x and Alfresco 5.0.a Community. You can use the 'deploy-all' target to deploy the AMPs. Alternatively, you can use the build-alfresco-amp and build-share-amp targets to create the AMPs and deploy them manually.

      I am working on updating the plugin so that it works with all Alfresco 5.x versions. From 5.0.a onwards, the Channel Manager has been removed.


Thanks for your comments and suggestions.