What is MarkLogic?
• MarkLogic is a NoSQL database.
• MarkLogic comes in three flavors:
✓ Developer Edition - Free, full-featured version. The included APIs extend to all versions of MarkLogic.
✓ Essential Enterprise - Supports replication, backup, high availability, recovery, fine-grained security, location services, and alerting. Semantics and advanced language packs are options.
✓ Global Enterprise - Designed for large, globally distributed applications. Semantics, tiered storage, geospatial alerting, and advanced language packs are options.
What is Alfresco?
• Alfresco is a user-friendly, team-based collaboration ECM (Enterprise Content Management) system.
• Alfresco comes in three flavors:
✓ Community Edition - It has some important limitations in terms of scalability and availability, since the clustering feature is not available in this edition.
✓ Enterprise Edition - It supports clustering. Its design is geared towards users who require a high degree of modularity and scalable performance.
✓ Cloud Edition - It is a SaaS (Software as a Service) version of Alfresco.
Why Alfresco?
• Its rich feature set is completely accessible over a REST-based interface and can be extended and enhanced with simple server-side JavaScript (i.e. no compilation required), although Java, Groovy, etc. are also common choices.
• It allows management of data in any format (not only document content and images).
• It provides rich collaboration tools such as wikis, forums, and issue logs, plus functionality to edit and manage image files.
• It enables easy web design for non-technical users.
• It provides publishing channels such as Google Docs, YouTube, Flickr, SlideShare, Facebook, and LinkedIn out of the box.
• Office documents can be edited within the CMS using Google Docs, as well as offline using its built-in checkout feature.
• Rich community add-ons (plug-ins/tools) can be integrated with Alfresco easily.
• Alfresco is compatible with the most commonly used operating systems, such as Linux, Mac, and Windows, and it can be fully integrated with an organization's office suite, such as Microsoft Office or OpenOffice.org.
• Supports workflows with the help of Activiti and jBPM.
• Supports multiple databases.
• Provides search over uploaded documents using the Lucene API.
Note: Alfresco uses Apache PDFBox, an open-source Java library, to extract text from PDF files and index it (http://pdfbox.apache.org/).
Why MarkLogic?
• High-performing XML database.
• Uses a Google-style search engine.
• It's a document-centric NoSQL solution using XML as the content model, meaning you get a very scalable repository that can adapt to content changes effortlessly.
• It unifies structured, semi-structured, and unstructured data in a single database where organizations can store, retrieve, analyze, and manipulate very large data sets.
• With its immediate consistency and a searchable-everything philosophy, there's no need to compromise on your ACID promise.
Combined Benefits of MarkLogic and Alfresco:
• Alfresco can be used as an editorial and content production system, so you can create, curate, edit, route through workflow, and semantically enrich your content, and finally publish the documents to MarkLogic.
• Once published to MarkLogic, you can perform fast searches using the powerful, standards-based XQuery language.
• Document versions can be controlled in MarkLogic. Alfresco also supports document versioning, but you cannot customize it.
• Using MarkLogic as your publishing target, you gain not only a Big Data content store but also a rich and expressive query language in which search and retrieval are combined.
• As MarkLogic is a document-centric content store, you don't need to worry about the type or structure of your content.
Supported Alfresco versions : 4.x and 5.x
Supported MarkLogic versions : 5.x and above
Steps to integrate Alfresco with MarkLogic:
1. Download the Alfresco MarkLogic integration plugin from:
Alfresco Version 5.x => https://github.com/abhinavmishra14/marklogic-alfresco5.x-integration
Alfresco Version 4.x => https://github.com/abhinavmishra14/marklogic-alfresco-integration
2. Create the JAR file from the plugin using Maven (for the 4.x version).
3. Create the AMP using Ant (5.0.a community version).
4. Add the JAR file to /alfresco/WEB-INF/lib.
5. Copy the config directories under "/marklogic-integration-alfresco/src/main/config" in the plugin to /tomcat/shared/classes/
6. Restart Alfresco.
7. Download the Alfresco MarkLogic integration REST API, then configure MarkLogic:
I. Create a database named "alfresco-pub".
II. Create a forest named "alfresco-pub-forest".
III. Attach the "alfresco-pub-forest" forest to the "alfresco-pub" database.
IV. Create a role named "alfresco-publisher" and assign it the following privileges under the 'Execute Privileges' section:
A- admin-module-read
B- admin-module-write
C- any-collection
D- xdmp:add-response-header
E- xdmp:eval
F- xdmp:document-get
G- any-uri
H- xdmp:get-session-field
I- xdmp:http-delete, xdmp:http-get, xdmp:http-head, xdmp:http-post, xdmp:http-put, xdmp:http-options
J- xdmp:invoke
V. Save the role.
VI. Assign the following permissions to the 'alfresco-publisher' role under the 'Default Permissions' section:
· Select the role 'alfresco-publisher', select 'update' from the dropdown, and save the role.
· Select the role 'alfresco-publisher', select 'insert' from the dropdown, and save the role.
· Select the role 'alfresco-publisher', select 'read' from the dropdown, and save the role.
· Select the role 'alfresco-publisher', select 'execute' from the dropdown, and save the role.
VII. Create a user 'alfrescopub-admin', set a password, assign the 'alfresco-publisher' role to it, and save the user.
Note: You can enable or disable MarkLogic authentication by setting ml.auth.enabled=true or ml.auth.enabled=false in marklogic-integration/alfresco-global.properties. To use authentication, also update the MarkLogic username and password in the same file.
VIII. Create an HTTP server on the MarkLogic server and name it 'Alfresco-Publishing-HTTP'.
IX. Set the port number to '9000'.
X. In the root field, provide the path of the API that you downloaded. For example, if the download location is "d:\alfresco-marklogic-publish-unpublish-webservices", use that path as the root.
XI. Set the modules to 'file-system'.
XII. Select the 'alfresco-pub' database created in step 7.I.
XIII. Select 'digest' authentication if MarkLogic authentication is enabled in marklogic-integration/alfresco-global.properties; otherwise select 'application-level'.
XIV. Select the default user 'alfrescopub-admin' created in step 7.VII.
XV. Add /url-rewriter.xqy in the URL rewriter field.
XVI. Click 'OK' to save the changes to the HTTP server.
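The rule in step XIII can be written down as a small check. This is only a minimal Python sketch, not part of the plugin: it assumes the properties file uses plain key=value lines and that a missing ml.auth.enabled should be treated as false. The helper names are made up for illustration.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def http_server_auth_scheme(properties_text):
    """Return the HTTP server authentication setting matching ml.auth.enabled.

    Assumption: a missing property is treated as 'false'.
    """
    props = parse_properties(properties_text)
    enabled = props.get("ml.auth.enabled", "false").lower() == "true"
    return "digest" if enabled else "application-level"


# Sample content standing in for marklogic-integration/alfresco-global.properties
sample = "# sample properties\nml.auth.enabled=true\n"
print(http_server_auth_scheme(sample))  # prints: digest
```

Running the same check against your real properties file before editing the HTTP server is a quick way to make sure both sides agree on the authentication mode.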
Your REST services are now ready for use:
For publish, the URL should be: http://127.0.0.1:9000/alfrescopub/publish?uri=someuri
For unpublish, the URL should be: http://127.0.0.1:9000/alfrescopub/unpublish?uri=someuri
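As a rough illustration, these URLs can be assembled in a few lines of Python. The function name and its defaults are hypothetical; only the host, port, path, and the uri parameter come from the examples above, and the sketch assumes the service accepts a standard URL-encoded query string.

```python
from urllib.parse import urlencode

BASE_PATH = "/alfrescopub"  # path prefix taken from the example URLs above


def service_url(action, uri, host="127.0.0.1", port=9000):
    """Build the publish or unpublish URL for a given content URI."""
    if action not in ("publish", "unpublish"):
        raise ValueError("action must be 'publish' or 'unpublish'")
    query = urlencode({"uri": uri})  # percent-encodes reserved characters
    return f"http://{host}:{port}{BASE_PATH}/{action}?{query}"


print(service_url("publish", "someuri"))
# prints: http://127.0.0.1:9000/alfrescopub/publish?uri=someuri
```

Keeping the host and port as parameters makes it easy to point the same helper at a non-local MarkLogic server.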
8. To configure Alfresco, go to "Admin Console > Channel Publishing > Channel Manager".
9. Select "MarkLogic" as the channel; Alfresco will authorize it.
10. The channel is added to Alfresco.
11. Click the "MarkLogic" channel icon to configure the channel endpoint. Provide the MarkLogic server hostname, e.g. "127.0.0.1", and the MarkLogic server port, e.g. "9000".
12. Click "Save". The channel is now ready for use.
Test publishing:
1. Go to "Project Library" > "Documents" > "Agency File" > "Images".
2. Select an image.
3. Click the "Publish" link on the right-hand side.
4. Select "MarkLogic" as the publishing channel and click "Publish".
5. The image is published to MarkLogic. You can see the status at the bottom of the page.
You can also verify the published content via the MarkLogic Query Console (qconsole) or via the following service:
http://127.0.0.1:9000/alfrescopub/get?id=workspace://SpacesStore/e9528c29-dbbc-49c5-ae63-ae35b67bea33
Here the value of id is the URI of the content inside Alfresco; it can be seen in the browser URL while the content is viewed.
6. To unpublish, click the "Unpublish" link on the right side of the history section and click "OK". The content will be queued for unpublishing.
7. You can see the unpublish status in the "Publishing History" section.
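If you would rather build the verification URL from the address shown in the browser, here is a minimal Python sketch. It assumes the browser URL carries the content URI in a nodeRef query parameter (as Alfresco Share document pages typically do). The function names and the sample Share URL are illustrative only.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def node_ref_from_browser_url(browser_url):
    """Extract the nodeRef query parameter from an Alfresco page URL."""
    query = parse_qs(urlparse(browser_url).query)
    values = query.get("nodeRef")
    if not values:
        raise ValueError("no nodeRef parameter found in URL")
    return values[0]


def get_service_url(node_ref, host="127.0.0.1", port=9000):
    """Build the 'get' service URL used to verify published content."""
    # Keep ':' and '/' unescaped so the result matches the form shown in this post.
    query = urlencode({"id": node_ref}, safe=":/")
    return f"http://{host}:{port}/alfrescopub/get?{query}"


# Hypothetical Share URL; the nodeRef value is the one used in this post.
share_url = (
    "http://localhost:8080/share/page/document-details"
    "?nodeRef=workspace://SpacesStore/e9528c29-dbbc-49c5-ae63-ae35b67bea33"
)
print(get_service_url(node_ref_from_browser_url(share_url)))
# prints: http://127.0.0.1:9000/alfrescopub/get?id=workspace://SpacesStore/e9528c29-dbbc-49c5-ae63-ae35b67bea33
```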
References:
https://docs.alfresco.com/4.2/
https://docs.marklogic.com
Leave your comments/suggestions below. Until next time :)
Hi Abhinav,
Thanks for your nice post on MarkLogic and Alfresco integration.
I want to integrate MarkLogic 8 Developer Edition with Alfresco 5 Community Edition. As per the tutorial, I have checked out the plugin from Git, and it is an Ant project. You have mentioned creating the JAR using Maven. Please mention the target name that needs to be built and the names of the JAR files that need to be copied to the Alfresco server.
Thanks in advance.
This plugin will work with Alfresco 4.x and Alfresco 5.0.a Community. You can use the 'deploy-all' target to deploy the AMPs. Alternatively, you can use the build-alfresco-amp and build-share-amp targets to create the AMPs and deploy them manually.
I am working in the background to update the plug-in so that it works with all Alfresco 5.x versions. From 5.0.a onwards, the channel manager has been removed.