Tuesday, September 30, 2014

Amazon S3 Cloud Store and Alfresco Integration


What is Alfresco?
  • Alfresco is a friendly, team-based collaboration and ECM (Enterprise Content Management) system.
  • Alfresco comes in three flavors: 
  • Community Edition – It has some important limitations in terms of scalability and availability, since the clustering feature is not available in this edition. 
  • Enterprise Edition – It supports clustering; its design is geared towards users who require a high degree of modularity and scalable performance. 
  • Cloud Edition – It is a SaaS (Software as a Service) version of Alfresco.


What Is Amazon S3?
  • Amazon S3 is cloud storage for the Internet. S3 stands for Simple Storage Service.
  • It is designed to make web-scale computing easier for developers.
  • Amazon S3 provides a simple web-services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.
  • It stores data as Objects, and objects live inside containers called Buckets. You can have one or more buckets.
  • You can set permissions on buckets, controlling who can create, delete, and list objects in them, so a bucket stays private until you share it.
  • The service aims to maximize benefits of scale and to pass those benefits on to developers.
  • S3, like all AWS services, is available in multiple regions.
  • AWS has regions all over the world. Utilizing regions can reduce latency for end users, since the application/data they need to access can be geographically closer. 
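As a quick illustration of the bucket/object/region model above, the sketch below builds the virtual-hosted-style URL under which an object is addressable (the bucket name, region, and key here are hypothetical examples, not values from this post's setup):

```java
/** Minimal sketch of how an S3 object is addressed: bucket + key + region. */
public class S3ObjectUrl {

    /** Builds a virtual-hosted-style S3 URL for the given bucket, region and object key. */
    public static String objectUrl(final String bucket, final String region, final String key) {
        return "https://" + bucket + ".s3." + region + ".amazonaws.com/" + key;
    }

    public static void main(final String[] args) {
        // Hypothetical bucket and key, for illustration only
        System.out.println(objectUrl("test_bucket", "us-east-1", "docs/report.pdf"));
    }
}
```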

Why Alfresco?

     Its rich feature set is completely accessible over a REST-based interface and can be extended and enhanced with simple server-side JavaScript (i.e. no compilation required), although Java, Groovy, etc. are also common choices.
     It allows for management of data of any format (not only document content and images).
     It provides rich collaboration tools such as wiki, forums, issue log etc. and functionality to edit and manage image files.
     It enables easy web design for people who are not technical users.
     It provides publishing channels such as Google Docs, YouTube, Flickr, Slide Share, Facebook and LinkedIn out of the box.
     Office documents can be edited within CMS using Google Docs as well as offline using its built-in checkout feature.
     Rich Add-ons from Community i.e. plug-ins/tools can be integrated with Alfresco easily.
     Alfresco is compatible with most commonly used operating systems like Linux, MAC and Windows; it can be fully integrated with an organization's office suites like Microsoft Office or OpenOffice.org.
     Supports workflow with the help of Activiti and jBPM.
     Supports multiple databases.
      Provides search over uploaded documents using the Lucene API.

Note: Alfresco uses the Apache PDFBox library (an open-source Java library) to extract text from PDFs and index it.  (http://pdfbox.apache.org/)

Why Amazon S3?
  • It can be used to store and retrieve any amount of data at any time from anywhere on the web.
  • It is the highly scalable, reliable, secure, fast, and inexpensive infrastructure that Amazon uses to run its own global network of web sites.
  • You avoid having to buy hardware and pay for storage that isn't being used.
  • Amazon S3 is a very affordable solution for hosting data on the web, since you only pay for the bandwidth and storage you use.
  • It is commonly used for backup and storage. For example, if you build a website, you can store its static content on S3 securely for faster retrieval and reduced cost. 
  • It also supports versioning of data, so you can record day-to-day changes and roll back to a previous version at any time.
  • It is free for 5 GB of storage and 20,000 GET requests. This means you can store up to 5 GB of content free, and it can be viewed 20,000 times every month for free! 
  • Beyond that, the rates are very nominal; a summary is provided below.
  • For the first 1 TB of storage (approx. 1000 GB) the price is $0.095 per GB/month.
  • Let's analyze what that means. Suppose you have 32 GB of content. You get 5 GB of storage free, so for the remaining 27 GB you pay 27 x $0.095 = $2.57 a month. This rate applies until you reach 1000 GB.
  • Amazon also charges for requests: $0.004 per 10,000 requests after the free tier ends. If your content gets 50,000 views a month, the first 20,000 are free, so your expense is $0.004 x 3 = $0.012.
  • Summing up, to store and serve 32 GB of content on Amazon S3, your monthly expense is about $2.57 + $0.012 = $2.58!
  • Reference: http://aws.amazon.com/s3/
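The arithmetic above can be captured in a small helper. This is illustrative only: the rates are hard-coded from the figures quoted in this post, while actual AWS pricing varies by region and changes over time.

```java
/** Rough monthly S3 cost estimate using the rates quoted in this post (illustrative only). */
public class S3CostEstimate {

    private static final double FREE_STORAGE_GB = 5.0;       // free tier storage
    private static final double FREE_GET_REQUESTS = 20000.0; // free tier GET requests
    private static final double PRICE_PER_GB = 0.095;        // first 1 TB tier, per GB/month
    private static final double PRICE_PER_10K_GETS = 0.004;  // per 10,000 requests

    /** Estimated monthly cost for the given storage (GB) and GET request volume. */
    public static double monthlyCost(final double storedGb, final double getRequests) {
        final double billableGb = Math.max(0, storedGb - FREE_STORAGE_GB);
        final double billableGets = Math.max(0, getRequests - FREE_GET_REQUESTS);
        return billableGb * PRICE_PER_GB + (billableGets / 10000.0) * PRICE_PER_10K_GETS;
    }

    public static void main(final String[] args) {
        // 32 GB stored, 50,000 views a month, as in the worked example above
        System.out.printf("$%.2f%n", monthlyCost(32, 50000));
    }
}
```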


Union Benefits:

  • Alfresco can be used as an editorial and content production system, so you can create, curate, edit, apply workflow to, and semantically enrich your content. 
  • At present you need file system storage to store content in Alfresco. As the content grows, more and more file system storage is required; upgrading/increasing that storage space is cost prohibitive.
  • When using Amazon S3 as the content store, you don't have to worry about hardware or maintenance costs. Amazon provides very cost-effective storage space in S3, as the example above shows.
  • Using Amazon AWS together with Alfresco, you get all the benefits of true cloud computing, which allows you to scale storage or computing power based on actual usage.
Amazon S3 Cloud Store and Alfresco Integration:

Follow the below given steps to integrate S3 with Alfresco:

1.   Download the alfresco-s3-integration plug-in from the below given repository.

https://github.com/abhinavmishra14/alfresco-amazon-s3-content-store-integration.git

2.   Import the plug-in into your Eclipse workspace.
3.   Open the build.xml and execute the “deploy-war” ant target.
4.   It will automatically apply the AMP file to the war files. 
5.   If you want to build the AMP file and apply it to alfresco.war manually, then follow the below given steps.
a.  Copy the AMP file to the tomcat/webapps directory after packaging it.
b.  Copy the alfresco-mmt-command-line-2.1.jar file to the tomcat/webapps directory.
c.  Open a command prompt/shell.
d.  Navigate to the tomcat/webapps directory.
e.  Execute the following command; it will install the AMP file into alfresco.war and take a backup of the old war file.

C:\Alfresco\tomcat\webapps> java -jar alfresco-mmt-command-line-2.1.jar install C:\Alfresco\tomcat\webapps C:\Alfresco\tomcat\webapps\alfresco.war -force -directory

f.   To verify whether the module is installed, execute the following command.

java -jar alfresco-mmt-command-line-2.1.jar list C:\Alfresco\tomcat\webapps\alfresco.war

g.  It will display the following info:

Module 'cloudstore' installed in 'C:\Alfresco\tomcat\webapps\alfresco.war'
Example:
-    Title:        Cloud Content Store
-    Version:      2.0
-    Install Date: Mon Jun 16 16:42:12 IST 2014
-    Description:  Alfresco Cloud Content Store

6.  Before restarting Alfresco, follow the below given steps.
  
           - Open the PostgreSQL console.
           - Drop the database [ drop database alfresco; ]
           - Create the database again and re-grant privileges; run the below commands.


                  create database alfresco owner alfresco;
                  grant all privileges on database alfresco to alfresco;

             (If your Alfresco instance runs on MySQL instead, the equivalent is: grant all on alfresco.* to 'alfresco'@'localhost' identified by 'alfresco';)
            

           - Delete the following directories, which contain the Solr indexes.

              /alf_data/solr/archive/SpacesStore/*
              /alf_data/solr/workspace/SpacesStore/*
              /alf_data/solr/archive-SpacesStore/alfrescoModels/*
              /alf_data/solr/workspace-SpacesStore/alfrescoModels/*
      
7.  Now restart Alfresco and monitor the log files. You will see the following log messages:

Loading access keys from 'alfresco-global.properties'.
S3ContentStore Initializing, accessKey: xyzzzz  secretKey: xxxxxx and bucketName: test_bucket
S3ContentStore Initialization Complete
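The "Loading access keys" message corresponds to the plug-in reading S3 credentials from alfresco-global.properties. A minimal sketch of that kind of lookup with java.util.Properties follows; the property names here are assumptions for illustration, the real key names depend on the plug-in.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch of loading S3 credentials from a properties source (key names are illustrative). */
public class S3CredentialsLoader {

    /** Returns {accessKey, secretKey, bucketName} read from the given properties text. */
    public static String[] load(final Reader source) {
        final Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException ioex) {
            throw new UncheckedIOException(ioex);
        }
        // Hypothetical property names; check the plug-in's documentation for the real ones
        return new String[] {
            props.getProperty("s3.accessKey"),
            props.getProperty("s3.secretKey"),
            props.getProperty("s3.bucketName")
        };
    }

    public static void main(final String[] args) {
        final String sample = "s3.accessKey=xyzzzz\ns3.secretKey=xxxxxx\ns3.bucketName=test_bucket\n";
        final String[] creds = load(new StringReader(sample));
        System.out.println("bucketName: " + creds[2]);
    }
}
```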

8.   After that, go to the Amazon S3 console and refresh. You will see a directory named ‘store:’ in the web browser. Keep refreshing and you will see all the contents present in “C:\Alfresco\alf_data\contentstore”.

9.   You can also view the console in Eclipse if you have AWS SDK installed.



Monday, September 29, 2014

Configure SSL (Secure Sockets Layer) in Apache Tomcat for web applications:


Secure Sockets Layer (SSL) is a cryptographic protocol designed to provide communication security over the Internet. It ensures that all data passed between the web server and browsers remains private and intact.

Follow the below given steps to configure SSL:
  
      1- Go to the Java bin directory and generate the keystore file.

   C:\Program Files\Java\jdk1.7.0_55\bin>keytool -genkey -keystore C:\.keystore -alias tomcat -keyalg RSA


         This will generate a “.keystore” file on the C:\ drive.



2-  Go to Tomcat's server.xml file and uncomment the port 8443 SSL Connector that ships with Tomcat.
     Additionally, provide the keystore location in the configuration as given below.

         <Connector
           protocol="HTTP/1.1"
           port="8443" maxThreads="200"
           scheme="https" secure="true" SSLEnabled="true"
           keystoreFile="C:/.keystore" keystorePass="changeit"
           clientAuth="false" sslProtocol="TLS"/>

Note: 
The value of ‘keystorePass’ should be the same as the password provided while generating the ‘.keystore’ file above. In my case I provided the keystore password ‘changeit’ while generating the keystore.
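If you want to sanity-check that the password matches the keystore before wiring it into server.xml, you can try loading the file with java.security.KeyStore: loading a JKS store with the wrong password throws an exception. The sketch below builds a throwaway in-memory JKS store rather than touching C:\.keystore, so it is purely illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;

/** Sketch: verify a keystore password by attempting to load the store. */
public class KeystoreCheck {

    /** Returns true if the JKS keystore bytes load successfully with the given password. */
    public static boolean passwordWorks(final byte[] keystoreBytes, final char[] password) {
        try {
            final KeyStore ks = KeyStore.getInstance("JKS");
            ks.load(new ByteArrayInputStream(keystoreBytes), password);
            return true;
        } catch (Exception badPasswordOrCorruptStore) {
            return false;
        }
    }

    /** Builds an empty in-memory JKS keystore protected by the given password. */
    public static byte[] emptyJksStore(final char[] password) {
        try {
            final KeyStore ks = KeyStore.getInstance("JKS");
            ks.load(null, null); // initialize a fresh, empty store
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            ks.store(out, password);
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        final byte[] store = emptyJksStore("changeit".toCharArray());
        System.out.println(passwordWorks(store, "changeit".toCharArray()));
    }
}
```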

3-  Restart the tomcat server.


4-  Open the https://localhost:8443/ URL; it will open the Tomcat manager page over SSL.
We will receive a warning message. This basically tells the browser user that the certificate has not been verified by a Certificate Authority. This is because we created a self-signed certificate, which encrypts the communication between browser and server but doesn't guarantee that the certificate is from a trusted authority.



If we click to continue to the website, we can see that we are indeed able to hit our web application using SSL.



5-  Let's verify whether the communication is encrypted. Open Eclipse's TCP/IP Monitor view and set up a monitor on port 9090 forwarding to 8443, and a monitor on port 8080 forwarding to 8082.

  • Click the ‘Add’ button in the TCP/IP Monitor window in Eclipse to set up the monitors. Provide the inputs and click ‘OK’ to add.

  • Set the types to TCP/IP, start both monitors, and click ‘OK’.

  • Select each monitor and click ‘Start’ to start it.

  • Go to the TCP/IP Monitor view (Window > Show View > TCP/IP Monitor).

  • Open the http://localhost:8080/DemoServlet/ URL and watch the monitor. We can view the headers and bodies of the requests and responses.

  • Now open the https://localhost:9090/DemoServlet/ URL (which we configured to route to port 8443 in the monitor) and watch the monitor again. We can view the headers and bodies of the requests and responses. See the highlighted section.



Tuesday, September 23, 2014

How to upload a directory or a file to remote host using apache commons ftp client ?



Prerequisites: 

The following JAR file is required:

 commons-net-3.3.jar  (Apache Commons Net)

//FtpUtils:


/*
 * Created By: Abhinav Kumar Mishra
 * Copyright &copy; 2014. Abhinav Kumar Mishra.
 * All rights reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

/**
 * The Class FtpUtils.<br/>
 * This utility class is used to upload a file or directory to a
 * remote host via FTP.
 */
public final class FtpUtils {

    /** The Constant EMPTY. */
    private static final String EMPTY = "";

    /** The Constant FILE_SEPERATOR_LINUX. */
    private static final String FILE_SEPERATOR_LINUX = "/";

    /** The Constant FILE_SEPERATOR_WIN. */
    private static final String FILE_SEPERATOR_WIN = "\\";

    /**
     * Upload directory or file.
     *
     * @param host the host
     * @param port the port
     * @param userName the user name
     * @param password the password
     * @param fromLocalDirOrFile the local dir or file
     * @param toRemoteDirOrFile the remote dir or file
     * @return the response message
     */
    public String uploadDirectoryOrFile(final String host, final int port,
            final String userName, final String password,
            final String fromLocalDirOrFile, final String toRemoteDirOrFile) {

        final FTPClient ftpClient = new FTPClient();
        String responseMessage = "";
        try {
            // Connect and login to get the session
            ftpClient.connect(host, port);
            ftpClient.login(userName, password);
            // Use local passive mode to pass through firewalls
            ftpClient.enterLocalPassiveMode();
            System.out.println("Successfully connected to remote host!\n");

            final File localDirOrFileObj = new File(fromLocalDirOrFile);
            if (localDirOrFileObj.isFile()) {
                System.out.println("Uploading file: " + fromLocalDirOrFile);
                uploadFile(ftpClient, fromLocalDirOrFile, toRemoteDirOrFile
                        + FILE_SEPERATOR_LINUX + localDirOrFileObj.getName());
            } else {
                uploadDirectory(ftpClient, toRemoteDirOrFile, fromLocalDirOrFile, EMPTY);
            }

            // Log out and disconnect from the server once the FTP operation is completed.
            if (ftpClient.isConnected()) {
                try {
                    ftpClient.logout();
                } catch (IOException ignored) {
                    System.out.println("Ignoring the exception while logging out from remote host: "
                            + ignored.getMessage());
                }
                try {
                    ftpClient.disconnect();
                    System.out.println("\nSuccessfully disconnected from remote host!\n");
                } catch (IOException ignored) {
                    System.out.println("Ignoring the exception while disconnecting from remote host: "
                            + ignored.getMessage());
                }
            }
            responseMessage = "Upload completed successfully!!";
            System.out.println(responseMessage);
        } catch (IOException ioexcp) {
            responseMessage = ioexcp.getMessage();
            ioexcp.printStackTrace();
        }

        return responseMessage;
    }

    /**
     * Upload directory.
     *
     * @param ftpClient the ftp client
     * @param toRemoteDir the remote target dir
     * @param fromLocalParentDir the local parent dir
     * @param remoteParentDir the remote parent dir
     * @throws IOException Signals that an I/O exception has occurred.
     */
    private void uploadDirectory(final FTPClient ftpClient,
            final String toRemoteDir, String fromLocalParentDir,
            final String remoteParentDir) throws IOException {

        fromLocalParentDir = convertToLinuxFormat(fromLocalParentDir);
        fromLocalParentDir = checkLinuxSeperator(fromLocalParentDir);

        System.out.println("Listing the directory tree: " + fromLocalParentDir);

        final File localDir = new File(fromLocalParentDir);
        final File[] listedFiles = localDir.listFiles();
        List<File> subFiles = null;
        if (listedFiles != null) {
            subFiles = Collections.unmodifiableList(Arrays.asList(listedFiles));
        }
        if (subFiles != null && !subFiles.isEmpty()) {
            for (final File item : subFiles) {

                String remoteFilePath = toRemoteDir + FILE_SEPERATOR_LINUX
                        + remoteParentDir + FILE_SEPERATOR_LINUX + item.getName();
                if (EMPTY.equals(remoteParentDir)) {
                    remoteFilePath = toRemoteDir + FILE_SEPERATOR_LINUX + item.getName();
                }

                if (item.isFile()) {
                    // Upload the file
                    final String localFilePath = convertToLinuxFormat(item.getAbsolutePath());
                    System.out.println("Uploading file: " + localFilePath);
                    final boolean isFileUploaded = uploadFile(ftpClient,
                            localFilePath, remoteFilePath);
                    if (isFileUploaded) {
                        System.out.println("File uploaded: '" + remoteFilePath + "'");
                    } else {
                        System.err.println("Could not upload the file: '" + localFilePath + "'");
                    }
                } else {
                    // Create the directory on the server
                    final boolean isDirCreated = ftpClient.makeDirectory(remoteFilePath);
                    if (isDirCreated) {
                        System.out.println("Created the directory: '"
                                + remoteFilePath + "' on remote host");
                    } else {
                        System.err.println("Could not create the directory: '"
                                + remoteFilePath + "' on remote host, directory may already exist!");
                    }

                    // Directory created, now upload the sub directory
                    String parentDirectory = remoteParentDir + FILE_SEPERATOR_LINUX + item.getName();
                    if (EMPTY.equals(remoteParentDir)) {
                        parentDirectory = item.getName();
                    }

                    fromLocalParentDir = item.getAbsolutePath();
                    // Recursive call to upload the sub-directories
                    uploadDirectory(ftpClient, toRemoteDir, fromLocalParentDir, parentDirectory);
                }
            }
        }
    }

    /**
     * Upload file.
     *
     * @param ftpClient the ftp client
     * @param frmLocalFilePath the local file path
     * @param toRemoteFilePath the remote file path
     * @return true, if successful
     * @throws IOException Signals that an I/O exception has occurred.
     */
    private boolean uploadFile(final FTPClient ftpClient,
            final String frmLocalFilePath, final String toRemoteFilePath)
            throws IOException {

        final File localFile = new File(frmLocalFilePath);
        final InputStream inputStream = new FileInputStream(localFile);
        try {
            ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
            return ftpClient.storeFile(toRemoteFilePath, inputStream);
        } finally {
            inputStream.close();
        }
    }

    /**
     * Ensure the path ends with the linux separator.
     *
     * @param aStr the input path
     * @return the path ending with '/'
     */
    private String checkLinuxSeperator(String aStr) {
        if (!aStr.endsWith(FILE_SEPERATOR_LINUX)) {
            aStr = aStr + FILE_SEPERATOR_LINUX;
        }
        return aStr;
    }

    /**
     * Convert to linux format.
     *
     * @param inputPath the input path
     * @return the path with '/' separators
     */
    private String convertToLinuxFormat(final String inputPath) {
        return inputPath.replace(FILE_SEPERATOR_WIN, FILE_SEPERATOR_LINUX);
    }
}
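The two private helpers at the bottom of FtpUtils normalize Windows paths into the forward-slash form FTP servers expect. As a standalone illustration, the same normalization looks like this:

```java
/** Standalone version of the path normalization done by FtpUtils (illustrative). */
public class PathNormalizer {

    /** Converts Windows separators to '/' and ensures a trailing slash. */
    public static String toFtpPath(final String inputPath) {
        String normalized = inputPath.replace("\\", "/");
        if (!normalized.endsWith("/")) {
            normalized = normalized + "/";
        }
        return normalized;
    }

    public static void main(final String[] args) {
        System.out.println(toFtpPath("C:\\Abhinav\\documents")); // C:/Abhinav/documents/
    }
}
```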

----------------------------------------------------------------------------------------------------------

//Test client
----------------------------------------------------------------------------------------------------------

/*
 * Created By: Abhinav Kumar Mishra
 * Copyright &copy; 2014. Abhinav Kumar Mishra. 
 * All rights reserved.
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */


import com.ftp.util.FtpUtils;

/**
 * The Class FTPUploadDirectoryClient.
 */
public class FTPUploadDirectoryClient {

    public static void main(final String[] args) {
        final String host = "127.0.0.1";
        final int port = 21;
        final String userName = "admin";
        final String password = "admin";
        final String remoteDir = "home/data/build/testUpload";
        // Note: You can also pass a single file path instead of a directory.
        final String localDir = "C:/Abhinav/documents";
        final FtpUtils fileUtils = new FtpUtils();
        fileUtils.uploadDirectoryOrFile(host, port, userName, password, localDir, remoteDir);
    }
}

Monday, September 22, 2014

Alfresco and Amazon S3 Publishing


Alfresco and Amazon S3 Publishing:


Follow the below given steps to integrate the S3 publishing plug-in with Alfresco:

1.   Download the alfresco-s3-integration plug-in from the below given repository.


2.   Import the plug-in into your Eclipse workspace.
3.   Open the build.xml and execute the “deploy-all” ant target.
4.   It will automatically apply the AMP files to alfresco.war & share.war. It will also print the applied module info on the console after you run the ant target.

5.   If you want to build the AMP file and apply it to alfresco.war manually, then follow the below given steps.
a.  Copy the AMP file to the tomcat/webapps directory after packaging it.
b.  Copy the alfresco-mmt-command-line-2.1.jar file to the tomcat/webapps directory.
c.  Open a command prompt/shell.
d.  Navigate to the tomcat/webapps directory.
e.  Execute the following command; it will install the AMP file into alfresco.war and take a backup of the old war file.

C:\Alfresco\tomcat\webapps> java -jar alfresco-mmt-command-line-2.1.jar install C:\Alfresco\tomcat\webapps C:\Alfresco\tomcat\webapps\alfresco.war -force -directory

f.   To verify whether the module is installed, execute the following command.

java -jar alfresco-mmt-command-line-2.1.jar list C:\Alfresco\tomcat\webapps\alfresco.war

g.  It will display the following info:

Module 's3-integration' installed in 'C:\Alfresco\tomcat\webapps\alfresco.war'
Example:
-    Title:        Alfresco Amazon S3 Integration
-    Version:      1.0
-    Install Date: Mon Jun 16 16:42:12 IST 2014
-    Description:  Alfresco Amazon S3 Integration [Developed By-Abhinav Kumar Mishra]

6.  Follow the same steps for the share.war file as well, then deploy the alfresco & share war files.

7.   After that, go to Alfresco > Admin Console > Channel Manager.
      On the right-hand side, click the 'New' button; you can see a new channel "S3".




Follow the below given steps to configure Amazon S3 publishing channel:

  • Create a bucket on Amazon S3 console.
  • Create a directory where you want to store the authored contents.


       Here, I have created the bucket “Abhinav_Community” and the directory “authored_contents”.

  • Login to Alfresco and navigate to "Admin Console > Channel Publishing > Channel Manager".

  • Click the ‘New’ button and choose the Amazon S3 publishing channel. Once you click, you will get a popup to authorize the channel. 
  • Provide the Alfresco administrator’s user name and password.


  • You have added the Amazon S3 Publishing channel successfully. See the Amazon S3 icon above.

  • Now configure the channel with the bucket name and directory. There is no need to modify the channel Id.

  • Now we are done with the configuration.

Publishing the documents to Amazon S3:


  • Let’s check whether the Amazon S3 directory (which we created in the beginning) is empty.




  • Go  to "Document Library" > "Documents" > "Authoring Contents" > "MyDoc.docx"



  • You can see a “Publish” link. Click the link to publish the file to S3. 

  • When you click the “Publish” link, you will get the following popup.


  • Select the channel “AmazonS3” and click “Publish”. You will get the following message after publishing.


  • Go to the “Publishing History” section at the bottom of the same page. You will see the history as given below.

Now, the MyDoc.docx will be published to Amazon S3.
Actual file URI in Alfresco:  workspace://SpacesStore/59c9f8d8-543d-492d-8bdf-d22c218bd9fd
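The node reference above follows the protocol://storeId/nodeUuid pattern. A small sketch of pulling it apart for logging or lookup purposes (purely illustrative; Alfresco's own NodeRef class does this for you):

```java
/** Illustrative parser for an Alfresco node reference of the form protocol://storeId/uuid. */
public class NodeRefParts {

    /** Returns {protocol, storeId, uuid} from a node reference string. */
    public static String[] parse(final String nodeRef) {
        final String[] protocolSplit = nodeRef.split("://", 2);
        final String[] storeSplit = protocolSplit[1].split("/", 2);
        return new String[] { protocolSplit[0], storeSplit[0], storeSplit[1] };
    }

    public static void main(final String[] args) {
        final String[] parts = parse("workspace://SpacesStore/59c9f8d8-543d-492d-8bdf-d22c218bd9fd");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```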





  • Just in front of the document name you can see an “Unpublish” link. You can use this link to unpublish the document from Amazon S3 if you want to. This will delete the document stored in the Amazon S3 “Abhinav_Community/authored_contents” directory.







Leave your comments/suggestions below. Until next time :)