A PredictionIO binary distribution is created as PredictionIO-0.9.6.tar.gz. Extract the binary distribution you have just built:
$ tar zxvf PredictionIO-0.9.6.tar.gz
2.2 Install Dependencies
Let us install dependencies inside a subdirectory of the Apache PredictionIO (incubating) installation. By following this convention, you can use Apache PredictionIO (incubating)'s default configuration as is.
$ mkdir PredictionIO-0.9.6/vendors
2.3 Install Apache Spark
Apache Spark is the default processing engine for PredictionIO. Download and extract it.
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
$ tar zxvfC spark-1.5.1-bin-hadoop2.6.tgz PredictionIO-0.9.6/vendors
If you decide to install Apache Spark to another location, you must edit PredictionIO-0.9.6/conf/pio-env.sh and change the SPARK_HOME variable to point to your own Apache Spark installation.
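If you keep Spark under the vendors directory as shown above, the corresponding line in conf/pio-env.sh would look roughly like the following sketch (the exact path is an assumption based on the layout created by the commands above):

```shell
# PredictionIO-0.9.6/conf/pio-env.sh (excerpt, illustrative)
# Point SPARK_HOME at the Spark installation extracted into vendors/
SPARK_HOME=$PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6
```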
Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the "master" argument to SparkContext. You can also find this URL on the master's web UI, which is http://localhost:8080 by default.
Similarly, you can start one or more workers and connect them to the master via:
Once you have started a worker, look at the master's web UI (http://localhost:8080 by default). You should see the new node listed there, along with its number of CPUs and memory (minus one gigabyte left for the OS).
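The master and workers described above are started with Spark's standalone-mode scripts. A sketch, assuming the vendors layout from the installation step; spark://HOST:PORT stands for the URL the master prints at startup:

```shell
cd PredictionIO-0.9.6/vendors/spark-1.5.1-bin-hadoop2.6

# Start a standalone master; it logs its spark://HOST:PORT URL
./sbin/start-master.sh

# Start a worker and connect it to the master
# (replace spark://HOST:PORT with the URL printed by the master)
./sbin/start-slave.sh spark://HOST:PORT
```

These commands require the Spark installation above and a reachable network, so they are environment-dependent; check the master's web UI at http://localhost:8080 to confirm the worker registered.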
3.3 Create a new Engine from an Engine Template
Now let's create a new engine called MyRecommendation by downloading the Recommendation Engine Template. Go to a directory where you want to put your engine and run the following:
$ pio template get PredictionIO/template-scala-parallel-recommendation MyRecommendation
$ cd MyRecommendation
A new directory MyRecommendation is created, where you can find the downloaded engine template.
3.4 Create a New App
You will need to create a new app in PredictionIO to store all the data of your application. The collected data will be used for machine learning modeling.
Let's assume you want to use this engine in an application named "MyApp1". Run the following to create a new app "MyApp1":
$ pio app new MyApp1
You should find the following in the console output:
... [INFO] [App$] Initialized Event Store for this app
[INFO] [App$] Access Key:
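If you need to look up the Access Key again later, the pio CLI can list the registered apps (the exact output format may vary by version):

```shell
# List all apps with their names, IDs, and access keys
pio app list
```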
3.5 Import More Sample Data
This engine requires more data in order to train a useful model. Instead of sending more events one by one in real time, for quickstart demonstration purpose, we are going to use a script to import more events in batch.
A Python import script import_eventserver.py is provided in the template to import the data to Event Server using Python SDK. Please upgrade to the latest Python SDK.
First, you will need to install Python SDK in order to run the sample data import script. To install Python SDK, run:
$ pip install predictionio
You may need sudo access if you run into a permission issue (i.e. sudo pip install predictionio).
Replace the value of the access_key parameter with your Access Key and run the commands below. These commands must be executed in the engine directory, for example MyRecommendation:
$ cd MyRecommendation
$ curl https://raw.githubusercontent.com/apache/spark/master/data/mllib/sample_movielens_data.txt --create-dirs -o data/sample_movielens_data.txt
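The template's import_eventserver.py reads the MovieLens-formatted file downloaded above (one "user::movie::rating" record per line) and sends each line as a "rate" event to the Event Server via the Python SDK's EventClient (the Event Server listens on port 7070 by default). A minimal sketch of the per-line conversion; parse_rating_line is a hypothetical helper for illustration, and the actual script takes the access key as a command-line argument:

```python
def parse_rating_line(line):
    """Convert one 'user::movie::rating' line into the fields of a
    PredictionIO 'rate' event (hypothetical helper for illustration)."""
    user, movie, rating = line.strip().split("::")
    return {
        "event": "rate",
        "entity_type": "user", "entity_id": user,
        "target_entity_type": "item", "target_entity_id": movie,
        "properties": {"rating": float(rating)},
    }

# With the SDK installed and an Event Server running, each record would be
# sent roughly like this (omitted here so the sketch stays self-contained):
#   client = predictionio.EventClient(access_key="YOUR_ACCESS_KEY",
#                                     url="http://localhost:7070")
#   client.create_event(**parse_rating_line(line))
print(parse_rating_line("1::100::4.0"))
```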
3.6 Build the Engine
Start with building your MyRecommendation engine. Run the following command:
$ pio build --verbose
This command should take a few minutes the first time; all subsequent builds should finish in under a minute. You can also run it without --verbose if you don't want to see all the log messages.
Upon successful build, you should see a console message similar to the following.
[INFO] [Console$] Your engine is ready for training.
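As the message above suggests, the engine must be trained before it can be deployed. In the standard pio workflow this is a single command run from the engine directory (it launches a Spark job, so it requires the setup above and is environment-dependent):

```shell
# Train the model; Spark options can be passed after "--",
# e.g. pio train -- --driver-memory 4G
pio train
```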
To increase the heap space, specify the "-- --driver-memory" parameter in the command. For example, to set the driver memory to 8G when deploying the engine:
$ pio deploy -- --driver-memory 8G
When the engine is deployed successfully and running, you should see a console message similar to the following:
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.
Do not kill the deployed engine process.
By default, the deployed engine binds to http://localhost:8000. You can visit that page in your web browser to check its status.
3.8 Use the Engine
Now you can try to retrieve predicted results, for example by asking the engine to recommend 4 movies to a given user. With the deployed engine running, open another terminal and run the following curl command, or use an SDK to send the query:
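A query of the shape used by the recommendation template has "user" and "num" fields and is POSTed to the deployed engine's /queries.json endpoint. A sketch (the user id "1" is just an example; this requires the engine from the previous step to be running on localhost:8000):

```shell
# Ask the deployed engine to recommend 4 movies for user "1"
curl -H "Content-Type: application/json" \
  -d '{ "user": "1", "num": 4 }' \
  http://localhost:8000/queries.json
```

The engine responds with a JSON body containing the recommended item ids and their predicted scores.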