Recsplain System 🦖

The Recsplain System makes recommendations and explains them.

It recommends items based on item similarity or user preferences. It explains the recommendations in terms of overall similarity and feature-to-feature similarity.

Install it in your app, use it with your data, and customize it how you want.

Explainable Recommendations

Here is an example item similarity search. You can see the request and response in the image below.

_images/explanations.png

The request is based on a search item that has three features. It is a US-based product in the meat category and low in price.

import recsplain as rx

item_query_data = {
  "k": 2,
  "data": {
    "price": "low",
    "category": "meat",
    "country": "US"
  },
  "explain": 1
}

rec_strategy.query(**item_query_data)

The response body in the image above contains the recommendations and explanations.

The ids are ordered by index position from most to least recommended. The lowest index position is the most recommended.

{
  "status": "OK",
  "ids": ["1", "2"],
  "distances": [0, 2],
  "explanation": [
    {
      "price": 0,
      "category": 0
    },
    {
      "price": 2,
      "category": 0
    }
  ]
}

Distances explain item similarity based on all features and weights. Explanations provide distances for each feature. The distances and explanations correspond to the ids by index position.

Lower values correspond to greater similarity.

In the example, the system recommends item 1 more than item 2 because item 1 has a lower distance.

Item 1 has a lower distance because its price distance is lower than item 2's, while their category distances are equal.
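To make the index-position pairing concrete, the sketch below lines up the ids, distances, and explanations from the response above. It uses plain Python rather than the Recsplain API. The observation that the per-feature values add up to the overall distance is read off this one example and is not a documented guarantee.

```python
# Response copied from the example above.
response = {
    "status": "OK",
    "ids": ["1", "2"],
    "distances": [0, 2],
    "explanation": [
        {"price": 0, "category": 0},
        {"price": 2, "category": 0},
    ],
}

# Pair each id with its distance and per-feature explanation by index position.
rows = list(zip(response["ids"], response["distances"], response["explanation"]))

for item_id, distance, per_feature in rows:
    # In this example the per-feature distances sum to the overall distance.
    print(item_id, distance, per_feature, sum(per_feature.values()) == distance)
```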

How It Works

For item similarity, Recsplain turns items into weighted feature vectors.

_images/diagram-1.png

The system compares item feature vectors to one another to calculate how similar they are.

_images/diagram-4.png
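As a rough illustration of the comparison, here is a minimal sketch that builds weighted one-hot vectors and compares them with a squared L2 distance. The helper names (one_hot, encode, squared_l2) are hypothetical and are not part of the Recsplain API. The choice of squared L2 and the weights 1 and 2 are assumptions that happen to reproduce the distances in the example response (0 and 2).

```python
def one_hot(value, values, weight):
    """Weighted one-hot vector with slot 0 reserved for unknown values."""
    vec = [0.0] * (len(values) + 1)
    index = values.index(value) + 1 if value in values else 0
    vec[index] = float(weight)
    return vec

def encode(item, encoders):
    """Concatenate the weighted one-hot vectors of all encoder fields."""
    vec = []
    for enc in encoders:
        vec += one_hot(item.get(enc["field"]), enc["values"], enc["weight"])
    return vec

def squared_l2(a, b):
    """Sum of squared element-wise differences (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

encoders = [
    {"field": "price", "values": ["low", "mid", "high"], "weight": 1},
    {"field": "category", "values": ["dairy", "meat"], "weight": 2},
]
query = encode({"price": "low", "category": "meat"}, encoders)
item_1 = encode({"price": "low", "category": "meat"}, encoders)
item_2 = encode({"price": "mid", "category": "meat"}, encoders)
print(squared_l2(query, item_1), squared_l2(query, item_2))  # 0.0 2.0
```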

For user preferences, Recsplain turns a user into an item feature vector based on the user’s previous history with the items.

_images/diagram-2.png

Note

For example, a customer of an online store who bought two cookies and a glass of milk has an item feature vector that is a blend of the item vectors for two cookies and milk.

The system compares the user’s item feature vector to the indexed item feature vectors to calculate how similar each indexed item is to the items in the user’s history. The more similar an item is, the higher it ranks in the recommendations.

_images/diagram-3.png

Note

To a customer who previously bought two cookies and a glass of milk, the system recommends other items that have similar features to those purchases.
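The blend described above can be sketched as a count-weighted average of item vectors. This is an assumption suggested by the name AvgUserStrategy, not a confirmed description of the library's internals. The blend helper and the two-feature item vectors are hypothetical.

```python
from collections import Counter

def blend(history, item_vectors):
    """Average the vectors of the items in a user's history, counting repeats."""
    counts = Counter(history)
    total = sum(counts.values())
    size = len(next(iter(item_vectors.values())))
    user = [0.0] * size
    for item_id, n in counts.items():
        for i, x in enumerate(item_vectors[item_id]):
            user[i] += n * x / total
    return user

# Hypothetical two-feature item vectors: [sweetness, is_drink].
item_vectors = {"cookie": [0.9, 0.0], "milk": [0.3, 1.0]}
user_vector = blend(["cookie", "cookie", "milk"], item_vectors)
print(user_vector)
```

Because two of the three purchases are cookies, the resulting vector sits closer to the cookie vector than to the milk vector.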

Field Types & Schema

Use the field types and schema to configure the Recsplain filters and encoders.

Filters determine which items are compared to one another. Encoders determine how they are compared.

Here is an example configuration.

import recsplain as rx

config_data = {
  "filters": [{ "field": "country", "values": ["US", "EU"] }],
  "encoders": [
    {
      "field": "price",
      "values": ["low", "mid", "high"],
      "type": "onehot",
      "weight": 1
    },
    {
      "field": "category",
      "values": ["dairy", "meat"],
      "type": "onehot",
      "weight": 2
    }
  ],
  "metric": "l2"
}

rec_strategy = rx.AvgUserStrategy()
rec_strategy.init_schema(**config_data)

Filter Fields

The filter fields are hard filters. They separate items into different partitions. Only items within the same partition are compared to one another.

The example above creates two partitions. One for US items and another for EU.
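The partitioning idea can be sketched in plain Python as grouping items by their filter-field values. The partition helper is hypothetical and the four items are illustrative (they mirror the items indexed later in this guide). How Recsplain stores partitions internally is not shown here.

```python
from collections import defaultdict

def partition(items, filter_fields):
    """Group item ids by the tuple of their filter-field values."""
    partitions = defaultdict(list)
    for item in items:
        key = tuple(item[field] for field in filter_fields)
        partitions[key].append(item["id"])
    return dict(partitions)

items = [
    {"id": "1", "country": "US"},
    {"id": "2", "country": "US"},
    {"id": "3", "country": "US"},
    {"id": "4", "country": "EU"},
]
print(partition(items, ["country"]))
# {('US',): ['1', '2', '3'], ('EU',): ['4']}
```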

Encoder Fields

The encoder fields are soft filters for fuzzy matching. They determine how item features are compared within a partition.

The example above selects the one-hot encoder for each of the item features.

Note

Learn more about the one-hot and other available Encoders.

User Encoders

When recommending items for a user, Recsplain has special encoders you should use. Currently, user encoders encode the user’s history into a feature vector.

Note

ArgMaxML created Recsplain. We are focused on creating software that enables you to integrate recommendation engines into your product to increase customer engagement.

Get Started

Follow the guide below to start making explainable recommendations with Recsplain.

Start with:

  • Installation

  • Configuration

  • Index

Then start searching by:

  • Item Similarity

  • User Preference

Note

Learn more about the methods in the Reference 🦖.

Installation

Import the package using the following import statement.

import recsplain as rx

Note

Learn more about the method in the Installation reference.

Configuration

First, configure the user recommendation strategy by using:

rec_strategy = rx.AvgUserStrategy()

Note

More strategies to come.

Use the init_schema method to configure the system so that it knows how to partition and compare feature vectors.

Here is an example of how to call the init_schema method.

import recsplain as rx

config_data = {
  "filters": [{ "field": "country", "values": ["US", "EU"] }],
  "encoders": [
    {
      "field": "price",
      "values": ["low", "mid", "high"],
      "type": "onehot",
      "weight": 1
    },
    {
      "field": "category",
      "values": ["dairy", "meat"],
      "type": "onehot",
      "weight": 2
    }
  ],
  "metric": "l2"
}

rec_strategy = rx.AvgUserStrategy()
rec_strategy.init_schema(**config_data)

Weights set the relative importance the system attributes to each feature in the similarity check.

This is the response from init_schema. The first element is an array of the filters created and the second element is a dictionary of the features and their corresponding vector size.

[('US',), ('EU',)], {'price': 4, 'category': 3}

Note

The one-hot encoder reserves one slot for unknown values, so its feature size is N + 1, where N is the number of listed values.
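The shape of the response above can be reproduced with a short sketch. The schema_summary helper is hypothetical and does not call Recsplain. It assumes that several filters would combine as a Cartesian product, which this guide does not confirm, and that every encoder is one-hot.

```python
from itertools import product

def schema_summary(filters, encoders):
    """Return (partitions, feature_sizes) shaped like the response above."""
    # Assumption: several filters would combine as a Cartesian product.
    partitions = list(product(*(f["values"] for f in filters)))
    # One-hot fields reserve one extra slot for unknown values (N + 1).
    feature_sizes = {e["field"]: len(e["values"]) + 1 for e in encoders}
    return partitions, feature_sizes

filters = [{"field": "country", "values": ["US", "EU"]}]
encoders = [
    {"field": "price", "values": ["low", "mid", "high"], "type": "onehot", "weight": 1},
    {"field": "category", "values": ["dairy", "meat"], "type": "onehot", "weight": 2},
]
partitions, feature_sizes = schema_summary(filters, encoders)
print(partitions, feature_sizes)
# [('US',), ('EU',)] {'price': 4, 'category': 3}
print(sum(feature_sizes.values()))  # 7, the total vector size
```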

Note

Learn more about the method in the configuration reference.

Index

Use the index method to add items to the Recsplain system so that it has items to partition and compare.

Here is an example of how to call the index method.

import recsplain as rx

config_data = {
  "filters": [{ "field": "country", "values": ["US", "EU"] }],
  "encoders": [
    {
      "field": "price",
      "values": ["low", "mid", "high"],
      "type": "onehot",
      "weight": 1
    },
    {
      "field": "category",
      "values": ["dairy", "meat"],
      "type": "onehot",
      "weight": 2
    }
  ],
  "metric": "l2"
}

index_data = [
  {
    "id": "1",
    "price": "low",
    "category": "meat",
    "country": "US"
  },
  {
    "id": "2",
    "price": "mid",
    "category": "meat",
    "country": "US"
  },
  {
    "id": "3",
    "price": "low",
    "category": "dairy",
    "country": "US"
  },
  {
    "id": "4",
    "price": "high",
    "category": "meat",
    "country": "EU"
  }
]

rec_strategy = rx.AvgUserStrategy()
rec_strategy.init_schema(**config_data)
rec_strategy.index(index_data)

This is the response from index. The first element is a list of errors and the second element is the number of partitions affected by the indexing.

[], 2

Note

If you do not index items, when you search there will be nothing to check the search against for similarity.

Note

When you call the index method more than once, reusing an id creates duplicate entries in the index. In the example below, the index method is called twice with the same entry, and both entries appear in the index table.

_images/reusing_index.png
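One way to avoid duplicates is to track the ids you have already sent and filter each batch before indexing it. The drop_indexed helper below is a hypothetical client-side guard, not a Recsplain feature.

```python
def drop_indexed(items, seen_ids):
    """Return only items whose id is new, and record those ids in seen_ids."""
    fresh = []
    for item in items:
        if item["id"] not in seen_ids:
            seen_ids.add(item["id"])
            fresh.append(item)
    return fresh

seen_ids = set()
batch = [{"id": "1", "price": "low", "category": "meat", "country": "US"}]

first = drop_indexed(batch, seen_ids)   # contains item "1"
second = drop_indexed(batch, seen_ids)  # empty: item "1" was already seen
print(len(first), len(second))  # 1 0
```

Pass the filtered list to the index method instead of the raw batch.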

Note

Learn more about the method in the index reference.

Item Similarity

Use the query method to search by item.

The method returns explainable recommendations for indexed items that are similar to the search item.

Here is an example of how to call the query method.

import recsplain as rx

item_query_data = {
  "k": 2,
  "data": {
    "price": "low",
    "category": "meat",
    "country": "US"
  },
  "explain": 1
}

rec_strategy.query(**item_query_data)

This is the response from query. The first element is the ids of the recommended items. The second element is the distance of each recommended item. The third element is the explanation of how much each feature contributed to the overall distance.

('1', '2') (0.0, 2.0) [{'price': 0.0, 'category': 0.0}, {'price': 2.0, 'category': 0.0}]

Note

Learn more about the method in the item similarity reference.

User Preference

Use the user_query method to search by user.

The method returns explainable recommendations for indexed items that the user likely prefers.

Here is an example of how to call the user_query method.

import recsplain as rx

user_query_data = {
  "k": 2,
  "item_history": ["1", "3", "3"],
  "user_data": {
    "country": "US"
  }
}

rec_strategy.user_query(**user_query_data)

This is the response from user_query. The first element is the ids of the recommended items and the second element is the distance of each of these items from the user’s representation (as given by the item history).

['3', '1'] [2.0, 2.0]

Note

Learn more about the method in the user preference reference.

Saving and Loading Models

Save and load models for your Recsplain system.

Save

Use the save_model method to save the model to your computer.

Here is an example of how to call the save_model method.

import recsplain as rx

model_name = "your-model-name"

rx.save_model(model_name)

It returns the saved model in JSON format.

Load

Use the load_model method to load a model to your system.

Here is an example of how to call the load_model method.

import recsplain as rx

model_name = "your-model-name"

rx.load_model(model_name)

Spinning Up A Server

Use the Recsplain system as a web server. It allows you to run searches over the internet.

Send the search item or user in the payload of an HTTP request to your Recsplain server and get recommendations and explanations in response.

You also can use the web server to configure, index, and otherwise use the system.

Installation

Import the package using the following import statement.

import recsplain as rx

Running Server

To run the server, enter the following command in your terminal.

python -m recsplain

Browse to http://127.0.0.1:5000/docs.

You should see a swagger interface for the REST API.

_images/api-reference.png

Calling Server

Instead of calling the package methods, call the routes to index, configure, search, or otherwise interact with the system.

Follow the same steps as in the Get Started document for configuring and indexing before searching by item or user.

Steps are:
  1. init_schema - create the schema

  2. index_item - index items

  3. index_user - index users

After you configure and index, send an item or user to the system and get explainable recommendations in response by using query and user, respectively.

Send data in the body of the HTTP requests and get data in the HTTP response body.

_images/explanations.png
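As a sketch of what such a request might look like, the code below builds a JSON POST request with the standard library without sending it. The route path /query is an assumption based on the method name, so check the Swagger interface for the actual routes. The build_query_request helper is hypothetical.

```python
import json
from urllib.request import Request

def build_query_request(base_url, payload):
    """Build (but do not send) a JSON POST request for an item query."""
    # Assumption: the item-query route is "/query"; confirm it in the Swagger UI.
    return Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

payload = {
    "k": 2,
    "data": {"price": "low", "category": "meat", "country": "US"},
    "explain": 1,
}
request = build_query_request("http://127.0.0.1:5000", payload)
print(request.get_method(), request.full_url)
# To send it once the server is running:
#   from urllib.request import urlopen
#   with urlopen(request) as resp:
#       print(json.load(resp))
```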

Reference 🦖

The reference supplements the Welcome, Get Started, Saving and Loading Models, and Spinning Up A Server guides.


Installation

Install the Recsplain system and then use it as a web server or import it into your code and call the methods directly.

pip install

Install the Recsplain system from PyPI using pip.

pip install recsplain

After you install the package, use it by either:

  • Running a server and calling the functions through the REST API

  • Importing it into your code using Python bindings

Run Server

To use the system as a web server, enter the following command in your terminal.

python -m recsplain

Browse to http://127.0.0.1:5000/docs.

You should see a swagger interface for the REST API.

Configure

Read below to learn more about the inputs and outputs.

Inputs

The init_schema method requires the following inputs:

  • filters

  • encoders

  • metric

Filters

Filters control which items the system checks for similarity each time you run an item or user query.

As the example above demonstrates, each filter is comprised of a field and possible values for the field. The two most common hard filters are location and language.

Note

The schema needs to include all possible values for each encoder field.

Note

Each field should correspond to a field for the items in your item database and the values to possible values for those fields.

When you run the init_schema method, the system creates a partition for each filter value.

In each partition are the indexed items whose value for the filter field matches the value for that partition.

When you search, the system checks the search item or user against the items in a particular partition only if the search item’s or user’s value for that feature matches the partition’s value.

Note

In the example above, the system creates two partitions. One for US items and another for EU. When searching, if the search item or user is based in the US, the system only searches the US partition, not the EU partition.

Therefore, filters are hard filters and are used to separate or exclude items for comparison.

Encoders

Encoders control how the system compares items within each partition.

As you see in the example above, each encoder is comprised of a field, possible values for the field, an encoder type, and a weight.

Here is what each does:

  • field: a feature to use in the similarity check

  • values: values to check for the field

  • type: the type of encoder to use for checking similarity for this feature

  • weight: the relative importance the system should attribute to this feature in the similarity check

Unlike the filters, the encoders are not hard filters and therefore do not play a role in creating the partitions.

Instead, the encoders are used when the user searches by item or user to find similar items.

They are soft filters that dictate how the system checks for similarity.

The encoder fields should be a field that the items in your database have or could have.

The values for each field in the encoder should be values that each item could potentially have for that field.

The type of encoder sets how the system calculates similarity.

Note

Check out the list of encoders to learn what encoders you can use and how they work.

The weight tells the system the relative importance of each feature in the encoder.

Note

In the example, category is twice as important as price.

Metric

The metric is the method the similarity server uses to calculate the distance it returns for each item.

Types of metrics:
  • l2: the default metric.

  • cosine: the cosine metric.
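For reference, here is a minimal sketch of the two metrics using their standard definitions. It does not show Recsplain's implementation. In particular, the example distances earlier in these docs (a distance of 2 for a single one-hot mismatch) suggest the library may report squared L2 rather than its square root, but that is an inference and not confirmed here.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a = [1.0, 0.0]
b = [0.0, 1.0]
print(l2_distance(a, b))      # square root of 2
print(cosine_distance(a, b))  # 1.0 for orthogonal vectors
```

L2 is sensitive to vector length, while cosine compares direction only.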

Outputs

The init_schema method returns an object containing:

  • partitions

  • vector_size

  • feature_sizes

Partitions

The partitions value is the number of partitions the system made based on your configuration.

When you index items, the items are added to the partitions only if the item meets the filter criteria.

Note

A partition is an instance of the similarity server.

As explained above, the number of partitions is based on the number of values init_schema has for filters.

Feature Sizes

Each encoder has a feature size.

The feature size is the number of distinct feature values for each encoder, plus one. The plus one is to account for unknown feature values.

In the example above, the price encoder has three values: ["low", "mid", "high"].

Its feature size, therefore, is 4 because of its three values and the possibility for unknown values.

Similarly, the category feature size is 3 because of its two values and the possibility for an unknown.

Vector Size

The vector size is the sum of the feature sizes.

In the example above, the vector size is 7. Here is why. The price encoder has 3 values and therefore a feature size of 4. The category encoder has 2 values and therefore a feature size of 3. Therefore, the vector size is 4 + 3 = 7.

Total Items

The total items is the total number of items indexed.

Note

Learn more about indexing items from your database.

Index

Read below to learn more about the inputs and outputs.

Inputs

The index method requires that you input data for each item that you want in the system.

The data is an array of item objects.

Each item object should have an id and a field for each item feature.

The id should be a unique value. It serves an important role because the system uses the id in the similarity check, and the id is how you identify the item in the results for each query.

Therefore, the id should be a value that makes it easy to identify each item.

Note

For example, it is common to use the SKU number of a product as the value for the id.

Also, notice in the example that the fields other than the id appear as either a filter or encoder field in the init_schema example code.

Note

Check out the init_schema example configuration.

Call the index method as many times as you want. Each time you call it, the data you send is added to the existing data without replacing the existing data.

Outputs

The index method returns the number of affected partitions.

A partition is an affected partition if the system added the item to the partition.

An item is added only to the partitions that it matches. An item matches a partition if the item has the feature value corresponding to the partition filter field.

Note

Using the example from the configuration page, indexing an item sold only in the US would affect one partition, whereas indexing an item sold in both the US and EU would affect two partitions.
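The counting rule can be sketched as follows. The affected_partitions helper is hypothetical. Representing an item sold in several regions as a list of values is an assumption made for illustration, and so is reporting unmatched ids, since this reference does not say how either case is expressed.

```python
def affected_partitions(items, filter_field, allowed_values):
    """Count distinct partitions touched by a batch; collect unmatched ids."""
    touched, unmatched = set(), []
    for item in items:
        value = item.get(filter_field)
        # Accept a single value or a list of values (assumed multi-region form).
        values = value if isinstance(value, list) else [value]
        matches = [v for v in values if v in allowed_values]
        if matches:
            touched.update(matches)
        else:
            unmatched.append(item["id"])
    return unmatched, len(touched)

batch = [
    {"id": "1", "country": "US"},
    {"id": "4", "country": "EU"},
    {"id": "9", "country": "JP"},  # hypothetical item with no matching partition
]
print(affected_partitions(batch, "country", ["US", "EU"]))  # (['9'], 2)
```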

Item query

Read below to learn more about the inputs and outputs.

Inputs

The query method takes the following inputs:

  • k

  • data

  • explain

k

The k value is the number of similar items you want the system to return.

data

The data is an object containing fields and values for the features of the item you are searching for.

Note

Like the item features in the filters and indexing stages, the search item data fields should correspond to a field in your item database.

explain

The explain value tells the system if you want explanations about the recommendations.

Send a value of 1 for explain in order to get explanations.

Note

To not include explanations in the results, simply do not include the explain field when you call the function.

Outputs

The query method returns an object containing:

  • ids

  • distances

  • explanations

Note

Explanations are optional. To include them in the response, see above.

ids

The ids are the item recommendations and are ordered by index position from most to least similar to the search input.

The item at index position 0 is the most similar item and the item in the last index position is the least similar.

Note

In the example, A is the top recommendation and has a distance of 1 from the search item. B is the next best recommendation and has a distance of 3 from the search item.

distances

The distance values tell you how similar each result is to the search item.

Note

The index positions of the distances correspond to the index positions of the ids.

The smaller the distance between two vectors, the more similar the items are to one another.

The distance is an overall similarity value based on comparing the vector for one indexed item to the vector for the search item.

explanations

The explanations tell you more about how the system calculated the distances by providing distance values for each encoder.

Note

The index positions of the explanations correspond to the index positions of the ids.

In the example above, A is overall more similar to the search item than B is to the search item.

The explanations show why.

It is because A has a smaller distance for category than B by 8 and a greater distance for price than B, but only by 2.

Plus, the encoder configurations gave category a weight of 2 and price a weight of 1 making category twice as important as price.

Note

Because A beats B on category by 4x more than B beats A on price, and because category has a greater weight, A has two reasons to be more similar to the search item than B.
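The figures 8, 2, and 4x are consistent with weighted one-hot vectors compared by squared L2 distance, where a mismatch changes two slots by the weight each. That reading is an inference from the numbers, not a documented formula, and the mismatch_contribution helper is hypothetical.

```python
def mismatch_contribution(weight):
    """Squared-L2 contribution of two different weighted one-hot values."""
    # Two slots differ, each by `weight`: weight**2 + weight**2.
    return 2 * weight ** 2

price = mismatch_contribution(1)     # price has weight 1
category = mismatch_contribution(2)  # category has weight 2
print(price, category, category // price)  # 2 8 4
```

Under this reading, doubling a weight quadruples the feature's contribution to the distance.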

User query

Read below to learn more about the inputs and outputs.

Inputs

The user_query method requires the following inputs:

  • k

  • item_history

  • data

  • explain

k

The k value is the number of items you want the system to return as recommendations.

item history

The item history is an array of item ids that the user has previous history with.

The system uses the item history to convert a user to an item vector.

Note

A common example is an array of ids for items the user previously purchased. In the example code, this user previously bought item 1 one time and item 3 twice.

The system uses the features of the items in the array to create an item vector that represents the user based on the features of those items.

The system knows the features of each item in the array because you tell the system the item features when you index the items.

Note

The system compares the user’s item vector to the item vectors for the indexed items. In other words, if a customer bought three bananas, an apple, and a carrot, their user vector represents a combination of the features from three bananas, an apple, and a carrot.

The Recsplain system compares the user’s item vector to the item vector for each indexed item to calculate distance.
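The sketch below works through the example history (item 1 once, item 3 twice) under stated assumptions: weighted one-hot encoding with slot 0 reserved for unknown values, a plain average of the history vectors, and squared L2 distance. None of this calls Recsplain, and the helper names are hypothetical. The values it prints come from the sketch and may differ from what the library reports.

```python
def one_hot(value, values, weight):
    vec = [0.0] * (len(values) + 1)  # slot 0 is reserved for unknown values
    vec[values.index(value) + 1 if value in values else 0] = float(weight)
    return vec

def encode(item):
    # Price has weight 1 and category has weight 2, as in the example schema.
    return (one_hot(item["price"], ["low", "mid", "high"], 1)
            + one_hot(item["category"], ["dairy", "meat"], 2))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# US-partition items from the index example.
items = {
    "1": {"price": "low", "category": "meat"},
    "2": {"price": "mid", "category": "meat"},
    "3": {"price": "low", "category": "dairy"},
}
vectors = {item_id: encode(item) for item_id, item in items.items()}

# Average the vectors of the items in the history, counting repeats.
history = ["1", "3", "3"]
user = [sum(col) / len(history) for col in zip(*(vectors[i] for i in history))]

ranked = sorted(vectors, key=lambda item_id: squared_l2(user, vectors[item_id]))
for item_id in ranked[:2]:
    print(item_id, round(squared_l2(user, vectors[item_id]), 4))
# 3 0.8889
# 1 3.5556
```

Item 3 ranks first because it appears twice in the history, which pulls the user vector toward the dairy category.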

data

The data is an object containing fields and values about the user you are searching for. Each user data field should correspond to a field in your indexed items.

Note

User data is most commonly used as hard filters. For instance, in the example in these docs, the system will only recommend US items to the user, not EU items.

explain

The explain value tells the system if you want explanations about the recommendations.

Send a value of 1 for explain in order to get explanations.

Note

To not include explanations in the results, simply do not include the explain field when you call the function.

Outputs

The user_query method returns an object containing:

  • ids

  • distances

  • explanations

Note

Explanations are optional. To include them in the response, see above.

ids

The ids are the item recommendations and are ordered by index position from most to least similar to the user’s item vector.

The item at index position 0 is the item the user most likely prefers and the item in the last index position is the item the user least likely prefers.

Note

In the example, A is the top recommendation and B is the next best recommendation.

distances

The distance values tell you how likely the user is to prefer the item.

The index positions of the ids correspond to the index positions of the distances.

Note

In the example, A is the top recommendation and has a distance of 0.888898987902 from the search item. B is the next best recommendation and has a distance of 3.555675839384 from the search item.

The smaller the distance for an item, the more likely the user is to prefer the item.

The distance is an overall similarity value based on comparing the vector for one indexed item to the user’s item vector.

Note

The results for the user search are the same as for the item search except the user search distances are floats instead of integers.

explanations

The explanations tell you more about how the system calculated the distances by providing distance values for each encoder.

In the example above, the user is more likely to prefer A than B.

The explanations show why.

It is because A has a lower distance for category than B and a lower distance for price than B.

Note

Remember to take the encoder weights into account when reviewing the explanations. The encoder configurations in the example weighted category twice as important as price.

Because A beats B on category and on price, A has two reasons to be more similar to the search than B has.

Encoders

Recsplain comes with a variety of encoders.

Note

The code for the encoders is in encoders.py

Here is the list.

NumericEncoder

Use for numeric data. Example:

{
  "field": "fat_percentage",
  "values": np.linspace(0, 100, num=101),
  "type": "numeric",
  "weight": 1
}

OneHotEncoder

Use for categorical data. The first slot is reserved for “unknown” entries. Example:

{
  "field": "category",
  "values": ["dairy", "pastry", "meat"],
  "type": "onehot",
  "weight": 1
}

StrictOneHotEncoder

Use for categorical data. No “unknown” category. Example:

{
  "field": "category",
  "values": ["dairy", "pastry", "meat"],
  "type": "strictonehot",
  "weight": 1
}
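To contrast the two one-hot variants, here is a minimal sketch in plain Python. The function names are hypothetical. Raising an error for an unlisted value in the strict variant is an assumption, since this reference does not say how StrictOneHotEncoder treats such values.

```python
def one_hot(value, values):
    """N + 1 slots; slot 0 catches values that are not in `values`."""
    vec = [0] * (len(values) + 1)
    vec[values.index(value) + 1 if value in values else 0] = 1
    return vec

def strict_one_hot(value, values):
    """N slots and no unknown slot; this sketch rejects unlisted values."""
    if value not in values:
        raise ValueError(f"unknown value: {value!r}")
    vec = [0] * len(values)
    vec[values.index(value)] = 1
    return vec

values = ["dairy", "pastry", "meat"]
print(one_hot("meat", values))         # [0, 0, 0, 1]
print(one_hot("fish", values))         # [1, 0, 0, 0]
print(strict_one_hot("meat", values))  # [0, 0, 1]
```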

OrdinalEncoder

Use for ordinal data. window is the allowed similarity leakage between close values. Example:

{
  "field": "price",
  "values": ["low", "mid", "high"],
  "type": "ordinal",
  "weight": 1,
  "window": [0.1,1,0.1]
}
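One plausible reading of window is that it is centered on the encoded value and spills a fraction of the signal onto neighboring values. The sketch below implements that reading as an assumption. The ordinal_encode helper is hypothetical, and the sketch ignores any slot for unknown values.

```python
def ordinal_encode(value, values, window):
    """Spread `window` around the position of `value`, clipping at the edges."""
    center = len(window) // 2
    position = values.index(value)
    vec = [0.0] * len(values)
    for offset, leak in enumerate(window):
        slot = position + offset - center
        if 0 <= slot < len(values):
            vec[slot] = leak
    return vec

values = ["low", "mid", "high"]
window = [0.1, 1, 0.1]
print(ordinal_encode("low", values, window))  # [1, 0.1, 0.0]
print(ordinal_encode("mid", values, window))  # [0.1, 1, 0.1]
```

With this encoding, low and mid overlap more than low and high do, so adjacent values end up closer together.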

BinEncoder

Use for binning data. values holds the boundaries of the bins. Example:

{
  "field": "product_color",
  "values": ['blue', 'red', 'green'],
  "type": "bin",
  "weight": 1,
}

BinOrdinalEncoder

Use for binning ordinal data.

values is the boundaries of the bins.

window is the allowed similarity leakage between close values.

Example:

{
  "field": "price",
  "values": [10, 50, 100, 500, 1000],
  "type": "binordinal",
  "weight": 1,
  "window": [0.2,1,0.1]
}
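Mapping a raw number to a bin given its boundaries can be sketched with the standard library's bisect module. The bin_index helper is hypothetical. Whether Recsplain treats a boundary value as belonging to the lower or the upper bin is not stated here, so the choice below is an assumption.

```python
from bisect import bisect_right

def bin_index(value, boundaries):
    """Index of the bin that `value` falls into, given sorted boundaries."""
    # Values below the first boundary land in bin 0; values at or above the
    # last boundary land in bin len(boundaries).
    return bisect_right(boundaries, value)

boundaries = [10, 50, 100, 500, 1000]
for price in (5, 10, 75, 2000):
    print(price, bin_index(price, boundaries))
```

Five boundaries produce six bins, indexed 0 through 5.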

HierarchyEncoder

Use for hierarchical data. Example:

{
  "field": "sub_category",
  "values": {"meat":["chicken","beef"],"dairy": ['milk','yogurt'],"pastry":['bread','baguette']},
  "type": "hierarchy",
  "weight": 1,
}
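To illustrate what a hierarchy adds over flat categories, the sketch below classifies two sub-categories by whether they share a parent. It shows the general idea only. The relation helper is hypothetical and says nothing about how HierarchyEncoder builds its vectors.

```python
hierarchy = {
    "meat": ["chicken", "beef"],
    "dairy": ["milk", "yogurt"],
    "pastry": ["bread", "baguette"],
}

# Reverse lookup: sub-category -> parent category.
parent_of = {child: parent for parent, children in hierarchy.items() for child in children}

def relation(a, b):
    """Classify two sub-categories as 'same', 'sibling', or 'unrelated'."""
    if a == b:
        return "same"
    if parent_of.get(a) is not None and parent_of.get(a) == parent_of.get(b):
        return "sibling"
    return "unrelated"

print(relation("chicken", "beef"))  # sibling
print(relation("chicken", "milk"))  # unrelated
```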

NumpyEncoder

User defined encoder as numpy array.

JSONEncoder

User defined encoder as json.

QwakEncoder

Use with qwak data format.

What is Recsplain

The Recsplain system is an explainable recommendation system.

It consists of a tabular similarity search server that calculates item similarity and recommends items with explanations.

Recommendation system

The Recsplain recommendation engine uses machine learning to recommend items based on how similar an item is to another item or how similar an item is to a user’s preferences.

After you configure the system and index your items, you can use the system to:

  • Search by item to find similar items in your database based on item features

  • Recommend items to users based on the user’s history with the items

The applications are virtually endless.

Note

One common application of Recsplain is in online stores where the system recommends products to customers based on items the customer already bought from the store.

Create your own

Use Recsplain to create your very own recommendation engine using your configurations and your data.

It is easy to install and customize to suit your needs. You can configure the system to make recommendations based on your needs by using:

  • Filters to categorically exclude and separate data for comparisons

  • Encoders to dictate how the system checks whether items are similar

After configuring the system, easily index a list of items from your database to start recommending items to your users.

Make recommendations

Use the system as a webserver or by calling the methods from the Recsplain code itself to recommend items by item or by user.

Search by item to get similar items ordered by most to least similar, their distances from the search item based on the item vectors, and explanations about the recommendations.

Search by user to get items the user most likely prefers ordered from most to least likely, each item’s distance from the user based on the user’s vector, and explanations about the recommendations.

Use explanations

The Recsplain system explains its recommendations so that you can better understand the results.

The explanations tell you the degree of similarity for each feature so that you can better understand the order of the recommendations and the distance for each item.

The explanations are a more granular type of result than the overall distance value.

Get started

Read about how it works or just get started!

How it works

Using the Recsplain system is straightforward and does not require you to know any machine learning or advanced math.

Rather, you just need to install the Recsplain Python package into your project, configure the system, and index your items.

After setup, you can search by item or user to recommend items based on the similarity of your indexed items to the search item or user.

Note

You can use Recsplain as a webserver or with Python bindings to call the methods in your code.

When searching by item, the system creates a numerical vector representing the search item based on the search item features.

The system compares the vector for your search item to each item vector for the items you indexed to calculate the distance between the search item vector and each database item vector.

When searching by user, the system creates a numerical vector representing the user as an item vector based on the user’s history with the items.

The system compares the item vector for your user to each item vector for the items you indexed to calculate the distance between the user’s item vector and each database item vector.

The system recommends items based on the distances. The smaller the distance between vectors, the more similar they are and the stronger the recommendation.

Read below to learn more about setup and how it works.

Easy Setup

It is just three easy steps to get started.

First, install the package.

Second, configure the system with your search filters, encoders, and metric.

Third, index the items from your database that you want to use in the system for checking similarity.

That is it. You can start making recommendations!

Note

Try it now by following the Get Started guide!

Make recommendations in two different ways.

  • Search by item for similar items based on item features

  • Search by user for items the user is most likely to prefer based on their history with the items

Note

Learn more about Recsplain by exploring more about how the system works.

Custom configuration

The Recsplain system is customizable in several important ways.

Similarity check

First, customize how the system organizes and compares the items in your database. Simply send your configuration data to the system.

The configuration data consists of your filters, encoders, and metric.

Filters are hard filters. The system uses the hard filters to separate the indexed items into partitions. Use the partitions to control which items are checked for similarity each time you run an item or user query.

Note

When searching, the system checks whether the search item or user is similar to the items within a particular partition only if the search item or user fits the filter criteria for that partition.

Encoders are soft filters. Use them to control how the system checks for similarity.

Note

The system uses encoders to determine how to check for similarity within each partition.

Data Index

Second, customize the data by indexing data from your database. This way you can make recommendations based on your actual data.

You can index all your database items or just the data that you want to include as possible recommendations.

Query multiple ways

Recommend items by checking for similarity based on an item or a user.

Note

The system has separate methods for item and user searches.

When you search, you send the system data about the search item or user.

The search item data consists of an id and values for item features that correspond to the filters and encoders.

The search user data consists of a user id and an item history, like a purchase history, where each item in the history is an id for an indexed item.

Note

If you are using Recsplain as a web server, you send the search data in the body of a POST request. If you are using it with Python bindings, call the search method and pass your item data as an argument.

Understand results

Each time you search by item or user, the system returns items it deems similar to the search item or user and explanations for each item in the results.

The system returns the items in an array ordered by most to least similar. The first item in the array is the item that is most similar and the last item in the array is the least similar.

The degree of similarity is measured using the distance between the indexed item vectors and the vector for the search item or user.

When searching by item, similarity consists of comparing the search item vector to the vector for each item in the database.

When searching by user, similarity consists of creating an item vector for the user based on the user’s history with the item and comparing this user vector to the item vector for each indexed item.

For each item in the array, the system also returns an array of distances telling you how similar each item is to the search item or user.

Optionally, the system also returns an array of explanations consisting of more granular result data from which the system derived the final recommendations and overall distances.