Welcome to MyNQL’s documentation!

MyNQL

MyNQL is a minimalistic graph database based on the Python library NetworkX. Instead of replacing your relational database, it helps you add a network of references to the data you already have.

  • Nodes have the format table.id
  • Connections (only) have a distance
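As a minimal sketch of the table.id node format, the helper below splits a node key into its table and id parts. The name split_node is hypothetical and not part of MyNQL; it only illustrates the naming convention, assuming that the first dot separates table and id.

```python
def split_node(key):
    """Split a 'table.id' key into a (table, id) tuple.

    Hypothetical helper for illustration; not part of the MyNQL API.
    Only the first dot separates table and id, so ids may contain dots.
    """
    table, sep, node_id = key.partition(".")
    if not sep or not table or not node_id:
        raise ValueError("expected 'table.id', got %r" % (key,))
    return table, node_id


print(split_node("customer.103"))  # ('customer', '103')
```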

You may already have tables like: customers, merchants, products, places, areas, promotions, interests. Those tables usually have an id that, together with the table name, identifies each entry.

After teaching the MyNQL network the relations between pairs of nodes, table1.id1 <-> table2.id2, you can also ask the network about any indirect relations you want to know. A simple connect and select is all you need.

This is very simple, but also very powerful! You define a starting point and search for the closest matches in a desired table. When you add more connections, your questions stay the same; only the results improve. If you would like to see a real-life example, here is a small code sample for a computer store. The network can be serialized through peewee to be stored in MySQL, PostgreSQL or SQLite.

Source: https://github.com/livinter/MyNQL
Bug reports: https://github.com/livinter/MyNQL/issues

Readme

Install

MyNQL’s source code is hosted on GitHub.

git clone https://github.com/livinter/MyNQL.git
cd MyNQL
python setup.py install

or just

pip install MyNQL

Teach the Network

For example, if a customer purchases a product, you can assume a relation between customer.id and product.id, so you connect them. Optionally, you can specify a distance between the nodes to represent how closely they are related.

  • connect - connect two nodes
  • delete - delete a connection

Nodes are created automatically when you do the connection, and removed if they do not have any more connections. So do not worry about them.
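The automatic creation and removal of nodes can be pictured with a small dict-of-dicts stand-in. This is only a sketch: the class TinyNetwork is hypothetical, it uses one symmetric distance per connection, and it overwrites the distance on a repeated connect, which is simpler than what MyNQL itself does.

```python
class TinyNetwork:
    """Dict-of-dicts stand-in for the network (hypothetical, not MyNQL)."""

    def __init__(self):
        self.edges = {}  # node -> {neighbour: distance}

    def connect(self, a, b, distance=1.0):
        # Nodes appear automatically the first time they are connected.
        # Simplification: a repeated connect just overwrites the distance.
        self.edges.setdefault(a, {})[b] = distance
        self.edges.setdefault(b, {})[a] = distance

    def delete(self, a, b):
        # Remove the connection in both directions, if present.
        self.edges.get(a, {}).pop(b, None)
        self.edges.get(b, {}).pop(a, None)
        # Drop nodes that no longer have any neighbour.
        for node in (a, b):
            if node in self.edges and not self.edges[node]:
                del self.edges[node]


net = TinyNetwork()
net.connect("customer.juan", "product.jeans")
net.connect("customer.juan", "product.socks", distance=0.5)
net.delete("customer.juan", "product.jeans")
print(sorted(net.edges))  # ['customer.juan', 'product.socks']
```

After the delete, product.jeans has no neighbours left and disappears, while customer.juan stays because it is still connected to product.socks.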

Ask the Network

Now you can query all kinds of relations, not only the ones you taught. With select you specify a starting point, such as customer.id, and the category in which you want to find its closest relations.

  • select - gives you the best related nodes from a specified category

The searching query takes into account all the different routes up to a radius you specify.
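To make the idea of a radius-limited search concrete, here is a simplified sketch in plain Python. It is not MyNQL's algorithm: it considers only the single shortest route to each node (Dijkstra's algorithm) rather than combining several routes, and the function name closest, the edge dictionary layout and the inclusive radius check are all assumptions made for illustration.

```python
import heapq


def closest(edges, start, category, radius=3.0):
    """Return ids of `category` nodes within `radius` of `start`, nearest first.

    Simplified stand-in: uses only the single shortest route per node.
    """
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, step in edges.get(node, {}).items():
            new_dist = dist + step
            if new_dist <= radius and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))
    # Keep only nodes of the requested category, excluding the start node.
    hits = [(d, n) for n, d in best.items()
            if n != start and n.split(".", 1)[0] == category]
    return [n.split(".", 1)[1] for d, n in sorted(hits)]


edges = {
    "customer.juan": {"product.jeans": 1.0, "product.socks": 1.0},
    "customer.maria": {"product.socks": 1.0},
    "product.jeans": {"customer.juan": 1.0},
    "product.socks": {"customer.juan": 1.0, "customer.maria": 1.0},
}
print(closest(edges, "customer.maria", "product"))  # ['socks', 'jeans']
```

With a smaller radius, such as radius=1.0, only the directly connected socks would be returned.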

Example

Let's imagine we already have a table customer:

Id   Name
101  jose
102  maria
103  juan

and you want to teach the network about recent purchases.

from MyNQL import MyNQL
mynql = MyNQL('store')

mynql.connect('customer.juan', 'product.jeans')
mynql.connect('customer.juan',  'product.socks')
mynql.connect('customer.maria', 'product.socks')

If the column Name is unique you can use it as a key, otherwise you would need column Id, and your code would look like this:

mynql.connect('customer.103', 'product.12')

Now you can ask questions from other points of view. You always specify a starting point, and the category where you want to know the best matches:

>>> mynql.select('customer.maria', 'product')
['socks', 'jeans']

Maria is most closely connected to socks, as she has a direct connection, but also somewhat to jeans, as there exists an indirect connection through Juan.

>>> mynql.select('product.jeans', 'product')
['socks']

Any combination is valid. For example, you can ask how one product is related to others.

Backend

Storage is done in memory, but if you want to use MySQL, SQLite or PostgreSQL as a backend take a look at test/pee_example.py. This will keep a copy of all updates in your database.
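The idea of keeping a copy of each update in a database can be sketched with the standard-library sqlite3 module. This is not the peewee-based serializer from test/pee_example.py: the table layout and the function names store_node and load_nodes are assumptions chosen for illustration only.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would make the copy persistent
conn.execute("CREATE TABLE node (key TEXT PRIMARY KEY, data TEXT NOT NULL)")


def store_node(key, neighbours):
    """Insert or replace the JSON-encoded neighbours of one node."""
    conn.execute(
        "INSERT OR REPLACE INTO node (key, data) VALUES (?, ?)",
        (key, json.dumps(neighbours, sort_keys=True)),
    )
    conn.commit()


def load_nodes():
    """Read every stored node back into a dict keyed by node name."""
    rows = conn.execute("SELECT key, data FROM node ORDER BY key")
    return {key: json.loads(data) for key, data in rows}


# Each update rewrites the stored copy of the affected node.
store_node("customer.juan", {"product.jeans": 1.0})
store_node("customer.juan", {"product.jeans": 1.0, "product.socks": 1.0})
print(load_nodes())
```

Storing one JSON document per node keeps the schema independent of the categories in use, at the cost of rewriting the whole node on every change.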

The “MyNQL” module

class MyNQL.MyNQL(db_name, serializer=None, log_file=None, log_level=40, backward_factor=0.5)

The MyNQL class. log_level and log_file can be used to send debugging information to the screen or to a log file. For details regarding logging, refer to Python's logging library. The serializer allows you to save the data into a database. See test/pee_example.py for reference.

connect(nodes1, nodes2, distance=1.0, distance_backward=None, rewrite=False)

Connect two nodes. If the relation already exists, its distance is reduced, so the nodes become closer. Nodes are created if they do not exist.

>>> x = MyNQL("x").connect("table1.1","table2.3")
>>> x.G[("table1","1")][("table2","3")]
{'distance': 1.0}
>>> _ = x.connect("table1.1","table2.3")
>>> x.G[("table1","1")][("table2","3")]
{'distance': 0.5}
>>> _ = x.connect("table1.1","table2.3", rewrite=True)
>>> x.G[("table1","1")][("table2","3")]
{'distance': 1.0}

Parameters:
  • nodes1 (basestring) – a node given as a string in the form category.id, for example "table1.1"
  • nodes2 (basestring) – a node in the same category.id form
  • distance (float) – the smaller the distance, the more closely both nodes are related
  • distance_backward (float) – distance from node 2 to node 1
  • rewrite (bool) – if True, the existing distance is replaced by the given one instead of being reduced, as in the example above
Returns:

None
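The doctest above shows a distance of 1.0 dropping to 0.5 when the same pair is connected a second time. The rule MyNQL uses for this is not stated here. As a purely illustrative sketch, one rule that reproduces those numbers is a parallel combination of the two distances; the function combine is hypothetical and the library's actual rule may differ.

```python
def combine(existing, new):
    """Merge two distances so the result is smaller than either input.

    Assumed rule (parallel combination); not taken from the MyNQL source.
    """
    return 1.0 / (1.0 / existing + 1.0 / new)


distance = 1.0                      # first connect with the default distance
distance = combine(distance, 1.0)   # second connect of the same pair
print(distance)  # 0.5
```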

delete(nodes1, nodes2)

Delete a connection. If a node no longer has any neighbours, it is deleted as well.

>>> import networkx as nx
>>> nql = MyNQL("x").connect("person.juan", "promo.promo1")
>>> nql = nql.delete("person.juan", "promo.promo1")
>>> nx.number_of_nodes(nql.G)
0

Parameters:
  • nodes1 – first node of the connection
  • nodes2 – second node of the connection
Returns:

None

get_categories()

all the categories that have been used so far.

>>> MyNQL("x").connect("person.juan", "promo.promo1").get_categories()
['person', 'promo']

Returns:

list of categories

get_distance(node1, node2, radius=3.0)

Return the distance between two nodes.

Parameters:
  • node1 – node 1
  • node2 – node 2
Returns:

total distance as float

load(typ='gexf', path='')

load the complete network

Parameters:
  • typ – one of gmi, gexf, gpickle, graphml, yaml, node_link_data
  • path – location of network file
Returns:

None

load_serialized_node(key, json_node_data)

used to load network from database

Parameters:
  • key
  • json_node_data
Returns:

None

plot()

Draw the graph using matplotlib.

Returns:

None

save(typ='gexf', path='')

save network to disk

Parameters:
  • typ – one of gmi, gexf, gpickle, graphml, yaml, node_link_data
  • path – location to save file
Returns:

select(nodes_1, category, radius=3.0, in_order=True, limit=None, value_only=True)

Select the best-matching nodes of a specific category, ordered by closeness to nodes_1. If value_only is True, only the IDs are returned; otherwise each result also carries its closeness score, in the form [(closeness, (node, id)), ...]. If no nodes are found, an empty list is returned.

Parameters:
  • nodes_1 (str) – the starting node for calculating closeness
  • category (str) – the result is reduced to only elements from a specific category
  • radius (float) – limit the search to this radius
  • in_order (bool) – sort the output with the best relation first
  • limit (int) – limit the number of results
  • value_only (bool) – return only the id, without the score
Returns:

best matching nodes
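How in_order, limit and value_only shape the result can be sketched on a hand-written list of scored nodes. The helper format_results is hypothetical and not part of MyNQL. It assumes that the score is a distance where smaller means closer, and that each node is a (category, id) tuple.

```python
def format_results(scored, in_order=True, limit=None, value_only=True):
    """Post-process (distance, (category, id)) pairs; hypothetical helper."""
    # Assumption: a smaller score means a closer relation, so sort ascending.
    results = sorted(scored) if in_order else list(scored)
    if limit is not None:
        results = results[:limit]
    if value_only:
        # Keep only the id part of each node.
        return [node_id for _, (_, node_id) in results]
    return results


scored = [(3.0, ("product", "jeans")), (1.0, ("product", "socks"))]
print(format_results(scored))           # ['socks', 'jeans']
print(format_results(scored, limit=1))  # ['socks']
print(format_results(scored, value_only=False))
```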

Note

In its current state MyNQL is more a library than a query language, but this should be fine for most use cases.

If you find any bugs, odd behavior, or have an idea for a new feature please don’t hesitate to open an issue on GitHub or contact me at livint at posteo dot de.