Welcome to MyNQL’s documentation!¶
MyNQL¶
MyNQL is a minimalistic graph database based on the Python library NetworkX. Instead of replacing your relational database, it helps you add a network with references to the data you already have.
- Nodes have the format table.id
- Connections have only a distance
You may already have tables such as customers, merchants, products, places, areas, promotions and interests. These tables usually have an id that, together with the table name, identifies each entry.
After teaching the MyNQL network the relations table1.id1 <-> table2.id2, you can also ask it about any indirect relations you want to know. A simple connect and select is all you need.
This is very simple, but also very powerful! You define a starting point and search for the closest matches in a desired table. When you add more connections, your questions stay the same; only the results improve. If you would like to see a real-life example, a small code example for a computer store is available. The network can be serialized through peewee and stored in MySQL, PostgreSQL or SQLite.
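The table.id node format described above can be illustrated with a few lines of plain Python. This is only a sketch: the helper name split_node is hypothetical, and splitting at the first dot is an assumption, not something the MyNQL documentation specifies.

```python
def split_node(key):
    """Split a 'table.id' key into (table, id) at the first dot (assumed rule)."""
    table, sep, ident = key.partition(".")
    if not sep or not table or not ident:
        raise ValueError("expected 'table.id', got %r" % (key,))
    return table, ident


print(split_node("customer.103"))   # ('customer', '103')
print(split_node("product.jeans"))  # ('product', 'jeans')
```

The table name and the id together identify one entry, so the same id can appear in different tables without clashing.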
- Source: https://github.com/livinter/MyNQL
- Bug reports: https://github.com/livinter/MyNQL/issues
Readme¶
Install¶
MyNQL’s source code is hosted on GitHub.
git clone https://github.com/livinter/MyNQL.git
cd MyNQL
python setup.py install
or just
pip install MyNQL
Teach the Network¶
For example, if a customer purchases a product, you can assume a relation between customer.id and product.id, so you connect them. Optionally, you can specify a distance between the nodes to represent how closely they are related.
- connect: connect two nodes
- delete: delete a connection
Nodes are created automatically when you do the connection, and removed if they do not have any more connections. So do not worry about them.
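The node lifecycle described above (nodes appear with their first connection and disappear with their last) can be sketched with a plain dictionary. The class TinyNetwork below is a hypothetical stand-in written for illustration; it is not the MyNQL class and makes no claim about how MyNQL stores its graph.

```python
from collections import defaultdict


class TinyNetwork:
    """Hypothetical stand-in for illustration; not the MyNQL class."""

    def __init__(self):
        self.edges = defaultdict(dict)  # node -> {neighbour: distance}

    def connect(self, a, b, distance=1.0):
        # Nodes come into existence as a side effect of the first edge.
        self.edges[a][b] = distance
        self.edges[b][a] = distance

    def delete(self, a, b):
        # Remove the edge in both directions, then drop orphaned nodes.
        self.edges[a].pop(b, None)
        self.edges[b].pop(a, None)
        for node in (a, b):
            if not self.edges[node]:
                del self.edges[node]


net = TinyNetwork()
net.connect("customer.juan", "product.jeans")
net.connect("customer.juan", "product.socks", distance=2.0)
net.delete("customer.juan", "product.jeans")
print(sorted(net.edges))  # ['customer.juan', 'product.socks']
```

After the delete, product.jeans has no remaining neighbours and is removed, while customer.juan survives because it is still connected to product.socks.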
Ask the Network¶
Now you can query all kinds of relations, not only the ones you taught. With select you specify a starting point, such as customer.id, and the category in which you want to find its closest relations.
- select: returns the best related nodes from a specified category
The search takes into account all the different routes up to a radius you specify.
Example¶
Let's imagine we already have a table customer:
Id | Name | |
---|---|---|
101 | jose | … |
102 | maria | … |
103 | juan | … |
and you want to teach the network about recent purchases.
from MyNQL import MyNQL
mynql = MyNQL('store')
mynql.connect('customer.juan', 'product.jeans')
mynql.connect('customer.juan', 'product.socks')
mynql.connect('customer.maria', 'product.socks')
If the column Name is unique you can use it as a key; otherwise you would need the column Id, and your code would look like this:
mynql.connect('customer.103', 'product.12')
Now you can ask questions from other points of view. You always specify a starting point, and the category where you want to know the best matches:
>>> mynql.select('customer.maria', 'product')
['socks', 'jeans']
Maria is more connected to socks, as she has a direct connection, but also a bit to jeans, as there exists an indirect connection through Juan.
>>> mynql.select('product.jeans', 'product')
['socks']
Any combination is valid. For example, you can ask how one product is related to another.
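The two results above can be reproduced with a simplified, standard-library sketch of a radius-bounded search. It is an approximation under stated assumptions: it ranks nodes by their single shortest route (Dijkstra's algorithm), whereas MyNQL is described as taking all routes into account, and the function name, the edge dictionary and the default distance of 1.0 per connection are chosen here for illustration.

```python
import heapq


def select(edges, start, category, radius=3.0):
    """Return ids of `category` nodes within `radius` of `start`, nearest first."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in edges.get(node, {}).items():
            new_dist = dist + weight
            if new_dist <= radius and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    hits = [(d, n) for n, d in best.items()
            if n != start and n.split(".", 1)[0] == category]
    return [n.split(".", 1)[1] for d, n in sorted(hits)]


# The three purchases from the example, stored in both directions.
edges = {
    "customer.juan": {"product.jeans": 1.0, "product.socks": 1.0},
    "customer.maria": {"product.socks": 1.0},
    "product.jeans": {"customer.juan": 1.0},
    "product.socks": {"customer.juan": 1.0, "customer.maria": 1.0},
}
print(select(edges, "customer.maria", "product"))  # ['socks', 'jeans']
print(select(edges, "product.jeans", "product"))   # ['socks']
```

In this sketch Maria reaches socks at distance 1.0 and jeans at distance 3.0 (via socks and Juan), so jeans only just falls inside the default radius of 3.0.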
Backend¶
Storage is done in memory, but if you want to use MySQL, SQLite or PostgreSQL as a backend, take a look at test/pee_example.py.
This will keep a copy of all updates in your database.
The “MyNQL” module¶
class MyNQL.MyNQL(db_name, serializer=None, log_file=None, log_level=40, backward_factor=0.5)¶
The MyNQL class. log_level and log_file can be used to send debugging information to the screen or to a log file; for details on logging, refer to the Python logging library. The serializer allows you to save the data into a database; see pee_example for reference.
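The default log_level=40 is easier to read next to the named levels of Python's standard logging module. The snippet below only inspects those constants; the commented-out constructor call at the end is a hypothetical usage with an illustrative file name and is not executed.

```python
import logging

# The numeric levels defined by the standard logging module.
levels = {name: getattr(logging, name)
          for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")}
print(levels)
# {'DEBUG': 10, 'INFO': 20, 'WARNING': 30, 'ERROR': 40, 'CRITICAL': 50}

# The default log_level=40 in the constructor signature equals logging.ERROR.
assert logging.ERROR == 40

# Hypothetical call, not executed here; the file name is illustrative:
# mynql = MyNQL('store', log_file='mynql.log', log_level=logging.DEBUG)
```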
connect(nodes1, nodes2, distance=1.0, distance_backward=None, rewrite=False)¶
Connect two nodes. If the relation already exists, its distance is reduced, so the nodes become more closely related. Nodes are created if they do not exist.
>>> x = MyNQL("x").connect("table1.1","table2.3")
>>> x.G[("table1","1")][("table2","3")]
{'distance': 1.0}
>>> _ = x.connect("table1.1","table2.3")
>>> x.G[("table1","1")][("table2","3")]
{'distance': 0.5}
>>> _ = x.connect("table1.1","table2.3", rewrite=True)
>>> x.G[("table1","1")][("table2","3")]
{'distance': 1.0}
Parameters: - nodes1 (basestring) – the first node, written as category.id (for example table1.1)
- nodes2 (basestring) – the second node, written as category.id
- distance (float) – the smaller the distance, the more closely the two nodes are related
- distance_backward (float) – distance from node 2 to node 1
- rewrite (bool) – if True, the stored distance is replaced rather than reduced, as the doctest above shows
Returns: None
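The doctest shows a repeated connect with distance 1.0 turning a stored distance of 1.0 into 0.5, but it does not say which rule produces that value. One rule consistent with those numbers is combining distances the way parallel resistors combine. The sketch below assumes that rule purely for illustration; MyNQL may use a different formula (simple halving would give the same numbers here), and the function names and the edges dictionary are hypothetical.

```python
def combine(existing, new):
    """Combine two distances like parallel resistors (assumed rule, not confirmed)."""
    return 1.0 / (1.0 / existing + 1.0 / new)


def connect(edges, key, distance=1.0, rewrite=False):
    # `edges` maps a node pair to its current distance (hypothetical store).
    if key in edges and not rewrite:
        edges[key] = combine(edges[key], distance)
    else:
        edges[key] = distance
    return edges[key]


edges = {}
pair = ("table1.1", "table2.3")
print(connect(edges, pair))                # 1.0
print(connect(edges, pair))                # 0.5
print(connect(edges, pair, rewrite=True))  # 1.0
```

Whatever the exact formula, the behaviour worth remembering is that repeating a connection strengthens it, while rewrite=True resets it to the distance you pass.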
delete(nodes1, nodes2)¶
Delete a connection. If a node no longer has any neighbours, it is deleted as well.
>>> import networkx as nx
>>> nql = MyNQL("x").connect("person.juan", "promo.promo1")
>>> nql = nql.delete("person.juan", "promo.promo1")
>>> nx.number_of_nodes(nql.G)
0
Parameters: - node1 – node 1
- node2 – node 2
Returns: None
get_categories()¶
All the categories that have been used so far.
>>> MyNQL("x").connect("person.juan", "promo.promo1").get_categories()
['person', 'promo']
Returns: list of categories
get_distance(node1, node2, radius=3.0)¶
Get the distance between two nodes.
Parameters: - node1 – node 1
- node2 – node 2
Returns: total distance as float
load(typ='gexf', path='')¶
Load the complete network.
Parameters: - typ – one of gmi, gexf, gpickle, graphml, yaml, node_link_data
- path – location of network file
Returns: None
load_serialized_node(key, json_node_data)¶
Used to load the network from a database.
Parameters: - key –
- json_node_data –
Returns: None
plot()¶
Draw the graph using matplotlib.
Returns: None
save(typ='gexf', path='')¶
Save the network to disk.
Parameters: - typ – one of gmi, gexf, gpickle, graphml, yaml, node_link_data
- path – location to save file
Returns:
select(nodes_1, category, radius=3.0, in_order=True, limit=None, value_only=True)¶
Select the best-matching nodes of a specific category, ordered by closeness to nodes_1. If value_only is True, only the IDs are returned; otherwise each result is a tuple that includes the closeness score: [(closeness, (node, id)),..]. If no nodes are found, an empty list is returned.
Parameters: - nodes_1 (str) – the starting node for calculating closeness
- category (str) – the result is reduced to only elements from a specific category
- radius (float) – limit the search to this radius
- in_order (bool) – sort the output so that the best relation comes first
- limit (int) – limit the number of results
- value_only – only return the id, without score
Returns: best matching nodes
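How in_order, limit and value_only shape the result can be sketched as a post-processing step over scored pairs. This is an illustration only: the helper shape_results is hypothetical, the scores are made-up values, the inner tuple is assumed to be (category, id), and a higher closeness is assumed to mean a stronger relation, none of which the documentation above confirms.

```python
def shape_results(scored, in_order=True, limit=None, value_only=True):
    """Order, truncate and optionally strip scores from (closeness, (category, id)) pairs."""
    rows = list(scored)
    if in_order:
        # Assumption: a higher closeness score means a stronger relation.
        rows.sort(key=lambda row: row[0], reverse=True)
    if limit is not None:
        rows = rows[:limit]
    if value_only:
        return [node_id for _closeness, (_category, node_id) in rows]
    return rows


# Made-up scores for illustration.
scored = [(0.3, ("product", "jeans")), (1.0, ("product", "socks"))]
print(shape_results(scored))           # ['socks', 'jeans']
print(shape_results(scored, limit=1))  # ['socks']
print(shape_results(scored, value_only=False))
# [(1.0, ('product', 'socks')), (0.3, ('product', 'jeans'))]
```

Keeping the scores (value_only=False) is useful when you want to apply your own threshold instead of relying on limit alone.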
Note¶
In its current state MyNQL is more a library than a query language, but this should be fine for most use cases.
If you find any bugs, odd behavior, or have an idea for a new feature please don’t hesitate to open an issue on GitHub or contact me at livint at posteo dot de.