
Introduce multikey indexes #769


Closed
TarantoolBot opened this issue May 7, 2019 · 2 comments

Comments

@TarantoolBot

TarantoolBot commented May 7, 2019

Any JSON path index in which at least one part contains the "[*]" (array index placeholder) sign is called "multikey". Such indexes let you automatically index a set of documents that share the same document structure.

The multikey index design has a number of restrictions that must
be taken into account:

  • Since multikey indexes allow a tuple to appear multiple times, such an index cannot be a primary key.
  • If a node in the JSON tree of all defined indexes contains an array index placeholder, no other JSON path can use an explicit JSON index on its nested field.
  • A multikey index supports "unique" semantics, but its uniqueness differs slightly from that of conventional indexes: you may insert a tuple in which the same key occurs multiple times into a unique multikey index, but you cannot insert a tuple if any of its keys already appears in another tuple stored in the space.
  • The unique multikey index "duplicate" conflict occurs when the sets of keys extracted from two tuples have a non-empty logical intersection.
  • To identify the different keys by which a given datum in a tuple is indexed, each key is assigned a logical sequence number in the array defined by the array index placeholder in the index. This array is called the multikey index root.
  • No index part can contain more than one array index placeholder sign in its JSON path.
  • All parts containing JSON paths with an array index placeholder must have the same prefix (in terms of JSON tokens) before the placeholder sign.
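
The "unique" rule in the list above can be pictured with a small plain-Lua model. This is only a sketch of the described semantics, not Tarantool's implementation: extract_keys() and conflicts() are hypothetical helpers invented for illustration, and the phone numbers are made-up sample data.

```lua
-- Plain-Lua model of the unique multikey rule; not Tarantool code.
-- extract_keys() is a hypothetical helper: it returns the set of distinct
-- values found in an array field (the multikey index root).
local function extract_keys(array)
    local keys = {}
    for _, value in ipairs(array) do
        keys[value] = true  -- a key repeated inside one tuple collapses into one
    end
    return keys
end

-- Two tuples conflict when their key sets share at least one key.
local function conflicts(keys_a, keys_b)
    for key in pairs(keys_a) do
        if keys_b[key] then
            return true
        end
    end
    return false
end

local jorge = extract_keys({"911", "89457609234", "911"})  -- "911" twice is fine
local bob   = extract_keys({"81239876543"})
local eve   = extract_keys({"555", "911"})                 -- shares "911" with jorge

print(conflicts(jorge, bob))  -- false
print(conflicts(jorge, eve))  -- true
```

In this model a repeated key inside one tuple never conflicts with itself, while a single shared key between two tuples is enough to reject the second one.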

Example 1:

s = box.schema.space.create('clients')
s:format({{name='name', type='string'}, {name='phone', type='array'}})
name_idx = s:create_index('name_idx', {parts = {{'name', 'string'}}})
phone_idx = s:create_index('phone_idx', {parts = {{'phone[*]', 'string'}}})
s:insert({"Jorge", {"911", "89457609234"}})
s:insert({"Bob", {"81239876543"}})

phone_idx:get("911")
---
- ['Jorge', ['911', '89457609234']]
...

Example 2:

s = box.schema.space.create('withdata')
pk = s:create_index('pk')
parts = {
    {2, 'str', path = 'data[*].name'},
    {2, 'str', path = 'data[*].extra.phone'}
}
idx = s:create_index('idx', {parts = parts})
s:insert({1, {data = {{name="A", extra={phone="111"}},
                      {name="B", extra={phone="111"}}},
              garbage = 1}})
idx:get({'A', '111'})

Requested by @kshcherbatov in tarantool/tarantool@f1d9f25.

@lenkis lenkis added 2.1 feature A new functionality reference [location] Tarantool manual, Reference part server [area] Task relates to Tarantool's server (core) functionality user_guide [location] Tarantool manual, User's Guide part labels May 13, 2019
@kyukhin kyukhin changed the title introduce multikey indexes in memtx introduce multikey indexes in memtx and vinyl May 14, 2019
@kyukhin kyukhin changed the title introduce multikey indexes in memtx and vinyl Introduce multikey indexes in memtx and vinyl May 14, 2019
@kyukhin kyukhin changed the title Introduce multikey indexes in memtx and vinyl Introduce multikey indexes May 14, 2019
pgulutzan added a commit that referenced this issue May 24, 2019
@pgulutzan

I did nothing about "conflict occurs when the sets of extracted keys have a non-empty logical intersection" because I didn't know what that means, and I did nothing about "each key is assigned a logical sequence number" because I couldn't see how that affects an application. I did use the name 'multikey' in an example but not in text because I don't suppose there's a desire to get this confused with another vendor's multikeys, which aren't closely similar.

@kshcherbatov

Hi! I am not sure that this is correct:

As with :ref:Using the path option for map fields <box_space-path>, the field's value
must have the structure that the path definition implies, or be nil (nil is not indexed).

In fact, when a non-nullable part defines a path with the multikey index placeholder, all referenced documents must contain the corresponding leaf; otherwise an error is raised.
When such a part is nullable, documents that lack the leaf at the JSON path are inserted with a "nil" value.

I mean:

s = box.schema.space.create('clients')
s:format({{name = "name", type = "str"}, {name = "data", type = "map"}})
pk = s:create_index('pk')
s:insert({'Adam', {phones = {{type = 'home', number = '333111'}}}})
s:insert({'James', {phones = {{type = 'home', number = '911'}, {number = '777777'}}}})

phone_number = s:create_index('phone_number', {parts = {{'data.phones[*]number', 'str'}}})
phone_type = s:create_index('phone_type', {unique = false, parts = {{'data.phones[*]type', 'str', is_nullable = true}}})

tarantool> phone_type:select()
---
- - ['James', {'phones': [{'type': 'home', 'number': '911'}, {'number': '777777'}]}]
  - ['Adam', {'phones': [{'type': 'home', 'number': '333111'}]}]
  - ['James', {'phones': [{'type': 'home', 'number': '911'}, {'number': '777777'}]}]
...

As you can see, James is inserted into the 'phone_type' index twice: once by the 'home' key and once by an empty key.

Finally, when a key occurs several times in one tuple, the index stores only the last occurrence of that key for the tuple:

s:truncate()
s:insert({'James2', {phones = {{type = 'home', number = '1911'}, {number = '1777777'}, {number = '999999'}}}})
tarantool> phone_type:select()
---
- - ['James2', {'phones': [{'type': 'home', 'number': '1911'}, {'number': '1777777'},
        {'number': '999999'}]}]
  - ['James2', {'phones': [{'type': 'home', 'number': '1911'}, {'number': '1777777'},
        {'number': '999999'}]}]
...
