The SASI feature in Cassandra for full-text search

SASI indexing: starting with version 3.4, a new implementation of secondary indexes is available, the SSTable Attached Secondary Index (SASI). Columns indexed with SASI can be queried with inequality operators (range queries over values) and with LIKE (as in SQL). This implementation also gives a notable performance gain for queries that require filtering (ALLOW FILTERING). As of this writing (version 3.9), SASI cannot index collections. SASI offers three index modes:

  • PREFIX index mode
  • CONTAINS index mode
  • SPARSE index mode

I have nothing more to say… With this feature added to Cassandra, the first thing that comes to mind is that Cassandra will soon try to replace full-text search engines, stand in for relational databases, and also act as a document-oriented database. I don't have time to translate the rest right now, but a look at the sources at the end of this post will show you what is going on.

Using an SSTable Attached Secondary Index (SASI)
SSTable Attached Secondary Indexes (SASI) improve on the performance of regular secondary indexes but should be used with caution.
Note: SASI indexes in DSE are experimental. DataStax does not support SASI indexes for production.
Using CQL, SSTable attached secondary indexes (SASI) can be created on a non-collection column defined in a table. Secondary indexes are used to query a table that uses a column that is not normally queryable, such as a non primary key column. SASI implements three types of indexes, PREFIX, CONTAINS, and SPARSE.

PREFIX index mode:
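
A minimal sketch of the setup the queries in this section assume. The text names only the table cyclist_name, its primary key id, and the column firstname; the lastname column, the sample row, and the index name fn_prefix are assumptions made for illustration, and a keyspace is assumed to be selected already. PREFIX is the default SASI mode, so no options are given.

```cql
-- Assumed schema: only id and firstname are named in the text.
CREATE TABLE cyclist_name (
  id uuid PRIMARY KEY,
  lastname text,
  firstname text
);

-- Illustrative sample row.
INSERT INTO cyclist_name (id, lastname, firstname)
VALUES (uuid(), 'VOS', 'Marianne');

-- PREFIX is the default mode; the index name fn_prefix is an assumption.
CREATE CUSTOM INDEX fn_prefix ON cyclist_name (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex';
```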

Queries can find exact matches for values in firstname. Note that indexing is used for this query, as the primary key id is not specified.
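
An exact-match query of this kind might look as follows; the value 'Marianne' is taken from the failure discussion later in this section.

```cql
-- Equality on the indexed column; no primary key is supplied.
SELECT * FROM cyclist_name WHERE firstname = 'Marianne';
```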

Queries can also find values in firstname by partial match. Here LIKE looks for a value that starts with the letter “M”; the % after the “M” matches any sequence of characters that follows. Note that indexing is used for this query, as the primary key id is not specified.
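
A sketch of the prefix query described above:

```cql
-- Trailing % only: the form a PREFIX-mode index can serve.
SELECT * FROM cyclist_name WHERE firstname LIKE 'M%';
```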

All of the following queries will fail.

The first four queries fail because of case sensitivity: “MARIANNE” is all uppercase, whereas the stored value is not. The next three use a lowercase “m”. The placement of the % is critical; since the index uses PREFIX mode, only a trailing % will yield results when coupled with LIKE. The queries with equalities fail unless they give the exact stored value.
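
Failure cases of the kinds described above can be sketched as follows. The literals are illustrative and assume the stored value is 'Marianne'; depending on the case, a query either returns no rows or is rejected by the server.

```cql
-- Case mismatch: the stored value is 'Marianne'.
SELECT * FROM cyclist_name WHERE firstname = 'MARIANNE';
SELECT * FROM cyclist_name WHERE firstname LIKE 'm%';

-- Wrong % placement for PREFIX mode: only a trailing % is usable.
SELECT * FROM cyclist_name WHERE firstname LIKE '%M';
SELECT * FROM cyclist_name WHERE firstname LIKE '%M%';

-- With =, the % is an ordinary character, not a wildcard.
SELECT * FROM cyclist_name WHERE firstname = 'M%';
```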

CONTAINS index mode:

Create an index fn_suffix for the table cyclist_name on the column firstname. CONTAINS is the specified mode, so patterns can match anywhere in the value, not just at the prefix.
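
A sketch of that index definition, using the names given above. If the cluster rejects a second index on the same column, dropping the earlier one first is the likely remedy.

```cql
CREATE CUSTOM INDEX fn_suffix ON cyclist_name (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'CONTAINS' };
```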

Queries can find exact matches for values in firstname. Note that indexing is used for this query, as the primary key id is not specified. For queries on CONTAINS indexing, the ALLOW FILTERING phrase must be included, although the database will not actually filter.
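
A sketch of the exact-match query against the CONTAINS index:

```cql
-- ALLOW FILTERING is required by the syntax here, per the note above.
SELECT * FROM cyclist_name WHERE firstname = 'Marianne' ALLOW FILTERING;
```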

This query returns the same results as a query using PREFIX indexing that does an exact match using a slightly modified query.
Queries can also find values in firstname by partial match. Here LIKE looks for a value that contains the letter “M”; the % before and after the “M” matches any sequence of characters on either side. Note that indexing is used for this query, as the primary key id is not specified.
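
A sketch of the contains-style query described above:

```cql
-- % on both sides: match 'M' anywhere in the value.
SELECT * FROM cyclist_name WHERE firstname LIKE '%M%';
```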

Again, the same results are returned as for the PREFIX indexing, using a slightly modified query.
The CONTAINS indexing has a more versatile matching algorithm than PREFIX. Look at the examples below to see what results from variations of the last search.
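
Two such variations, using the patterns discussed in this section:

```cql
-- Suffix match: the value ends with 'arianne'.
SELECT * FROM cyclist_name WHERE firstname LIKE '%arianne';

-- Inner match: 'arian' appears somewhere in the value.
SELECT * FROM cyclist_name WHERE firstname LIKE '%arian%';
```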

Each query matches the pattern, either the final characters of the column value as in %arianne or the characters bracketed by % such as %arian%.
With CONTAINS indexing, even inequality pattern matching is possible. Note again the ALLOW FILTERING phrase, which is required but adds no latency to the query response.
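
An inequality query of this kind might look like the sketch below; the comparison value 'Mar' is an assumption chosen for illustration.

```cql
-- Range comparison on the indexed text column.
SELECT * FROM cyclist_name WHERE firstname > 'Mar' ALLOW FILTERING;
```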

The only row matching the conditions returns the same value as the last query.

As with PREFIX indexing, many queries will fail to find matches based on the partial string.

All of the following queries will fail.

The first query fails due to the absence of the ALLOW FILTERING phrase. The next two queries fail because of case sensitivity. “MariAnne” has one uppercase letter, whereas the stored value does not. The last three fail due to placement of the %.
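
The first two failure categories above can be sketched as follows; the % placement cases are not reproduced here. The literals are illustrative and assume the stored value is 'Marianne'.

```cql
-- Equality on a CONTAINS index without ALLOW FILTERING.
SELECT * FROM cyclist_name WHERE firstname = 'Marianne';

-- Case mismatch: 'MariAnne' has an uppercase 'A', the stored value does not.
SELECT * FROM cyclist_name WHERE firstname = 'MariAnne' ALLOW FILTERING;
SELECT * FROM cyclist_name WHERE firstname LIKE '%MariAnne%';
```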

Either the PREFIX index or the CONTAINS index can be made case-insensitive by adding an analyzer class and the case_sensitive option.

The analyzer_class used here is the non-tokenizing analyzer that does not perform analysis on the text in the specified column. The option case_sensitive is set to false to make the indexing case insensitive.
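
A sketch of such an index definition. The index name fn_suffix_allcase is an assumption; the option names are the ones discussed above.

```cql
CREATE CUSTOM INDEX fn_suffix_allcase ON cyclist_name (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
  'mode': 'CONTAINS',
  -- No tokenization; only case handling is applied.
  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
  'case_sensitive': 'false'
};
```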
With the addition of the analyzer class and option, the following query now also works, using a lowercase “m”.
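
A sketch of such a query, assuming the case-insensitive CONTAINS index is in place:

```cql
-- Lowercase 'm' now matches values such as 'Marianne'.
SELECT * FROM cyclist_name WHERE firstname LIKE '%m%';
```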

If queries are narrowed with an indexed column value, non-indexed columns can be specified. Compound queries can also be created with multiple indexed columns. This example alters the table to add a column age that is not indexed before performing the query.
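
A sketch of that sequence. The column age comes from the text; its type and the threshold 20 are assumptions.

```cql
-- Add a column that has no index of its own.
ALTER TABLE cyclist_name ADD age int;

-- The indexed column narrows the search; age is then checked on those rows.
SELECT * FROM cyclist_name
WHERE firstname = 'Marianne' AND age > 20
ALLOW FILTERING;
```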

SPARSE index mode:

The SPARSE index is meant to improve the performance of querying large, dense number ranges, such as timestamps for data inserted every millisecond. If the data is numeric, has millions of column values spread over a small number of partition keys, and will be queried by range against the index, then SPARSE is the best choice. For numeric data that does not meet these criteria, PREFIX is the best choice.

Use SPARSE indexing for data that is sparse (every term/column value has fewer than 5 matching keys). Indexing the created_at field in time series data (where there are typically few matching rows/events per created_at timestamp) is a good use case. SPARSE indexing is primarily an optimization for range queries, especially ranges that span long periods of time.

To illustrate the use of the SPARSE index, create a table and insert some time series data:
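
A sketch of such a setup. Only the created_at field is named in the text; the table name comments, the other columns, the index name fn_sparse, and the sample rows are assumptions made for illustration.

```cql
-- Hypothetical time series table; only created_at is named in the text.
CREATE TABLE comments (
  id uuid PRIMARY KEY,
  commenter text,
  comment text,
  created_at timestamp
);

CREATE CUSTOM INDEX fn_sparse ON comments (created_at)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = { 'mode': 'SPARSE' };

-- Illustrative rows, a fraction of a second apart.
INSERT INTO comments (id, commenter, comment, created_at)
VALUES (uuid(), 'John', 'Fantastic race!', '2013-01-01 00:05:01.500');
INSERT INTO comments (id, commenter, comment, created_at)
VALUES (uuid(), 'Jane', 'What a finish!', '2013-01-01 00:05:01.250');
```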

Find all the comments made before the timestamp 2013-01-01 00:05:01.500.
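
A sketch of that range query, assuming a hypothetical table named comments with a SPARSE index on created_at:

```cql
SELECT * FROM comments WHERE created_at < '2013-01-01 00:05:01.500';
```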

This query returns all the results where created_at is found to be less than the timestamp supplied. The inequalities >=, > and <= are all valid operators.
SPARSE indexing is used only for numeric data, so LIKE queries do not apply.

Using analyzers

Analyzers can be specified that will analyze the text in the specified column. The NonTokenizingAnalyzer is used for cases where the text is not analyzed, but case normalization or sensitivity is required. The StandardAnalyzer is used for analysis that involves stemming, case normalization, case sensitivity, skipping common words like “and” and “the”, and localization of the language used to complete the analysis. Altering the table again to add a lengthier text column provides a window into the analysis.
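
A sketch of an index that uses the StandardAnalyzer. The column name comments and the index name stdanalyzer_idx are assumptions; the options shown correspond to the analysis features listed above.

```cql
-- Hypothetical longer text column to analyze.
ALTER TABLE cyclist_name ADD comments text;

CREATE CUSTOM INDEX stdanalyzer_idx ON cyclist_name (comments)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
  'mode': 'CONTAINS',
  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
  'analyzed': 'true',
  'tokenization_skip_stop_words': 'and, the, or',
  'tokenization_enable_stemming': 'true',
  'tokenization_normalize_lowercase': 'true',
  'tokenization_locale': 'en'
};
```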

This query will search for the presence of a designated string, using the analyzed text to return a result.
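
Such a query might look like the sketch below, assuming a hypothetical analyzed text column named comments on cyclist_name:

```cql
-- The search term is compared against the analyzed (stemmed) tokens.
SELECT * FROM cyclist_name WHERE comments LIKE 'ride';
```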

This query returns all the results where ride is found either as an exact word or as a stem for another word – rides in this case.

Sources:

http://pp1230.github.io/2015/12/26/cassandra-lucene-index.html

https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useSASIIndex.html

Stratio’s Lucene-based index for Cassandra is now a plugin
