Google Code offered in: English - Español - 日本語 - 한국어 - Português - Pусский - 中文(简体) - 中文(繁體)
Every datastore query uses an index, a table that contains the results for the query in the desired order. An App Engine application defines its indexes in a configuration file named index.yaml
. The development web server automatically adds suggestions to this file as it encounters queries that do not yet have indexes configured. You can tune indexes manually by editing the file before uploading the application.
The index-based query mechanism supports most common kinds of queries, but it does not support some queries you may be used to from other database technologies. Restrictions on queries, and their explanations, are described below.
A query retrieves entities from the datastore that meet a set of conditions. The query specifies an entity kind, zero or more conditions based on entity property values (sometimes called "filters"), and zero or more sort order descriptions. When the query is executed, it fetches all entities of the given kind that meet all of the given conditions, sorted in the order described.
A query can also return just the keys of the result entities instead of the entities themselves.
The datastore Python API provides two interfaces for preparing and executing queries: the Query interface, which uses methods to prepare the query, and the GqlQuery interface, which uses a SQL-like query language called GQL to prepare the query from a query string. These interfaces are described in more detail in Creating, Getting and Deleting Data: Getting Entities Using a Query and the corresponding reference pages.
class Person(db.Model): first_name = db.StringProperty() last_name = db.StringProperty() city = db.StringProperty() birth_year = db.IntegerProperty() height = db.IntegerProperty() # The Query interface prepares a query using instance methods. q = Person.all() q.filter("last_name =", "Smith") q.filter("height <", 72) q.order("-height") # The GqlQuery interface prepares a query using a GQL query string. q = db.GqlQuery("SELECT * FROM Person " + "WHERE last_name = :1 AND height < :2 " + "ORDER BY height DESC", "Smith", 72) # The query is not executed until results are accessed. results = q.fetch(5) for p in results: print "%s %s, %d inches tall" % (p.first_name, p.last_name, p.height)
A filter includes a property name, a comparison operator, and a value. An entity passes the filter if it has a property of the given name and its value compares to the given value as described by the operator. The entity is a result for the query if it passes all of its filters.
The filter operator can be any of the following:
<
less than<=
less than or equal to=
equal to>
greater than>=
greater than or equal to!=
not equal toIN
equal to any of the values in the provided listThe !=
operator actually performs 2 queries: one where all other filters are the same and the not-equal filter is replaced with a less-than filter, and one where the not-equal filter is replaces with a greater-than filter. The results are merged, in order. As described below in the discussion of inequality filters, a query can only have one not-equal filter, and such a query cannot have other inequality filters.
The IN
operator also performs multiple queries, one for each item in the provided list value where all other filters are the same and the IN
filter is replaces with an equal-to filter. The results are merged, in the order of the items in the list. If a query has more than IN
filter, the query is performed as multiple queries, one for each combination of values in the IN
filters.
A single query containing !=
and IN
operators is limited to 30 sub-queries.
The App Engine datastore maintains an index for every query an application intends to make. As the application makes changes to datastore entities, the datastore updates the indexes with the correct results. When the application executes a query, the datastore fetches the results directly from the corresponding index.
An application has an index for each combination of kind, filter property and operator, and sort order used in a query. Consider the example query, stated in GQL:
SELECT * FROM Person WHERE last_name = "Smith" AND height < 72 ORDER BY height DESC
The index for this query is a table of keys for entities of the kind Person
, with columns for the values of the height
and last_name
properties. The index is sorted by height
in descending order.
Two queries of the same form but with different filter values use the same index. For example, the following query uses the same index as the query above:
SELECT * FROM Person WHERE last_name = "Jones" AND height < 63 ORDER BY height DESC
The datastore executes a query using the following steps:
An index table contains columns for every property used in a filter or sort order. The rows are sorted by the following aspects, in order:
Note: For the purposes of indexes, IN
filters are handled like =
filters, and !=
filters are handled like the other inequality filters.
This puts all results for every possible query that uses this index in consecutive rows in the table.
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2", "abc", u"abc" + u"\ufffd")
This matches every MyModel
entity with a string property prop
that begins with the characters abc
. The unicode string u"\ufffd"
represents the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
This mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of queries you may be used to from other database technologies.
An index only contains entities that have every property referred to by the index. If an entity does not have a property referred to by an index, the entity will not appear in the index, and will never be a result for the query that uses the index.
Note that the App Engine datastore makes a distinction between an entity that does not possess a property and an entity that possesses the property with a null value (None
). If you want every entity of a kind to be a potential result for a query, you can use a data model that assigns a default value (such as None
) to properties used by query filters.
Property values that aren't indexed are not findable by queries. This includes properties that are marked as not indexed, as well as properties with values of the long text value type (Text) or the long binary value type (Blob).
A query with a filter or sort order on a property will never match an entity whose value for the property is a Text or Blob, or which was written with that property marked as not indexed. Properties with such values behave as if the property is not set with regard to query filters and sort orders.
When two entities have properties of the same name but of different value types, an index of the property sorts the entities first by value type, then by an order appropriate to the type. For example, if two entities each have a property named "age," one with an integer value and one with a string value, the entity with the integer value will always appear before the entity with the string value when sorted by the "Age" property, regardless of the values themselves.
This is especially worth noting in the case of integers and floating point numbers, which are treated as separate types by the datastore. A property with the integer value 38
is sorted before a property with the floating point value 37.5
, because all integers are sorted before floats.
App Engine builds indexes for several simple queries by default. For other queries, the application must specify the indexes it needs in a configuration file named index.yaml
. If the application running under App Engine tries to perform a query for which there is no corresponding index (either provided by default or described in index.yaml
), the query will fail.
App Engine provides automatic indexes for the following forms of queries:
Other forms of queries require their indexes to be specified in index.yaml
, including:
The development web server makes managing index configuration easy: Instead of failing to execute a query that does not have an index and requires it, the development web server can generate configuration for an index that would allow the query to succeed. If your local testing of your application calls every possible query the application will make (every combination of kind, ancestor, filter and sort order), the generated entries will represent a complete set of indexes. If your testing might not exercise every possible query form, you can review and adjust the index configuration before uploading the application.
App Engine builds indexes for several simple queries by default. For other queries, the application must specify the indexes it needs in a configuration file named index.yaml
. If the application running under App Engine tries to perform a query for which there is no corresponding index (either provided by default or described in index.yaml
), the query will fail.
index.yaml
describes each index table, including the kind, the properties needed for the query filters and sort orders, and whether or not the query uses an ancestor clause (either Query.ancestor() or a GQL ANCESTOR IS clause). The properties are listed in the order they are to be sorted: properties used in equality or IN
filters first, followed by the property used in inequality filters, then the query results sort orders and their directions.
Consider once again the following example query:
SELECT * FROM Person WHERE last_name = "Smith" AND height < 72 ORDER BY height DESC
If the application executed only this query (and possibly other queries similar to this one but with different values for "Smith"
and 72
), the index.yaml
file would look like this:
indexes: - kind: Person properties: - name: last_name - name: height direction: desc
When an entity is created or updated, every appropriate index is updated as well. The number of indexes that apply to an entity affects the time it takes to create or update the entity.
For more information on the syntax of index.yaml
, see Configuring Indexes.
Entity keys can be the subject of a query filter or sort order, using the special name __key__
in place of the property name. The datastore considers the complete key value for such queries, including the entity's parent path, the kind, and the app-assigned key name string or system-assigned numeric ID.
A query can return entity keys instead of full entities. You can trigger this behavior by passing keys_only=True
to Query, or by using SELECT __key__
in GQL.
Note: Queries that return keys are faster and cost less CPU than queries that return entities, since the keys themselves are already in the index, so the query doesn't need to fetch the actual entities. If you only need the keys from your query results — for example, if you're just going to delete the results — consider using a keys only query.
Because an entity key is unique across all entities in the system, __key__
queries make it easy to retrieve entities of a given kind in batches, such as for a batch dump of the contents of the datastore. Unlike offset, this works efficiently for any number of entities. For example:
class MainHandler(webapp.RequestHandler): def get(self): query = Entity.gql('ORDER BY __key__') # Use a query parameter to keep track of the last key of the last # batch, to know where to start the next batch. last_key_str = self.request.get('last') if last_key_str: last_key = db.Key(last_key_str) query = Entity.gql('WHERE __key__ > :1 ORDER BY __key__', last_key) # For batches of 20, fetch 21, then use result #20 as the "last" # if there is a 21st. entities = query.fetch(21) new_last_key_str = None if len(entities) == 21: new_last_key_str = str(entities[19].key()) # Return the data and new_last_key_str. Client would use # http://...?last=new_last_key_str to fetch the next batch. # ...
Keys are ordered first by parent path, then by kind, then by key name or ID. Kinds and key names are strings and are ordered by byte value. IDs are integers and are ordered numerically. If entities of the same parent and kind use a mix of key name strings and numeric IDs, entities with numeric IDs are considered to be less than entities with key name strings. Elements of the parent path are compared similarly: by kind (string), then by key name (string) or ID (number).
Queries involving keys use indexes just like queries involving properties, with one minor difference: Unlike with a property, a query with an equality filter on the key that also has additional filters must use a custom index defined in the app's index configuration file. As with all queries, the development web server creates appropriate configuration entries in this file when a query that needs a custom index is tested.
The nature of the index query mechanism imposes a few restrictions on what a query can do.
A query filter condition or sort order for a property also implies a condition that the entity have a value for the property.
A datastore entity is not required to have a value for a property that other entities of the same kind have. A filter on a property can only match an entity with a value for the property. Entities without a value for a property used in a filter or sort order are omitted from the index built for the query.
It is not possible to perform a query for entities that are missing a given property. One alternative is to create a fixed (modeled) property with a default value of None
, then create a filter for entities with None
as the property value.
A query may only use inequality filters (<
, <=
, >=
, >
, !=
) on one property across all of its filters.
For example, this query is allowed:
SELECT * FROM Person WHERE birth_year >= :min AND birth_year <= :max
However, this query is not allowed, because it uses inequality filters on two different properties in the same query:
SELECT * FROM Person WHERE birth_year >= :min_year AND height >= :min_height # ERROR
Filters can combine equal (=
) comparisons for different properties in the same query, including queries with one or more inequality conditions on a property. This is allowed:
SELECT * FROM Person WHERE last_name = :last_name AND city = :city AND birth_year >= :min_year
The query mechanism relies on all results for a query to be adjacent to one another in the index table, to avoid having to scan the entire table for results. A single index table cannot represent multiple inequality filters on multiple properties while maintaining that all results are consecutive in the table.
If a query has both a filter with an inequality comparison and one or more sort orders, the query must include a sort order for the property used in the inequality, and the sort order must appear before sort orders on other properties.
This query is not valid, because it uses an inequality filter and does not order by the filtered property:
SELECT * FROM Person WHERE birth_year >= :min_year ORDER BY last_name # ERROR
Similarly, this query is not valid because it does not order by the filtered property before ordering by other properties:
SELECT * FROM Person WHERE birth_year >= :min_year ORDER BY last_name, birth_year # ERROR
This query is valid:
SELECT * FROM Person WHERE birth_year >= :min_year ORDER BY birth_year, last_name
To get all results that match an inequality filter, a query scans the index table for the first matching row, then returns all consecutive results until it finds a row that doesn't match. For the consecutive rows to represent the complete result set, the rows must be ordered by the inequality filter before other sort orders.
Due to the way properties with multiple values are indexed, the sort order for these properties is unusual:
This sort order has the unusual consequence that [1, 9]
comes before [4, 5, 6, 7]
in both ascending and descending order.
One important caveat is queries with both an equality filter and a sort order on a multi-valued property. In those queries, the sort order is disregarded. For single-valued properties, this is a simple optimization. Every result would have the same value for the property, so the results do not need to be sorted further.
However, multi-valued properties may have additional values. Since the sort order is disregarded, the query results may be returned in a different order than if the sort order were applied. (Restoring the dropped sort order would be expensive and require extra indices, and this use case is rare, so the query planner leaves it off.)
As described above, every property (that doesn't have a Text or Blob value) of every entity is added to at least one index table, including a simple index provided by default, and any indexes described in the application's index.yaml
file that refer to the property. For an entity that has one value for each property, App Engine stores a property value once in its simple index, and once for each time the property is referred to in a custom index. Each of these index entries must be updated every time the value of the property changes, so the more indexes that refer to the property, the more time it will take to update the property.
To prevent the update of an entity from taking too long, the datastore limits the number of index entries that a single entity can have. The limit is large, and most applications will not notice. However, there are some circumstances where you might encounter the limit. For example, an entity with very many single-value properties can exceed the index entry limit.
Properties with multiple values store each value as a separate entry in an index. An entity with a single property with very many values can exceed the index entry limit.
Custom indexes that refer to multiple properties with multiple values can get very large with only a few values. To completely record such properties, the index table must include a row for every permutation of the values of every property for the index.
For example, the following index (described in index.yaml
syntax) includes the x
and y
properties for entities of the kind MyModel
:
indexes: - kind: MyModel properties: - name: x - name: y
The following code creates an entity with 2 values for the property x
and 2 values for the property y
:
class MyModel(db.Expando): pass e2 = MyModel() e2.x = ['red', 'blue'] e2.y = [1, 2] e2.put()
To accurately represent these values, the index must store 12 property values: 2 each for the built-in indexes on x
and y
, and 2 for each of the 4 permutations of x
and y
in the custom index. With many values of multi-valued properties, this can mean an index must store very many index entries for a single entity. You could call an index that refers to multiple properties with multiple values an "exploding index," because it can get very large with just a few values.
If a put()
would result in a number of index entries that exceeds the limit, the call will fail with an exception. If you create a new index that would contain a number of index entries that exceeds the limit for any entity when built, queries against the index will fail, and the index will appear in the "Error" state in the Admin Console.
To handle "Error" indexes, first remove them from your index.yaml
file and run appcfg.py vacuum_indexes
. Then, either reformulate the index definition and corresponding queries or remove the entities that are causing the index to "explode." Finally, add the index back to index.yaml
and run appcfg.py update_indexes
.
You can avoid exploding indexes by avoiding queries that would require a custom index using a list property. As described above, this includes queries with multiple sort orders, a mix of equality and inequality filters, and ancestor filters.