How to search on Ironsift

Our search engine will do its best to interpret your queries. For more specific needs, Ironsift provides a rich query language based on Apache Lucene and Elasticsearch. This section describes the syntax for advanced search queries.

Queries

Terms and Operators

Your query is analyzed into a series of terms and operators. Terms can be single words (e.g. patent or data) or phrases (e.g. "virtual currency"). Surround a phrase with double quotes to match records that contain all of its words in the same order.

All terms are required by default: a search for data mining returns only records that contain both data and mining. The standard Boolean operators AND, OR, and NOT are also supported. Operators are case-sensitive, so they must be written in uppercase as shown. NOT takes precedence over AND, which takes precedence over OR.
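To make the precedence rules concrete, here is a minimal Python sketch of a Boolean query parser and matcher. It is an illustration, not Ironsift's actual parser: it assumes case-insensitive whole-word matching, treats adjacent terms as an implicit AND, supports parentheses for grouping (as Lucene does), and ignores field prefixes and relevance scoring. The names parse, matches, and Parser are hypothetical.

```python
import re

# Tokens: quoted phrases, parentheses, or runs of other non-space characters.
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


class Parser:
    """Recursive-descent parser: OR binds loosest, then AND, then NOT."""

    def __init__(self, query):
        self.tokens = TOKEN_RE.findall(query)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        node = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = ("or", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_not()
        # An explicit AND, or simply another operand (implicit AND), continues the chain.
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.take()
            node = ("and", node, self.parse_not())
        return node

    def parse_not(self):
        if self.peek() == "NOT":
            self.take()
            return ("not", self.parse_not())
        return self.parse_atom()

    def parse_atom(self):
        tok = self.take()
        if tok is None or tok in ("AND", "OR", ")"):
            raise ValueError(f"expected a term, got {tok!r}")
        if tok == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if tok.startswith('"'):
            return ("phrase", tok.strip('"').lower().split())
        # Operators are case-sensitive: a lowercase "and" lands here as a plain term.
        return ("term", tok.lower())


def parse(query):
    return Parser(query).parse()


def matches(node, text):
    """Evaluate a parsed query against a text, assuming case-insensitive word matching."""
    words = re.findall(r"\w+", text.lower())

    def ev(n):
        kind = n[0]
        if kind == "term":
            return n[1] in words
        if kind == "phrase":
            k = len(n[1])
            # All phrase words must appear consecutively, in order.
            return any(words[i:i + k] == n[1] for i in range(len(words) - k + 1))
        if kind == "not":
            return not ev(n[1])
        if kind == "and":
            return ev(n[1]) and ev(n[2])
        return ev(n[1]) or ev(n[2])

    return ev(node)
```

In this sketch, a OR b AND c parses as ("or", a, ("and", b, c)), NOT a AND b parses as ("and", ("not", a), b), and a lowercase and is matched as an ordinary word rather than treated as an operator.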

Fields

By default, query terms are matched against several fields. To refine your query, you can specify which field should be searched for each term. For example, title:vulnerability returns records whose title contains vulnerability, but not records in which vulnerability occurs only in other fields.
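The difference between a field-qualified term and a bare term can be sketched in Python as follows. The flat record layout (a dict mapping field names to text) and the term_matches function are hypothetical; real records are richer. A bare word is checked against every field because the table of searchable fields marks all of them as searched by default.

```python
import re


def term_matches(term, record):
    """Match one query term against a hypothetical record {field name: text}.

    A "field:word" term is checked only against that field; a bare word is
    checked against every field in the record.
    """
    field, sep, word = term.partition(":")
    fields = [field] if sep else list(record)
    word = word if sep else term
    for name in fields:
        # Assumption for this sketch: case-insensitive whole-word matching.
        words = re.findall(r"\w+", record.get(name, "").lower())
        if word.lower() in words:
            return True
    return False
```

With a record whose description mentions a vulnerability but whose title does not, vulnerability matches while title:vulnerability does not.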

The following table lists all searchable fields. The last column indicates whether a field is searched when no field is specified for a given term.

| Field | Description | Searched by default |
|---|---|---|
| classification | Standardized classifications. Possible values depend on a given record’s type, e.g. CPC is relevant for patents but not for clinical trials. | Yes |
| country | ISO 3166-1 alpha-2 country codes. | Yes |
| date | Dates, formatted either as long integers representing milliseconds elapsed since the Unix epoch, or as ISO 8601 datetimes in which the date is mandatory and the time is optional. | Yes |
| description | Free-text description. | Yes |
| display-pid | Pretty-printed IDs. | Yes |
| keyword | Keywords. | Yes |
| links-to | Domains and subdomains referenced in outbound links. | Yes |
| party | Party names, including agents, applicants, assignees, assignors, attorneys, authors, collaborators, complainants, contact persons, correspondents, counsels, examiners, filers, inventors, investigators, judges, respondents, sponsors, and vendors. | Yes |
| party-cik | Parties’ Central Index Key (CIK) identifiers, as assigned by the U.S. SEC. | Yes |
| party-lei | Parties’ Legal Entity Identifiers (LEIs). | Yes |
| pids | Standardized IDs for a given record. | Yes |
| related | Standardized IDs for related records. | Yes |
| source | Record sources, including FDA, IETF, NIH, NIST, USITC, and USPTO. | Yes |
| title | Free-text title. | Yes |
| type | Record types, including clinical trial, drug approval, patent, patent application, patent assignment, request for comments, trademark, trademark application, trademark assignment, unfair import investigation, vendor statement, and vulnerability. | Yes |
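Because the date field accepts two formats, the same instant can be written either way. The Python sketch below converts an ISO 8601 date or datetime into milliseconds since the Unix epoch. Reading values without a time zone as UTC is an assumption of this sketch; this page does not say how Ironsift interprets them.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Convert an ISO 8601 date or datetime string to milliseconds since the Unix epoch."""
    dt = datetime.fromisoformat(value)  # a date-only string parses as midnight
    if dt.tzinfo is None:
        # Assumption for this sketch: values without a UTC offset are read as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)
```

Under that assumption, 2020-01-01 and 1577836800000 denote the same instant (midnight UTC on January 1, 2020).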

Quotas

Ironsift uses quotas to ensure that the service is used as intended and that no user unfairly degrades service quality for others. Every topic incurs a quota cost of at least one point. You can find the quota available to your account on your Profile page.

Unregistered visitors receive a default quota, which is subject to change. This default helps us optimize quota allocation and scale our infrastructure in ways that better serve our registered users.

Inactive topics do not count towards your quota. If you reach your quota limit, you can upgrade your plan, or navigate to your Topics page and deactivate some topics. These changes will take effect immediately.
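The accounting described above can be sketched in a few lines of Python. The Topic class, its field names, and the limit check are hypothetical; they only illustrate that inactive topics contribute nothing and that deactivating a topic frees its cost right away.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    cost: int            # the quota cost Ironsift displays for the topic
    active: bool = True


def quota_usage(topics):
    """Total quota in use: active topics only, each costing at least one point."""
    return sum(max(1, t.cost) for t in topics if t.active)


def fits_quota(topics, limit):
    # Assumption for this sketch: usage exactly equal to the limit is allowed.
    return quota_usage(topics) <= limit
```

For three topics costing 3, 5, and 4 points, with the last one inactive, usage is 8; deactivating the 5-point topic drops it to 3.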

Calculating quota usage

Ironsift calculates your quota usage by assigning each topic a cost, which varies from topic to topic. Two main factors influence a topic's quota cost:

  • Query complexity:
    • For literal terms and phrases, the cost is equal to the number of whitespace-separated words;
    • For wildcard queries and regular expressions, the cost is equal to the number of states required to build the corresponding finite-state automaton;
    • Operators do not count towards your quota;
  • Topic status: inactive topics do not count towards your quota.
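The complexity rules listed above can be approximated with the Python sketch below. It is a rough estimate, not Ironsift's calculator: the number of automaton states for a wildcard or regular-expression term depends on how the automaton is built, which this page does not specify, so the sketch takes it from a hypothetical caller-supplied function (automaton_states). Treating a field prefix on a phrase as free is also an assumption, and the additional edge-case rules are not modeled.

```python
import re

# A quoted phrase or /regex/ (optionally prefixed by a field name), a
# parenthesis, or any other run of non-space characters.
TOKEN_RE = re.compile(r'(?:[\w-]+:)?("[^"]*"|/[^/]*/)|([()])|([^\s()"]+)')
OPERATORS = {"AND", "OR", "NOT"}


def estimate_cost(query, automaton_states=None):
    """Rough per-topic cost following the complexity rules.

    automaton_states is a hypothetical caller-supplied function returning the
    number of automaton states for a wildcard or regex term.
    """
    cost = 0
    for m in TOKEN_RE.finditer(query):
        quoted, paren, bare = m.groups()
        if paren or bare in OPERATORS:
            continue  # grouping and Boolean operators are free
        if quoted and quoted.startswith('"'):
            # Phrase: one point per whitespace-separated word (field prefix assumed free).
            cost += len(quoted.strip('"').split())
        elif quoted or "*" in bare or "?" in bare:
            # Regular expression or wildcard term: priced by automaton size.
            if automaton_states is None:
                raise ValueError(f"no state count available for {m.group(0)!r}")
            cost += automaton_states(m.group(0))
        else:
            cost += 1  # a literal term is a single whitespace-separated word
    return max(1, cost)  # every topic costs at least one point
```

For example, "virtual currency" AND patent is estimated at 3 points, and a query made only of operators still costs the minimum of 1. The cost shown when you create or edit a topic remains the authoritative figure.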

A few additional rules cover edge cases and other, less frequently used query types supported by Lucene. The actual quota cost is always displayed when you create or edit a topic.

Quotas are based on complexity because a single advanced query, such as one built on regular expressions, may burden our infrastructure as much as a group of simple, single-term queries. Pricing by complexity rather than by query count leaves our users free to design their queries as they see fit.