Provides an implementation for the {@link org.netbeans.mdr.persistence
abstract MDR persistence interfaces } in terms of {@link java.sql JDBC
}.
JDBC Persistence for MDR
The JdbcStorage implementation relies on a JDBC driver and associated
SQL DBMS for transactional persistence. The DBMS can be a
lightweight Java database running in the same process as MDR, a
heavyweight database running out-of-process on the same machine, or
even across the network (although performance may be poor in this
configuration).
Parameters
In order to use JdbcStorage, some mandatory properties must be
specified:
- org.netbeans.mdr.storagemodel.StorageFactoryClassName =
org.netbeans.mdr.persistence.jdbcimpl.JdbcStorageFactory
- MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.driverClassName
= the fully qualified name of the JDBC driver to use
(e.g. org.postgresql.Driver)
- MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.url = the
JDBC URL to use to connect to the database
(e.g. jdbc:postgresql://localhost/mdrdb)
- MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.schemaName
= the name of the schema which will contain the MDR tables; this
schema will be created automatically by JdbcStorage, so it should never
be created manually
There are also some optional connection parameters:
- MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.userName
= user name for connection; this user must have the right to
create/drop the schema and tables in the database (or just to read and
write their contents if they have all been created previously by
another user)
- MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.password =
password corresponding to userName
- MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.queryDuplicates
= true to force JdbcStorage to issue pre-updated queries to detect
duplicates in unique indexes (instead of letting the DBMS do the job
during update). This must be enabled for databases such as PostgreSQL
(pre-1.7.5) without proper subtransaction support. It should be left
disabled for all others since it adds a performance penalty.
- MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.firstSerialNumber
= lowest MOFID serial number to assign. When federating multiple
storages, this can be used to prevent conflicts by partitioning the
MOFID space into non-overlapping ranges. This parameter only takes
effect when the storage is initially created; the value can be either
decimal, hexadecimal, or octal as interpreted by {@link
java.lang.Long#decode}.
- MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.debugPrint
= true to request that whenever an SQLException is caught and
rethrown, the error message and its stack trace will be printed to
standard error. Sometimes exception messages get swallowed by upper
layers of MDR, so this can be useful for debugging configuration
problems. However, the output can be misleading, because some valid
exceptions thrown from the storage layer (e.g. uniqueness violation)
are caught and handled by MDR.
The next section defines additional parameters affecting
DBMS-specific usage of SQL.
Index to Table Mapping
JdbcStorage creates one table per MDR index. It also creates one
private table (MOFID_SEQ) for implementing the MOFID sequence
generator. No metadata tables are created; instead, JdbcStorage
relies on DBMS metadata. This avoids redundancy and the
possibility for inconsistency, and makes it easier to see the
relationship between MDR objects and their SQL implementations.
Datatypes for index key and value columns can be controlled via the
properties listed below. The defaults may not be supported by every
DBMS. For example, PostgreSQL requires usage of type BYTEA in place
of LONGVARBINARY. Also, users may wish to override the default
precision depending on the application.
-
MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.datatype.mofid:
default BIGINT
-
MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.datatype.streamable:
default LONGVARBINARY
-
MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.datatype.string:
default VARCHAR(2000)
-
MDRStorageProperty.org.netbeans.mdr.persistence.jdbcimpl.datatype.int:
default BIGINT
Additional columns may be created as follows (using the datatype
specified by the datatype.int property above):
-
For a MultivaluedOrderedIndex, a 0-based ordinal column is used to
implement ordering.
-
For a MultivaluedIndex with non-unique values, an additional surrogate
key column is used to ensure that each stored row has a unique key.
Surrogate key values are generated from the same sequence as MOFID
serial numbers.
The primary key is defined as the key column, plus the ordinal and
surrogate key columns (if any).
Table Names
If the DBMS does not support schemas, JdbcStorage fakes them by
prepending the specified schema name to the name of each table created
(e.g. MDR_MOFID_SEQ). In this case, a distinguishing value should be
used for the schema name, since JdbcStorage.delete() will delete all
tables whose names have this prefix.
Table names are derived from the index names requested by MDR.
However, the index name is not used directly, because MDR generates
very long index names based on MOFID's, and many DBMS products have a
low limit like 31 characters for identifiers. To shorten the name,
the StorageId portion of each MOFID is eliminated, and all leading zeros
of the serial number are trimmed.
Caching
{@link org.netbeans.mdr.persistence.jdbcimpl.JdbcPrimaryIndex} reuses
the streamable object caching strategy from {@link
org.netbeans.mdr.persistence.btreeimpl.btreestorage.MDRCache }. For
other indexes, no caching is currently performed.
Driver and DBMS Requirements
JdbcStorage has been tested with hsqldb 1.7.1 and PostgreSQL 7.4.1.
It does not require a sophisticated JDBC driver or DBMS, but may
require some tweaks for other products. Below is a list of the JDBC
and SQL features it uses:
-
CREATE TABLE with NOT NULL and PRIMARY KEY constraints.
-
Quoted identifiers for schema and table names, which may
include strange characters like "." and ":".
-
SELECT COUNT, DISTINCT and COUNT(DISTINCT). No joins, subqueries or
GROUP BY clauses are used. WHERE clauses are very simple (e.g. WHERE
COLUMN = value). ORDER BY is used for accessing
MultivaluedOrderedIndexes, and must support ordering by a non-selected
column.
-
INSERT/UPDATE/DELETE.
-
Prepared statements with dynamic parameters.
-
A single JDBC connection is shared by all calls to the repository, so
the JDBC driver must allow calls from any thread. However, access to
the connection is synchronized internally, so the JDBC driver does not
need to support concurrent threads of any kind.
-
The JDBC driver must support transactions. Since MDR only allows one
write transaction at a time, isolation level is irrelevant. Some
DBMS's don't support mixing DDL and DML within a transaction, and
autocommit on DDL instead. In that case, createIndex() will lead to
an implicit commit, which could result in inconsistencies after a
subsequent rollback.
-
The Storage.delete() method implementation depends on whether the DBMS
supports schemas. If schemas are not supported, the JDBC driver must
support DatabaseMetaData.getTables with a name prefix so they can be dropped
individually. If schemas are supported, the DBMS must also support
DROP SCHEMA CASCADE, and the JDBC driver must support
DatabaseMetaData.getSchemas.
-
ResultSetMetaData must support getColumnCount() and getColumnName().
-
ResultSet must support getInt(), getString(), getLong(), and
getBytes().
-
PreparedStatement must support setLong(), setBytes(), and setObject().
Limitations and Unresolved Issues
Since this is a first cut, there is lots of room for improvement:
-
The StorageId prefix "j" is used for all MOFID's, and all MOFID's
stored via JdbcStorage must have this prefix. MOFID's are stored as
integer serial numbers with no prefix. This makes JdbcStorage-based
federation difficult. For some applications, this may be acceptable,
and using integers for keys is generally good for SQL performance.
Probably we should allow the user a choice on a per-storage basis:
either store MOFID's as integers and disallow federation, or store
MOFID's as strings or byte arrays and allow federation (factoring out
some commonality from btreeimpl).
-
Caching for non-primary indexes is probably required for acceptable
performance in out-of-process configurations. A very bad case is when
values are inserted sequentially into a MultivaluedOrderedIndex
without using an iterator; each one requires an extra query in order
to generate an ordinal. A subquery in the VALUES clause would save a
round-trip, but caching would be better.
-
Using JDBC batched execution support for
flushing the cache could significantly improve performance for bulk
write operations such as import.
-
Each SQL implementation typically has its own subtle non-standard
behavior and performance, so a lot of testing and tweaking will be
required in order to make JdbcStorage plug-and-play.
-
Other than datatypes, there is no way to configure the mapping from
indexes to tables. For example, some users might wish to store all
associations in a single table, rather than creating one table per
association. And others might wish to store streamable objects in
normalized format for direct SQL querying. Some kind of configurable
object mapping rules would be nice.
-
The synchronization imposed for all JDBC access could become a
bottleneck in read-mostly multi-threaded apps. Loosening this is
tricky because different JDBC drivers have different multi-threading
rules (some require multiple connections).
-
There is no protection against multiple programs running private
repository instances connecting to a database stored on a shared
server, which would lead to corruption. One solution is to use
DBMS-supported locks to enforce exclusion; another is to turn this
into a feature (e.g. a DBMS supporting multi-versioning concurrency
control, such as Oracle, could permit multiple read-only connections
along with a single read/write connection).
In addition, there are various unresolved issues which came up during
development:
-
I did not implement keySet() and values() for JdbcPrimaryIndex because
in my testing I never saw them get called. Nor did I implement
queryByKeyPrefix() for any of the indexes, or fail-fast iterators anywhere.
-
I did not implement the classCode optimization from BtreeDatabase.
That should probably be factored out into a reusable class. For now,
I just serialized the class name and used Class.forName for
resurrection.
-
Currently, each call to JdbcStorageFactory.createStorage() results in
a JDBC connection being established and held open until a
corresponding call to JdbcStorage.shutDown(). Control over the
connection lifetime might be required by some applications, or even
advanced connection management such as pooling or transaction
coordination at the level of an application server.
-
JdbcStorage maintains on-demand maps of descriptors and JDBC
PreparedStatements for all indexes which have been accessed since the
storage was opened. These maps could lead to memory constipation for
very large repository models. Perhaps they need to be turned into
caches instead.