Zope Subversion Repository
- Storage servers now emit Serving and Closed events so subscribers can discover addresses when dynamic port assignment (bind to port 0) is used. This could, for example, be used to update address information in a ZooKeeper database.
- Client storages have a method, new_addr, that can be used to change the server address(es). This can be used, for example, to update a dynamically determined server address from information in a ZooKeeper database.
- Moved some responsibility from runzeo to StorageServer to make it easier to use storage servers without runzeo.
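With dynamic port assignment, the real port is not known until the socket has been bound. That is why the server has to announce it. Below is a minimal sketch of how the actual address can be recovered after binding to port 0. It uses only the standard `socket` module, not ZEO's event classes, whose API is not shown here. `bind_dynamic()` is a hypothetical helper.

```python
import socket


def bind_dynamic(host="127.0.0.1"):
    """Listen on a TCP port chosen by the OS and return (socket, (host, port))."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))   # port 0 asks the OS for any free port
    sock.listen(5)
    # getsockname() reports the address actually bound, including the real port
    return sock, sock.getsockname()


if __name__ == "__main__":
    listener, addr = bind_dynamic()
    # A "Serving"-style subscriber could publish addr somewhere (e.g. ZooKeeper) here.
    print("serving on", addr)
    listener.close()
```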
Lots of code cleanups. https://bugs.launchpad.net/zodb/+bug/677751
Provide a shorter code path for loads, which are the most common operation. Simplified and optimized the marshalling code.
Updated ZEO protocol version to reflect addition of checkCurrentSerialInTransaction. Add logic to allow new clients to be used with older servers.
Updated comment for server protocol.
Comply with repository policy.
Gave delay objects a repr for debugging purposes.
Fixed a threading bug.
The storage server is now multi-threaded.
Refactored storage server to support multiple client threads. Changed ZEO undo protocol. (Undo is disabled with older clients.) Now use one-way undoa. Undone oids are now returned by (tpc_)vote for ZEO. Undo no longer gets the commit lock.
ZEO clients (``ClientStorage`` instances) now work in forked processes, including those created via ``multiprocessing.Process`` instances. This entailed giving each client storage its own networking thread.
Refactored the zrpc implementation:
- Most server methods now return data to clients more quickly by writing to client sockets immediately, rather than waiting for the asyncore select loop to get around to it.
- Client and server responsibilities are more clearly defined. Machinery needed only by clients or only by servers has been moved to the corresponding connection subclasses.
- Degeneralized the "flags" argument to many methods. There's just one async flag.
Turn off debug logging. It's waaaay too expensive. But make it not too hard to turn back on when it's needed, although, at that point, it still might not be enough. :)
Merged the chrisw-error_logging branch. Bug fixed:
- Internal ZEO errors were logged at the INFO level, rather than at the ERROR level.
Fixed atexit handler to deal with the possibility that the ex
Bug Fixed: ZEO manages a separate thread for client network IO. It created this thread on import, which caused problems for applications that implemented daemon behavior by forking. Now, the client thread isn't created until needed.
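Creating the networking thread lazily also makes it easy to notice a fork, because a child process does not inherit the parent's threads. Below is a minimal sketch built on that assumption. `LazyNetworkThread` is a hypothetical helper, not ZEO's class. It starts the thread on first use and again whenever the process ID has changed.

```python
import os
import threading


class LazyNetworkThread:
    """Hypothetical helper: start a worker thread only on first use, once per process."""

    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()
        self._thread = None
        self._pid = None

    def ensure_started(self):
        with self._lock:
            # Create the thread on first use, and again in a forked child,
            # because fork() copies only the calling thread into the child.
            if self._thread is None or self._pid != os.getpid():
                self._thread = threading.Thread(target=self._target, daemon=True)
                self._pid = os.getpid()
                self._thread.start()
            return self._thread
```

Nothing is started at import time, so a daemonizing script can fork first and let the child create its own thread on demand.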
Bugs Fixed:
- Fixed vulnerabilities in the ZEO network protocol that allowed:
  - CVE-2009-0668: Arbitrary Python code execution in ZODB ZEO storage servers
  - CVE-2009-0669: Authentication bypass in ZODB ZEO storage servers
- Limit the number of object ids that can be allocated at once to avoid running out of memory.
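The entry does not say how the code-execution hole was closed. As a general illustration of one common defense against that class of problem, and not necessarily ZEO's actual patch, here is a sketch of an unpickler that refuses any global not on an explicit allow-list. Unpickling arbitrary globals is what lets a crafted pickle run arbitrary code. The allow-list contents are hypothetical.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to load any global not on an explicit allow-list."""

    # Hypothetical allow-list; a real server would list exactly the types it expects.
    ALLOWED = {("builtins", "ValueError")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Unpickle bytes received from an untrusted peer."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain data (tuples, strings, numbers, lists) loads without consulting `find_class()` at all. Only references to classes and functions are checked.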
Bugs Fixed
----------
- ZEO client threads were unnamed, making it hard to debug thread management.
POSKeyErrors are really client errors
Cleaned up the Z309 ZEO protocol, removing versions from arguments and return values. This in turn simplified the client and server software. Added code to select different client and server stubs and input handlers depending on whether the Z309 or earlier protocols are used. ZODB 3.8 clients can now talk to ZODB 3.9 servers and the other way around.
Upped the protocol number, since new server methods were added.
Fixed a possible problem with management of server connection triggers. Now that server triggers are shared, it makes no sense to close them. It's possible that the old logic in _pull_trigger got around the potential problem introduced when I made the server trigger shared. I can't think of a good reason, otherwise, why tests weren't failing. Getting rid of close trigger simplified the code a bit. Also factored some common close behavior, allowing me to get rid of an override.
Refactored cache verification to fix threading bugs during connection. Changed connections to work with unset (None) clients. Messages aren't forwarded until the client is set. This is to prevent sending spurious invalidation messages before a client is ready to receive them.
Bug Fixed: Improved the ZEO client shutdown support to try to avoid spurious errors on exit, especially for scripts such as zeopack.
Fixed a trigger leak, introduced when I removed ThreadedAsync, that caused an unneeded trigger to be created for each client connection. This caused tests to hang by running out of file handles. Let all server connections share a single trigger to avoid using too many file handles in the server.
Fixed typo.
Removed ThreadedAsync and (last?) vestiges of the old "non-async" mode.
Fixed a serious bug that could cause client I/O to stop (hang). This was accompanied by a critical log message along the lines of: "RuntimeError: dictionary changed size during iteration".
Updated to reflect differences in exception meta types between Python 2.4 and 2.5.
Now require Blob files to be stored even for unopened blobs.
- fixed typos
Added support for message iterators. This allows one, for example, to use an iterator to send a large file without loading it in memory. Updated the ZEO protocol to reflect the new Blob-support methods.
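The point of a message iterator is that the sender pulls one piece at a time, so a large file never has to be held in memory whole. Below is a minimal sketch of the idea using a generator. `iter_chunks()`, `send_iterator()` and the 64 KiB default chunk size are hypothetical, not ZEO's blob-transfer API.

```python
import io


def iter_chunks(f, chunk_size=64 * 1024):
    """Yield the contents of a binary file object in chunks of at most chunk_size bytes."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:      # b"" means end of file
            return
        yield chunk


def send_iterator(write, chunks):
    """Hypothetical sender: write each chunk as it is produced, never joining them."""
    for chunk in chunks:
        write(chunk)


if __name__ == "__main__":
    received = []
    send_iterator(received.append, iter_chunks(io.BytesIO(b"x" * 100_000), 4096))
    print(len(received))  # 25 chunks: 24 full ones plus a 1696-byte tail
```

At most one chunk is alive at a time on the sending side, whatever the size of the file.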
Removed a needless timeout to a condition wait call. Using timeouts can cause significant delays, especially on systems with very coarse-grained sleeps, like most Linux systems. This change makes the ZEO tests run about 25% faster on an Ubuntu desktop system. We suspect the production impact to be much greater, at least on some systems. Removed some non-async code, now that we no longer have a non-async mode. (I cowardly left an assert behind to make sure. :)
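A thread blocked on a condition variable is woken by notify_all() as soon as the state it waits for changes, so a timeout is not needed for correctness. In Python 2's threading module, a timed Condition.wait() polled with short sleeps, whereas an untimed wait blocked directly on a lock. Below is a minimal sketch of an untimed wait, assuming replies are keyed by message id. `ReplyTable`, `put()` and `wait()` are hypothetical names.

```python
import threading


class ReplyTable:
    """Hypothetical sketch: an I/O thread stores replies; callers block until theirs arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._replies = {}

    def put(self, msgid, value):
        with self._cond:
            self._replies[msgid] = value
            self._cond.notify_all()   # every waiter re-checks for its own msgid

    def wait(self, msgid):
        with self._cond:
            # No timeout: the caller wakes as soon as notify_all() runs,
            # instead of sleeping out a fixed interval.
            self._cond.wait_for(lambda: msgid in self._replies)
            return self._replies.pop(msgid)
```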
Added logic to avoid spurious errors from the logging system on exit.
Tried to make management of the client loop more robust and added a test for it.
Removed the "sync" mode for ClientStorage. Previously, a ClientStorage could be in either "sync" mode or "async" mode. Now there is just "async" mode. There is now an asyncore main loop dedicated to ZEO clients. This addresses a test failure on Mac OS X, http://www.zope.org/Collectors/Zope3-dev/650, that I believe was due to a bug in sync mode. Some asyncore-based code was being called from multiple threads that didn't expect to be. Converting to always-async mode revealed some bugs that weren't caught before because the tests ran in sync mode. These bugs could explain some problems we've seen at times with clients taking a long time to reconnect after a disconnect. Added a partial heartbeat to try to detect lost connections that aren't otherwise caught, http://mail.zope.org/pipermail/zodb-dev/2005-June/008951.html, by periodically writing to all connections during periods of inactivity.
Merge rev 38747 from 3.4 branch. Port from 2.7 branch. Collector 1900. send_reply(), return_error(): Stop trying to catch an exception that doesn't exist, when marshal.encode() raises an exception. Jeremy simplified the marshal.encode() half of this about 3 years ago, but apparently forgot to change ZEO/zrpc/connection.py to match.
Merge rev 29769 from 3.3 branch. Rewrite ZEO protocol negotiation. 3.3 should have bumped the ZEO protocol number (new methods were added for MVCC support), but didn't. Untangling this is a mess.
Convert some XXXs. More to come.
Port rev 29092 from 3.3 branch. Forward port from ZODB 3.2. Connection.__init__(): Python 2.4 added a new gimmick to asyncore (a ._map attribute on asyncore.dispatcher instances) that breaks the delicate ZEO startup dance. Repaired that.
Merge rev 29052 from 3.3 branch.
Port from ZODB 3.2.
Fixed several thread and asyncore races in ZEO's connection dance.
ZEO/tests/ConnectionTests.py
The pollUp() and pollDown() methods were pure busy loops whenever
the asyncore socket map was empty, and at least on some flavors of
Linux that starved the other thread(s) trying to do real work.
This grossly increased the time needed to run tests using these, and
sometimes caused bogus "timed out" test failures.
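The entry does not show the repaired loop. Below is a minimal sketch of the general remedy, assuming the fix is to sleep briefly instead of spinning when there is nothing to poll. `poll_until()` is a hypothetical test helper. `socket_map` stands in for asyncore's channel map, and `poll(t)` stands in for one poll of it with timeout `t`.

```python
import time


def poll_until(done, socket_map, poll, timeout=30.0, idle_sleep=0.01):
    """Hypothetical test helper: drive I/O until done() is true, without spinning."""
    deadline = time.monotonic() + timeout
    while not done():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not reached in %s seconds" % timeout)
        if socket_map:
            poll(0.1)
        else:
            # Nothing to poll: sleep briefly so other threads get the CPU
            # instead of looping as fast as possible.
            time.sleep(idle_sleep)
```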
ZEO/zrpc/client.py
ZEO/zrpc/connection.py
Renamed class ManagedConnection to ManagedClientConnection, for clarity.
Moved the comment block about protocol negotiation from the guts of
ManagedClientConnection to before the Connection base class -- the
Connection constructor can't be understood without this context. Added
more words about the delicate protocol negotiation dance.
Connection class: made this an abstract base class. Derived classes
_must_ implement the handshake() method. There was really nothing in
common between server and client wrt what handshake() needs to do, and
it was confusing for one of them to use the base class handshake() while
the other replaced handshake() completely.
Connection.__init__: It isn't safe to register with asyncore's socket
map before special-casing for the first (protocol handshake) message is
set up. Repaired that. Also removed the pointless "optionalness" of
the optional arguments.
ManagedClientConnection.__init__: Added machinery to set up correct
(thread-safe) message queueing. There was an unrepairable hole before,
in the transition between "I'm queueing msgs waiting for the server
handshake" and "I'm done queueing messages": it was impossible to know
whether any calls to the client's "queue a message" method were in
progress (in other threads), so impossible to make the transition safely
in all cases. The client had to grow its own message_output() method,
with a mutex protecting the transition from thread races.
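The queueing transition above is only safe if the "am I still queueing?" check and the final flush happen under the same lock. Below is a minimal sketch of that idea. `QueueingSender`, `handshake_done()` and the `send` callable are hypothetical names; only `message_output()` comes from the entry.

```python
import threading


class QueueingSender:
    """Hypothetical sketch: buffer outgoing messages until the handshake completes."""

    def __init__(self, send):
        self._send = send          # callable that actually writes one message
        self._lock = threading.Lock()
        self._queue = []           # messages produced while still handshaking
        self._queueing = True

    def message_output(self, msg):
        with self._lock:
            if self._queueing:
                self._queue.append(msg)
                return
        self._send(msg)

    def handshake_done(self):
        # Flip the flag and flush under the lock, so a concurrent
        # message_output() either lands in the queue before the flush
        # or sends directly after it; nothing is lost or reordered.
        with self._lock:
            queued, self._queue = self._queue, []
            self._queueing = False
            for msg in queued:
                self._send(msg)
```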
Changed zrpc-conn log messages to include "(S)" for server-side or
"(C)" for client-side. This is especially helpful for figuring out
logs produced while running the test suite (the server and client
log messages end up in the same file then).
Updated license to version 2.1.
Expand svn Id keywords in .py, .c, and .h files.
Set mime-type or svn-eol property from cvs expansion data
Converted zRPC to use 'logging' instead of zLOG. This probably broke the log analyzers... :(
Remove unused imports.
Fix bug that prevented ZEO from working with Python 2.4. Connection initialized _map as a dict containing a single entry mapping the connection's fileno to the connection. That was a misuse of the _map variable, which is also used by the asyncore.dispatcher base class to indicate whether the dispatcher uses the default socket_map or a custom socket_map. A recent change to asyncore caused it to use _map in its add_channel() and del_channel() methods, which is presumably a bug fix (it may get ported to 2.3). That causes our dubious use of _map to be a problem, because we also put the Connections in the global socket_map. The new asyncore won't remove it from the global socket map, because it has a custom _map. Also changed a bunch of 0/1s to False/True.
Merge changes from Zope-2_7-branch to the trunk.
Merge changes from ZODB3-3_2-branch to Zope-2_7-branch. Please make all future changes on the Zope-2_7-branch instead.
Backport various cache consistency bug fixes from the ZODB3-3_1-branch.
Merge ZODB3-auth-branch and bump a few version numbers. After the merge, I made several Python 2.1 compatibility changes for the auth code.
Add flush method.
Be prepared for a call that returns an empty tuple.
Lower another log level ("recv reply: %s, %s, %s") at Florent
Guillaume's request.
Lower two log levels, one at Toby's request. Closes SF bug #659068.
Rewrite pending() to handle input and output. pending() now does both reads and writes. In the case of server startup, we may need to write out zeoVerify() messages. Always check for read status, but check for write status only when there is output to do. Only continue in this loop as long as there is data to read.
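Below is a minimal sketch of the loop shape described above, written against the standard `select.select()` call. It is not the original method body. Writability is requested only when `outbuf` is non-empty, and the loop continues only while input is available. The `sock`, `outbuf` and `handle_input` parameters are hypothetical.

```python
import select
import socket


def pending(sock, outbuf, handle_input, bufsize=4096):
    """Pump one connection: send queued output, and read while input is available.

    outbuf is a bytearray of unsent bytes; handle_input(data) gets each chunk read.
    """
    while True:
        # Always check for readability; check for writability only when
        # there is output waiting to be sent.
        wlist = [sock] if outbuf else []
        readable, writable, _ = select.select([sock], wlist, [], 0)
        if writable:
            sent = sock.send(outbuf)
            del outbuf[:sent]          # drop whatever the kernel accepted
        if not readable:
            break                      # continue only while there is data to read
        data = sock.recv(bufsize)
        if not data:
            break                      # the peer closed the connection
        handle_input(data)


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"hi")
    out = bytearray(b"hello")
    pending(a, out, print)
    print(b.recv(16))
    a.close()
    b.close()
```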
Add _deferred_call() and _deferred_wait() for testing purposes. XXX The deferred name isn't perfect, but async is already taken.
Merge ZODB3-fast-restart-branch to the trunk
Remove binding for exception e. It's unused.
Merge ZODB 3.1 changes to the trunk. XXX Not sure if berkeley still works.
Use short_repr() instead of repr() a few more places.
In wait(), when there's no asyncore main loop, we called asyncore.poll() with a timeout of 10 seconds. Change this to a variable timeout starting at 1 msec and doubling until 1 second. While debugging Win2k crashes in the check4ExtStorageThread test from ZODB/tests/MTStorage.py, Tim noticed that there were frequent 10-second gaps in the log file where *nothing* happened. These were caused by the following scenario. Suppose a ZEO client process has two threads using the same connection to the ZEO server, and there's no asyncore loop active. T1 makes a synchronous call and enters the wait() function. Then T2 makes another synchronous call and enters the wait() function. At this point, both are blocked in the select() call in asyncore.poll(), with a timeout of 10 seconds (in the old version). Now the replies for both calls arrive. Say T1 wakes up. The handle_read() method in smac.py calls self.recv(8096), so it gets both replies in its buffer, decodes both, and calls self.message_input() for both, which sticks both replies in the self.replies dict. Now T1 finds its response, and its wait() call returns with it. But T2 is still stuck in asyncore.poll(): its select() call never woke up, and has to "sit out" the whole timeout of 10 seconds. (Good thing I added timeouts to everything! Or perhaps not, since it masked the problem.) One other condition must be satisfied before this becomes a disaster: T2 must have started a transaction, and all other threads must be waiting to start another transaction. This is what I saw in the log. (Hmm, maybe a message should be logged when a thread is waiting to start a transaction this way.) In a real Zope application, this won't happen, because there's a centralized asyncore loop in a separate thread (probably the client's main thread) and the various threads would be waiting on the condition variable; whenever a reply is inserted in the replies dict, all threads are notified.
But in the test suite there's no asyncore loop, and I don't feel like adding one. So the exponential backoff seems the easiest "solution".
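The backoff can be sketched as a generator of poll timeouts that starts at 1 msec and doubles up to a 1-second cap. This sketch rests on two assumptions. `poll` is a stand-in for `asyncore.poll(timeout)`; asyncore itself was removed in Python 3.12. `replies` plays the role of the `self.replies` dict.

```python
import itertools


def backoff_timeouts(start=0.001, cap=1.0):
    """Yield poll timeouts starting at `start` seconds and doubling up to `cap`."""
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)


def wait_for_reply(replies, msgid, poll):
    """Poll until replies contains msgid; poll(t) stands in for asyncore.poll(t).

    Loops until the reply arrives; a thread whose reply was decoded by
    another thread waits at most one (short, then growing) poll interval.
    """
    for timeout in backoff_timeouts():
        if msgid in replies:
            return replies.pop(msgid)
        poll(timeout)


if __name__ == "__main__":
    print([round(t, 3) for t in itertools.islice(backoff_timeouts(), 12)])
```

A fresh generator per wait means each call starts again at 1 msec, so a short wait stays responsive while a long one settles at one poll per second.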
Fix error handling logic for pickling errors. If an exception occurs while decoding a message, there is really nothing the server can do to recover. If the message was a synchronous call, the client will wait forever for the reply. The server can't send the reply, because it couldn't unpickle the message id. Instead of trying to recover, just let the exception propagate up to asyncore, where the connection will be closed. As a result, eliminate DecodingError and the special case in handle_error() that handled flags == None.
send_reply(): catch errors in encode() and send a ZRPCError exception instead. return_error(): be more careful calling repr() on err_value.
Fix the control flow in pending(). Thanks to Ury Marshak!!! Rather than blaming Windows for reporting success as an error, the else clause on the second try block should be an except clause.
Various repairs and nits: - Change pending() to use select.select() instead of select.poll(), so it'll work on Windows. - Clarify comment to say that only Exceptions are propagated. - Change some private variables to public (everything else is public). - Remove XXX comment about logging at INFO level (we already do that now :-).
I set out making wait=1 work for fallback connections, i.e. the
ClientStorage constructor called with both wait=1 and
read_only_fallback=1 should return, indicating its readiness, when a
read-only connection was made. This is done by calling
connect(sync=1). Previously this waited for the ConnectThread to
finish, but that thread doesn't finish until it's made a read-write
connection, so a different mechanism is needed.
I ended up doing a major overhaul of the interfaces between
ClientStorage, ConnectionManager, ConnectThread/ConnectWrapper, and
even ManagedConnection. Changes:
ClientStorage.py:
ClientStorage:
- testConnection() now returns just the preferred flag; stubs are
cheap and I like to have the notifyConnected() signature be the
same for clients and servers.
- notifyConnected() now takes a connection (to match the signature
of this method in StorageServer), and creates a new stub. It also
takes care of the reconnect business if the client was already
connected, rather than the ClientManager. It stores the
connection as self._connection so it can close the previous one.
This is also reset by notifyDisconnected().
zrpc/client.py:
ConnectionManager:
- Changed self.thread_lock into a condition variable. It now also
protects self.connection. The condition is notified when
self.connection is set to a non-None value in connect_done();
connect(sync=1) waits for it. The self.connected variable is no
more; we test "self.connection is not None" instead.
- Tried to make close() reentrant. (There's a trick: you can't set
self.connection to None yourself, because conn.close() ends up calling
close_conn(), which does this.)
- Renamed notify_closed() to close_conn(), for symmetry with the
StorageServer API.
- Added an is_connected() method so ConnectThread.try_connect()
doesn't have to dig inside the manager's guts to find out if the
manager is connected (important for the disposition of fallback
wrappers).
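Below is a minimal sketch of the condition-variable arrangement described above. `connect_done()`, `is_connected()` and `connect(sync=...)` mirror the names in the entry, but their bodies here are assumptions. The class name and the string stand-in for a connection are hypothetical.

```python
import threading


class ConnectionManagerSketch:
    """Hypothetical sketch: a manager whose connection is guarded by a condition variable."""

    def __init__(self):
        self.cond = threading.Condition()
        self.connection = None        # "connected" simply means this is not None

    def connect_done(self, conn):
        with self.cond:
            self.connection = conn
            self.cond.notify_all()    # wake anyone blocked in connect(sync=True)

    def is_connected(self):
        with self.cond:
            return self.connection is not None

    def connect(self, sync=False):
        # (Starting the background connect thread is omitted here.)
        if sync:
            with self.cond:
                self.cond.wait_for(lambda: self.connection is not None)
```

Because any connection, read-only or read-write, satisfies the wait, a synchronous connect can return as soon as a fallback connection exists.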
ConnectThread and ConnectWrapper:
- Follow above changes in the ClientStorage and ConnectionManager
APIs: don't close the manager's connection when reconnecting, but
leave that up to notifyConnected(); ConnectWrapper no longer
manages the stub.
- ConnectWrapper sets self.sock to None once it's created a
ManagedConnection -- from there on the connection is in charge of
closing the socket.
zrpc/connection.py:
ManagedServerConnection:
- Changed the order in which close() calls things; super_close()
should be last.
ManagedConnection:
- Ditto, and call the manager's close_conn() instead of
notify_closed().
tests/testZEO.py:
- In checkReconnectSwitch(), we can now open the client storage with
wait=1 and read_only_fallback=1.
The mystery of the Win98 hangs in the checkReconnectSwitch() test (which went away once I added an is_connected() test to testConnection()) is solved. After the ConnectThread has switched the client to the new, read-write connection, it closes the read-only connection(s) that it was saving up in case there was no read-write connection. But closing a ManagedConnection calls notify_closed() on the manager, which disconnected the manager and the client from its brand new read-write connection. The mistake here is that this should only be done when closing the manager's current connection! The fix was to add an argument to notify_closed() that passes the connection object being closed; notify_closed() returns without doing a thing when that is not the current connection. I presume this didn't happen on Linux because there the sockets happened to connect in a different order, and there was no read-only connection to close yet (just a socket trying to connect). I'm taking out the previous "fix" to ClientStorage, because that only masked the problem in this relatively simple test case. The problem could still occur when both a read-only and a read-write server are up initially, and the read-only server connects first; once the read-write server connects, the read-write connection is installed, and then the saved read-only connection is closed, which would again mistakenly disconnect the read-write connection. Another (related) fix is not to call self.mgr.notify_closed() but to call self.mgr.connection.close() when reconnecting. (Hmm, I wonder if it would make more sense to have an explicit reconnect callback to the manager and the client? Later.)
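The guard at the heart of this fix fits in a few lines. Below is a minimal sketch assuming a manager that only records its current connection. `ManagerSketch` and its `connection` and `disconnects` attributes are hypothetical; only `notify_closed(conn)` comes from the entry.

```python
class ManagerSketch:
    """Hypothetical manager that only tracks its current connection."""

    def __init__(self):
        self.connection = None
        self.disconnects = 0

    def notify_closed(self, conn):
        if conn is not self.connection:
            return    # a stale connection (e.g. a saved read-only fallback): ignore it
        self.connection = None
        self.disconnects += 1
```

Closing the saved read-only connection after the read-write one is installed becomes a no-op, which is exactly the behavior the fix needs.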
Define __str__ as an alias for __repr__. Otherwise __str__ will get the socket's __str__ due to a __getattr__ method in asyncore's dispatcher base class that everybody hates but nobody dares take away.
Remove the code from call() (and wait()) that serialized outgoing calls. If multiple threads sharing a ZEO connection want to make overlapping calls, they can do that now. This is mostly useful when one thread is waiting for a long-running pack() or undo*() call -- the other thread can now proceed. Jeremy & I did a review of the StorageServer code and found no place where overlapping incoming calls from the same connection could do any harm -- given that the only places where incoming calls can be handled are those places where the server makes a callback to the client.
Remove redundant class ServerConnection. Cleanup comments for Managed*Connection. Whitespace normalization.
When a call in the server raises an exception that is passed back to the client, don't log it at the ERROR level. If it really was a disaster, the client should log it. But if the client was expecting the exception, the server shouldn't get all upset about it. Change this to the INFO level. (When it *is* considered an error by the client, it's useful to be able to see the server-side traceback in the log.)
Remove unused argument in poll(). Fix comment in pending().
Move "send msg" and "recv msg" log calls to debug level.
This checkin contains changes (by me) that will allow multiple parallel outstanding calls. However it also contains code (by Jeremy, with one notifyAll() call added by me) that enforces the old rule of a single outstanding call. This is hopefully unnecessary, but we haven't reviewed the server side yet to make sure that that's really the case (the server was until now getting serialized calls per connection).
Remove two unused __super_<method> definitions.
Subtle wording changes to the call and return log messages.
Major refactoring of the rpc locking mechanisms. Add a send_call() method that computes a new msgid and hands the message off to the smac layer; it uses __msgid_lock. call() still uses __call_lock, but callAsync() does not: callAsync() uses no lock beyond what send_call() does.
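Below is a minimal sketch of the locking split described above. Only allocating the message id is serialized; the hand-off happens outside the lock. `CallSender`, the `write` callable and the `(msgid, method, args)` message shape are hypothetical, not zrpc's wire format.

```python
import itertools
import threading


class CallSender:
    """Hypothetical sketch: allocate message ids under a lock, then hand off the message."""

    def __init__(self, write):
        self._write = write                 # stands in for the smac layer's output method
        self._msgid_lock = threading.Lock()
        self._msgids = itertools.count(1)

    def send_call(self, method, args):
        with self._msgid_lock:              # only id allocation is serialized
            msgid = next(self._msgids)
        self._write((msgid, method, args))
        return msgid
```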
Raise DisconnectedError consistently.
Oops. The type test is necessary. issubclass() can raise a TypeError.
Extend Delay objects with an error() method that sends an exception to the client. Also, simplify the test for exceptions received by the client. If the class is a subclass of Exception, there's no need to ask if the instance is of type ClassType.
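Below is a minimal sketch of what a Delay with reply() and error() might look like. A server method returns the object immediately and answers the call later. The `set_sender()` wiring, the callback signatures and the repr format are assumptions for illustration, not ZEO's actual code.

```python
class Delay:
    """Hypothetical deferred reply: returned by a server method, completed later."""

    def __init__(self):
        self.msgid = None
        self._send_reply = None
        self._return_error = None

    def set_sender(self, msgid, send_reply, return_error):
        # Called by the connection once it knows which call this Delay answers.
        self.msgid = msgid
        self._send_reply = send_reply
        self._return_error = return_error

    def reply(self, value):
        self._send_reply(self.msgid, value)

    def error(self, exc):
        # Send an exception back to the waiting client instead of a value.
        self._return_error(self.msgid, exc)

    def __repr__(self):
        return "%s[%s]" % (type(self).__name__, self.msgid)
```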
There's no need for notify_closed() to take a 'conn' argument.
Add pending() method to Connection.
Rename _do_async_loop() and _do_async_poll() to wait() and poll(). Repair comments in _call() about how wait() handles reply lock.
Move smac from ZEO to ZEO.zrpc.
Whitespace normalization.
It still bugged me that the manager method called by the connection when the connection is closed is called 'close()' when it doesn't close the manager, but tells the manager that one particular connection is closed. Rename it to 'close_conn()'.
Call notifyDisconnected() on the object handled by a Connection.
Expand some comments and docstrings.
Clear up comment inside _do_async_loop() and assert that lock is not held.
Import select to make previous checkin work.
Add a close() method to the ZEOStorage that closes connection to client. XXX If select() raises an exception inside asyncore, close the connection.
Add MTDelay() that blocks reply() on an event until send_reply() is ready. Add a poll call in send_reply() so that synchronous calls made when there is no other traffic don't cause long delays. (This probably only occurs during the pack tests.)
Make doubly sure that SystemExit isn't caught by generic error handling.
Use short_repr() for log at BLATHER level for method invocation. I think this is the log call that has been killing backends by sending very large strings.
merge change from ZEO2-branch; fix bug that caused exception log entries to be mangled
Merge ZEO2-branch to trunk. (Files added on branch.)